Targeted sequencing of candidate genes of dyslipidemia in Punjabi Sikhs: Population-specific rare variants in GCKR promote ectopic fat deposition

Dyslipidemia is a well-established risk factor for cardiovascular diseases. Although, advances in genome-wide technologies have enabled the discovery of hundreds of genes associated with blood lipid phenotypes, most of the heritability remains unexplained. Here we performed targeted resequencing of 13 bona fide candidate genes of dyslipidemia to identify the underlying biological functions. We sequenced 940 Sikh subjects with extreme serum levels of hypertriglyceridemia (HTG) and 2,355 subjects were used for replication studies; all 3,295 participants were part of the Asian Indians Diabetic Heart Study. Gene-centric analysis revealed burden of variants for increasing HTG risk in GCKR (p = 2.1x10-5), LPL (p = 1.6x10-3) and MLXIPL (p = 1.6x10-2) genes. Of these, three missense and damaging variants within GCKR were further examined for functional consequences in vivo using a transgenic zebrafish model. All three mutations were South Asian population-specific and were largely absent in other multiethnic populations of Exome Aggregation Consortium. We built different transgenic models of human GCKR with and without mutations and analyzed the effects of dietary changes in vivo. Despite the short-term of feeding, profound phenotypic changes were apparent in hepatocyte histology and fat deposition associated with increased expression of GCKR in response to a high fat diet (HFD). Liver histology of the GCKRmut showed severe fatty metamorphosis which correlated with ~7 fold increase in the mRNA expression in the GCKRmut fish even in the absence of a high fat diet. These findings suggest that functionally disruptive GCKR variants not only increase the risk of HTG but may enhance ectopic lipid/fat storage defects in absence of obesity and HFD. To our knowledge, this is the first transgenic zebrafish model of a putative human disease gene built to accurately assess the influence of genetic changes and their phenotypic consequences in vivo.


Introduction
Dyslipidemia is a well-established risk factor for cardiovascular disease and a principal cause of mortality in individuals with type 2 diabetes (T2D). Circulating blood lipid phenotypes are heritable risk factors for the development of atherosclerosis and their measurements are used clinically to predict future coronary artery disease (CAD) risk and therapy for primary prevention [1,2]. Epidemiological studies suggest that elevated serum triglyceride (TG) concentration is a strong independent risk factor for CAD [1,3]. There is an inverse correlation between serum TG and serum high-density cholesterol (HDL-C) that is associated with increased risk of cardiovascular dysfunction, despite the level of low density cholesterol (LDL-C) being normal. This combination of lipid alterations is defined as atherogenic dyslipidemia, which is a significant risk factor for the development of CAD [4]. Lowering of LDL-C has been the major focus in CAD prevention following treatment with HMG-CoA reductase inhibitors (statins). However, the mortality rate of CAD remains elevated particularly in the patients with T2D and insulin resistance, and reasons for their discordant effects in diabetics remain unknown [5].
Family and twin studies have shown that TG and lipoprotein levels aggregate in families [6]. Relatives of individuals with hyperlipidemia/dyslipidemia will have a 2.5-to 7-fold increase in risk of death due to premature CAD compared to relatives of control individuals [7,8]. The principal lipid alterations observed in these patients include high TG and low HDL-C. Cincinnati Lipid Research Clinic Family Study showed that low HDL-C and high TG occur conjointly and are transmitted across generations as a "combined phenotype" or "conjoint trait" [9]. Genome-wide association studies (GWAS) and meta-analyses studies on multiethnic populations including Punjabi Sikhs have uncovered more than 200 genetic loci associated with circulating blood lipid phenotypes [10][11][12][13]. However, despite the high clinical heritability (50-80%) of many of the lipid traits [14], these and several other studies have only explained up to10% of heritability in these genes. To identify putative functional with larger effects, in this study, we have performed targeted sequencing of 13 bona fide candidate gene regions (~2.9 Mb) (S1 Table/

Study subjects of discovery (sequencing) cohort
Genomic DNA samples of individuals including HTG cases (TG>150 mg/dl) and healthy controls with TG (<100 mg/dl) were sequenced with custom Nimblegen probes designed for targeted resequencing of 13 confirmed candidate genes for diabetic dyslipidemia in Sikhs. Diagnosis of T2D was confirmed by scrutinizing medical records for symptoms, use of medication, and measuring fasting glucose levels following the guidelines of American Diabetes Association [18]. The diagnosis for normo-glycemic controls was based on a fasting glycemia <110 mg/dL or a 2-h glucose <140 mg/dL. CAD was assigned when there was a documented prior diagnosis of heart disease, electrocardiographic evidence of angina pain, coronary angiographic evidence of severe (>50%) stenosis, or echocardiographic evidence of myocardial infarction. HTG is broadly defined as fasting serum TG concentrations above the ninety-fifth percentile [19], and was classified as mild HTG (150-399 mg/dL), high HTG (400-875 mg/ dL), and severe HTG (>875 mg/dL). The non-HTG control participants were recruited from the same Punjabi Sikh community and from the same geographic location as the HTG participants. They were selected on the basis of a fasting glycemia <100.8 mg/dL or a 2h glucose <141.0 mg/dL. BMI was calculated as weight (kg)/[height (m) 2 ]. Education, socio-economic status, dietary, and physical activity data were recorded. Smoking information was collected regarding past smoking, current smoking status, and length of time, number of cigarettes smoked /day. The vast majority of Sikhs were non-smokers, details are described elsewhere [17]. The individuals on lipid lowering medication are excluded from this cohort. All participants provided a written informed consent for investigations.
The study was reviewed and approved by the University of Oklahoma Health Sciences Center's Institutional Review Board, as well as the Human Subject Protection Committees of Hero Dayanand Medical College and Heart Institute, Ludhiana and Guru Nanak Dev University, Amritsar in India. Metabolic estimations of fasting serum lipids [total cholesterol, LDL-C, HDL-C, and TG] were quantified by using standard enzymatic methods (Roche, Basel, Switzerland) as previously described [17].
In this study, we only included those individuals who self-reported having no South Indian admixture and exclusively belonged to the North Indian Punjabi Sikh community, who reported that all of their four grandparents were of North Indian origin, and spoke the Punjabi language. Excluded were individuals of South, East, or Central Indian origin; those of non-Sikh/non-Punjabi origin; those with rare forms of lipid disorders including very low serum TG (abetalipoprotenemia, homozygous hypobetalipoprotenemia, familial combined hypobetalipoproteinemia), or severe HTG (extremely high serum TG >1,000 mg/dL); those with familial chylomicronemia, hemochromatosis, or pancreatitis; those on lipid lowering medication; and those with excessive alcohol intake (>400 mL/day). About 50% of HTG patients had T2D and~9% had CAD. Clinical characteristics of discovery-(resequencing) and replication cohort are summarized in Table 1.

Targeted sequencing
Targeted sequencing was performed at the Northwest Genomics Center in the department of Genome Sciences at the University of Washington through the RS&G Service sponsored by the National Heart Lung Blood Institute of the National Institutes of Health.
format. The purified DNA was subjected to a series of shotgun library construction steps, including fragmentation through acoustic sonication (Covaris), end-polishing and A-tailing, ligation of sequencing adaptors, and PCR amplification with 8 bp barcodes for multiplexing. Libraries undergo capture using the Roche/Nimblegen SeqCap EZ custom designed probe. Prior to sequencing, the library concentration was determined by triplicate qPCR and molecular weight distributions verified on the Agilent Bioanalyzer. Barcoded libraries were pooled using liquid handling robotics prior to clustering (Illumina cBot) and loading. Massively parallel sequencing-by-synthesis with fluorescently labeled, reversibly terminating nucleotides was carried out on the HiSeq sequencer.

Read processing
Our sequencing pipeline is a combined suite of Illumina software and other "industry standard" software packages (i.e., Genome Analysis ToolKit [GATK], Picard, BWA, SAMTools, and in-house custom scripts) and consists of base calling, alignment, local realignment, duplicate removal, quality recalibration, data merging, variant detection, genotyping and annotation. The overall processing pipeline consists of the following elements: (1) base calls generated in real-time on the HiSeq2500 instrument (RTA 1.13.48.0) (2) demultiplexed, unaligned BAM files produced by Picard ExtractIlluminaBarcodes and IlluminaBasecallsTo-Sam and (3) BAM files aligned to a human reference using BWA (Burrows-Wheeler Aligner; v0.6.2). Read data from a flow cell lane is treated independently for alignment and QC purposes in instances where the merging of data from multiple lanes is required (e.g., for sample multiplexing). The samples were sequenced using paired-end~140 to 150bp reads and the insert sizes were at least 100 bp in length. Therefore, we expected to see~240 to 250bp on the Bioanalyzer. Read-pairs not mapping within ± 2 standard deviations of the average library size (~150 ± 15 bp for the targeted region) were removed. All aligned read data are subject to the following steps: (1) "duplicate removal" was performed, (i.e., the removal of reads with duplicate start positions; Picard MarkDuplicates; v1.70) (2) indel realignment was performed (GATK IndelRealigner; v1.6-11-g3b2fab9) resulting in improved base placement and lower false variant calls and (3) base qualities were recalibrated (GATK TableRecalibration; v1.6-11-g3b2fab9).

Sequence data analysis QC
All sequence data underwent a QC protocol before they were released to the annotation group for further processing. This included an assessment of: (1) total reads; (2) library complexitythe ratio of unique reads to total reads mapped to target. DNA libraries exhibiting low complexity are not cost-effective to finish; (3) capture efficiency-the ratio of reads mapped to human versus reads mapped to target; (4) coverage distribution-80% at 20X required for completion; (5) capture uniformity; (6) raw error rates; (7) Transition/Transversion ratio (Ti/ Tv)-typically~3 for known sites and~2.5 for novel sites; (8) distribution of known and novel variants relative to dbSNP-typically < 7% using dbSNP build 129 in samples of European ancestry [22]; (9) fingerprint concordance > 99%; (10) sample homozygosity and heterozygosity and (11) sample contamination validation. All QC metrics for both single-lane and merged data were reviewed by a sequence data analyst to identify data deviations from known or historical norms. Lanes/samples that failed QC were flagged in the system and could be re-queued for library prep (< 5% failure) or further sequencing (< 2% failure), depending upon the QC issue. Completion was defined as having > 80% of the target at >20X coverage.

Variant detection
Variant detection and genotyping were performed using the UnifiedGenotyper (UG) tool from GATK (v1.6-11-g3b2fab9). Variant data for each sample were formatted (variant call format [VCF]) as "raw" calls that contain individual genotype data for one or multiple samples and flagged using the filtration walker (GATK) to mark sites that were of lower quality/false positives [e.g., low quality scores (Q50), allelic imbalance (ABHet 0.75), long homopolymer runs (HRun> 3) and/or low quality by depth (QD < 5)].

Variant annotation
We used an automated pipeline for annotation of variants derived from targeted sequencing data, the SeattleSeq Annotation Server (http://gvs.gs.washington.edu/ SeattleSeqAnnotation/). These publically accessible server returns annotations including dbSNP rsID (or whether the coding variant is novel), gene names and accession numbers, predicted functional effect (e.g., splice-site, nonsynonymous, missense, etc.), protein positions and amino-acid changes, Poly-Phen predictions, conservation scores (e.g., PhastCons, GERP), ancestral allele, dbSNP allele frequencies, and known clinical associations. The annotation process has also been automated into our analysis pipeline to produce a standardized, formatted output (VCF-variant call format, described above).

Replication studies, population characteristics, and SNP genotyping
We replicated the association of three functional variants in additional 2355 individuals of Punjabi Sikh ancestry. These included 1000 individuals from Sikh families and the remaining 1355 were unrelated; all were part of the AIDHS/SDS described previously [17,21,23,24]. Recruitment and diagnostic details of the Sikh replication cohort are similar as described above for the discovery cohort. Clinical and demographical details of these cohorts are provided in Table 1. Genotyping for selected GCKR SNPs (rs774930016 (S105N), rs760427565 (R297Q), and rs755537970 (R553W)) ( Table 2) was performed using TaqMan pre-designed or TaqMan made-to-order SNP genotyping assays from Applied Biosystems Inc. (ABI, Foster City, USA) as described previously [25]. Genotyping reactions were performed on Quant Studio6 genetic analyzer using 2 uL of genomic DNA (10 ng/uL), following manufacturers' instructions. For quality control, 8-10% replicative controls and negative controls were used in each 384 well plate to match the concordance. Genotyping call rate was 96% or more in all the SNPs studied.

Functional studies using zebrafish (ZF) model
To test the phenotypic effects of this and other novel variants in vivo, we created transgenic ZF (Danio rerio) models of the glucokinase regulatory protein (GCKR) mut and GCKR wt using TAB-5 strain, a commonly used strain derived from two commonly used fish lines (Tubingen and AB). Heterozygous human carriers of this mutation exhibit HTG and high rates of T2D, so we examined whether GCKR mut induces features of this phenotype in ZF. To build our transgenic models, we employed the Tol2 system, which mediates highly-efficient transgenesis [26]. . The expression construct were generated using "LR reaction" (in which attL and attR sites recombine); and transformed into bacteria following the protocol described (Kwan et al, 2007) [26]. The GCKR full CDNA was purchased from AddGene (Watertown, MA, USA). Mutations were created using site-directed mutagenesis kit (New England Biolabs, Ipswich, MA, USA) as described [26]. We used a promoter construct that drives human GCKR mut expression only in hepatocytes, while simultaneously labeling those cells fluorescently. To achieve hepatocyte-specific expression in D. rerio, we used the D. rerio liver fatty acid binding protein (L-FABP) promoter (courteously provided by Dr. Schlegel, University of Utah) [27]. To label GCKR mut expressing D. rerio hepatocytes, we joined the cDNA for human GCKR mut to the enhanced red fluorescent protein (mCherryFP), separated by a 2A peptide linker [26,28]. The 2A linker is an auto-cleaving peptide, resulting in the GCKR mut and mCherryFP proteins being expressed in a 1:1 stoichiometric ratio. Expression of mCherryFP by the ZF liver confirms expression of GCKR mut hepatocytes (Fig 1A-1C). Transformation of TOP10 cells with an LR recombination reaction yielded two classes of colonies: clear and opaque. Clear colonies yield the correct recombination product and were selected following the protocol described previously [26]. After building GCKR mut transgenic ZF, we evaluated the in vivo metabolic consequences of these human GCKR mutations by feeding a high fat diet to 5 day old larvae of wildtype (WT) and transgenic ZF with and without GCKR mutations. Our protocol for building transgenic lines of zebrafish for studying post-GWAS quantitative trait loci (QTL) for diabetes and cardiovascular traits has been approved by the Institutional Animal Care and Use Committee (IACUC) and Institutional Bio-safety Committee (IBC) of the University of Oklahoma Health Sciences Center (Protocol # 01550-16-067-SSHITF).

Diet experiments
For all feeding studies, 5-day post-fertilization (dpf) homozygous humanized GCKR-mutant or −WT transgenic zebrafish larvae were studied. For comparison to WT, we also studied 5 dpf larvae of Tab5 fish (the parental line used to construct transgenic constructs). All WT and mutant fish were distributed in 3 liter tanks (20 fish per tank) and fed defined diets for 14 days. Animals were housed in the main aquarium of the ZF Animal Resource core facility of the University of Oklahoma and maintained on a 14-hour light, 10-hour dark cycle. Animals were anesthetized and killed by immersion in ice water [29]. For control studies, 5 dpf homozygous GCKR mutant (GCKR mut ) or homozygous WT (GCKR wt ) fish were reared on a conventional diet (commercial powder and newly-hatched Artemia salina nauplii), twice-daily for 14 days. The high fat diet (HFD) groups were fed with a special fish diet (with 24% fat, 43% protein, and 4% fiber from Purina Aqua Mix) thrice-daily for 14 days. Three larvae from each feeding group were euthanized and their livers were dissected using a dissection microscope and sent for transmission electron microscopy at the Oklahoma Medical Research Foundations imaging core facility.

Larvae tissue embedding and hematoxylin and eosin (H&E) staining
A Leica TP1020 tissue processor was used to process the tissue, following the manufacturer protocol. Briefly, tissue in 10% neutral formalin buffer (NBF) are moved into labelled tissue blocks. The tissue in the blocks are progressively dehydrated with increasing concentrations of ethanol, then in xylene and imbibed with paraffin liquid. Due to fragility of Zebrafish larvae, they were placed in biospecimen bags and 5 minutes in each step of processing was adapted.
The paraffin imbibed tissue is taken out and embedded according to orientation as needed using a 10X dissection microscope. The formalin-fixed paraffin-embedded tissues were sectioned at desired thickness (4 μm) and mounted on positively charged slides. The slides were dried overnight at room temperature and incubated at 60˚C for 45 minutes. The Hematoxylin and Eosin were purchased from Leica biosystems and staining was performed utilizing Leica ST5020 Automated Multistainer following the Hematoxylin-Eosin (HE) staining protocol at the SCC Tissue Pathology Shared Resource.

Transmission electron microscopy
Zebrafish larvae were extracted using the dissection microscope and were fixed with 4%

Bioinformatics and statistical analysis
Missense variants were designated as damaging using the in-silico predictions generated by tools like PolyPhen [30], SIFT [31], BONGO [32], LRT [33], Mutation Taster [34], and PolyPhen-2 [35]. The variants with score of four of six defined by these algorithms were considered potentially damaging. Data quality for SNP genotyping was checked by establishing reproducibility of control DNA samples. Departure from HWE of common variants in controls was tested using the Pearson chi-square test.

Gene-centric association analysis
For gene-centric analysis, we performed gene-centric burden tests to jointly analyze multiple non-synonymous or other likely functional variants including singleton variants by Combined Multivariate and Collapsing (CMC) method [36], to collapse rare variants in different MAF categories and evaluate the joint effect of common and rare variants using SVS, v 2.0 (Golden Helix, Bozeman, MT, USA). We also used the variance-component test within a randomeffects model including the sequence kernel association test (SKAT) [37], which tests for association by evaluating the distribution of genetic effects for a group of variants instead of aggregating variants.

Single SNP association analysis
The genotype and allele frequencies in T2D cases were compared to those in control subjects using the chi-square test.

Results
Of a total of 2,709 individuals studied, targeted sequencing was performed on 940 subjects and 1,769 subjects were used for the replication studies. All these participants were part of the AIDHS/SDS [15][16][17]. Of the 940 sequenced samples, 820 passed the stringent QC based on multiple parameters and were used for further analysis. S1 Table describes details of the lipid candidate gene regions selected for targeted resequencing. A summary of high-quality variants analyzed for their distribution and association with lipid-related traits, diabetes and other cardiometabolic risk factors is provided in S2 Table. Our results revealed accumulation of several unknown rare (<1%) and less common variants (<10%) that were not found in any of the existing variant databases. For instance, our results of GCKR sequencing in Sikhs revealed clustering of 13 rare mutations and many of these were predicted to be damaging/ deleterious based on the in-silico prediction methods (Fig 2A). Gene-centric analysis for studying the aggregate effects of clustered variants within each gene, revealed significant burden of in the GCKR (p = 2.1x10 -5 ) along with LPL (p = 1.6x10 -3 ) and MLXIPL (p = 1.6x10 -2 ) for increasing the risk for HTG (S1 Table).
The present study is further mainly focused on 3 population-specific rare variants identified in GCKR gene (Fig 2B). The first functional variant (S105N), located on the Sugar Isomerase domain -1 (SIS-1), was functionally disruptive, and absent in Caucasians (n = 33,356), Africans (n = 5,197), Hispanic/ Latinos (5,788), East Asians (n = 4,324) in a large Exome Aggregation Consortium (ExAC) of multiethnic populations (Table 2). Two additional rare functional variants (R297Q and R553W) were also confined to this Sikh population only and were with high HTG in most carriers (Fig 2A and 2B, Table 2). Two of these three disruptive missense variants (S105N near fructose binding site and R553W near GCK interaction domain) were highly conserved across species (Figs 2D and 3B-1), while the mutant allele of R297Q variant was also found in cow and sheep in addition to its predominant presence in South Asians (Fig 3A-1).
The three functionally tested damaging rare mutations in GCKR were at the fructose binding site and GCK binding site at or near the sugar isomerase (SIS-1-2) domains (Fig 2A). The disruptive allele at codon 105 is predicted to destabilize the folding of the fructose binding domain that results in the loss of hydrogen bond between Serine (105) and Glutamine (190) (Fig 2E). Interestingly, this variant was monomorphic in Europeans, East Asians, Africans and Latinos of the ExAC consortium and only 3 of 8250 South Asians from Pakistan were carriers (genotype frequency 0.00036) whereas 9 out of 3132 Sikhs were carriers of this variant (0.0029). This variant co-segregated between heterozygous carriers, HTG-and T2D phenotypes in one Sikh family. Of these over 83% of carriers in this family had HTG (ranging from 148mg/dl to 530 mg/dl) and 75% of carriers were diabetic. Similarly, two more rare functional variants (R297Q and R553W) were confined to this population only and were with high TG in most individuals (S3B and S3C Table).
We investigated single variant association of each rare variant with diabetes and quantitative risk phenotypes (e.g. fasting glucose, body mass index (BMI), total cholesterol, LDL-C, HDL-C and TG) in discovery and replication cohorts. None of these variants showed any significant association with diabetes, fasting glucose or lipid traits except TG. As shown in Table 3 carriers of S105N (rs774930016) variant had a significant increased levels serum TG (β 0.59 ± 0.17; p = 0.001) after adjusting for age, gender and BMI. This association remained significant even after including T2D and family relatedness in the model (β 0.59 ± 0.17; p = 4.97 x 10 −4 ) in replication cohort and in combined (discovery and replication) samples (β 0.55 ± 0.19; p = 0.004). Similar but marginally significant association of R553W (rs755537970) variant was observed in combined samples with triglycerides (β 0.51± 0.23; p = 0.028).  Fig 3A-3 The wild-type Arg297 forms a salt bridge with glutamic acid at position 63 glutamic acid at position 300, however, the new mutant residue is not in the correct position to make the same hydrogen bond as the original wild-type residue did. Fig 3B-1. Sequence alignment reveals a complete conservation of wild type 'C' allele at position of Arg553Trp of the GCKR gene across species. Fig 3B-2. Protein modeling of and Arg 553Trp mutations in the GCKR gene. The wild-type residue (blue) forms a hydrogen bond with glutamic acid at position 536, and aspartic acid at position 550. The size difference between wild-type and mutant residue makes that the new residue is not in the correct position to make the same hydrogen bond as the original wild-type residue. The wild-type residue was positively charged, the mutant residue is neutral. The mutant residue is more hydrophobic than the wild-type residue. https://doi.org/10.1371/journal.pone.0211661.g003 Targeted sequencing of candidate genes of dyslipidemia in Sikhs However, no significant association was observed in R297Q (rs760427565) with TG (Table 3, Fig 4).
Based on significant association of these variants with HTG, we next evaluated the functional consequences of three South Asian population-specific variants by designing a humanized GCKR ZF model. The H&E images of liver of TAB-5, transgenic GCKR wt , and GCKR mut groups with normal diet and HFD are shown in Fig 5A-5C and S3A-S3C Fig. The fat disposition in liver hepatocytes of TAB-5 larvae was increased 3-4-fold in response to HFD. A similar increase in response to HFD was noticed in transgenic fish with wild type GCKR. However, in mutant transgenic fish exhibited a 3-fold increase in ectopic fat in hepatocytes with normal diet, with 80% hepatocytes having fat deposition, while transgenic mutants on HFD had hepatocytes loaded with fat showing a marked degeneration of hepatocyte nuclei with possible steatosis (Figs 5C and 6C). In response to HFD, mRNA expression of GCKR but increased about two folds in normal TAB-5 compared to normal diet. On the other hand, there was 7-fold increase in GCKR mut larvae even in the absence of HFD; whereas, the GCKR mut mRNA levels were restored to normal when fed on HFD (Fig 7).

Discussion
In this investigation, we have attempted to identify functional variants by resequencing 13 known candidate genes of dyslipidemia using an endogamous population of Punjabi Sikhs known to have high risk for cardiovascular diseases [11,12,[38][39][40][41]. Despite considerable success of GWAS, whole-genome, and exome sequencing, including studies from our group [15, [42][43][44], the genetic mechanisms that predispose people to metabolic and cardiovascular disease risk factors remain poorly understood. Of these 13 selected loci with prior evidence of association with three major lipids (HDL cholesterol, LDL cholesterol, and TG) in European populations [10,12,45,46], variants in ANGPTL3, GCKR, MLXIPL, LPL, TRIB1 and APOE genes have been shown to be associated with lipid phenotypes in South Asians [12]. Fine mapping of~195 kb region encompassing Chr11q23.3 [APO-A1-C3-A4-A5, ZNF259, and BUD13] by targeted genotyping revealed a strong association of this region with HTG (rs964184; p = 1.6x10 -39 ) in Punjabi Sikhs and South Asians) [11]. Here we have intended to capture putatively functional rare and less common variants from coding, non-coding, and intergenic regions including variants influencing gene regulation and expression within and around Targeted sequencing of candidate genes of dyslipidemia in Sikhs these known candidate genes. The degree of clinical heterogeneity existing in the CAD or cardiometabolic phenotypes imposes serious limitations in our ability to effectively measure genetic risk, environmental exposure, and their interactions. Additionally, most post-GWA studies on candidate gene sequencing have predominantly been focused on European populations which provide limited information on the usefulness of variants in populations of non-European ancestry. Moreover, the post-GWAS exome arrays could capture the majority of low-frequency variants in European populations only when the sample size exceeded >300,000 [47]. However, such studies in other disparate populations are sparse. The current investigation in family and population based sample from the AIDHS/SDS is an effort to identify missing heritability associated with GWAS-driven loci of dyslipidemia, specifically the HTG by using candidate gene resequencing.

Fig 4. Serum Triglyceride mean distribution among carriers and non-carriers of missense variants.
Bar graph shows the differences in the distribution of mean serum triglycerides among carriers and non-carriers of missense mutations (S105N, R297Q, and R553W) represented by rs774930016, rs760427565, rs755537970 variants, respectively. Data are shown in mean and standard deviation of means. P values are derived from the general mixed linear models used to test the impact of genetic variants on transformed continuous trait (TG) using the variance-component test adjusted for the random-effects of relatedness and fixed effects of age, gender, BMI and type 2 diabetes. Only association of S105N would remain significant after applying Bonferroni correction. https://doi.org/10.1371/journal.pone.0211661.g004 Targeted sequencing of candidate genes of dyslipidemia in Sikhs As expected, the AIDHS/SDS, being an endogamous and relatively homogenous population, was enriched with rare and less common variants. Enrichment of functional variants in cases with HTG along with our focus on individuals with extreme trait values (TG) increased our power to discover pathogenic variants and aided the discovery of multiple rare and   Fig 6C. Neutral fat appears as small round vesicles with no structure. There is abnormal fat accumulation in the transgenic mutant fish with HFD ( Fig  6C). Black arrows point to round vesicle like structures with empty content which are fat droplets. The red arrows point to two huge abnormal structures which appear like fusion of large neutral fat vesicles in the transgenic mutant with HFD. However, the transgenic mutant with normal diet shows abnormal structures with possible accumulation of phospholipids (red arrows) in hepatic cells suggesting abnormal metabolic function. https://doi.org/10.1371/journal.pone.0211661.g006 Targeted sequencing of candidate genes of dyslipidemia in Sikhs common known and novel variants in splice regions, 5'UTRs, 3'UTRs, intronic, and missense (loss-of-function) variants. Moreover, this ethnic subgroup of Sikhs of North India were enrolled from one single geographic location with shared environmental and cultural traits, which further has reduced the environmental and cultural heterogeneity.
Gene-centric analysis of the identified variants revealed a significant burden of variants for increasing HTG risk in GCKR (p = 2.1x10 -5 ), LPL (p = 1.6x10 -3 ) and MLXIPL (p = 1.6x10 -2 ). The GCKR is glucokinase regulatory protein that inhibits glucokinase (GCK) by forming a complex with the enzyme in the liver, which plays a role in glucose homeostasis [48]. Fructose 6-phosphate (F6P) enhance while fructose 1-phosphate (F1P) reduce the GCKR-mediated inhibition of GCK [49]. Lipoprotein lipase (LPL) has long been recognized as an enzyme that hydrolysis of triglyceride-rich lipoproteins to release free fatty acids for energy metabolism [50]. The MLX1PL encodes a basic helix-loop-helix leucine zipper transcription factor of the Myc/Max/Mad superfamily. This protein forms a heterodimeric complex and binds and activates carbohydrate response element (ChoRE) motifs in the promoters of triglyceride synthesis genes [51]. Common variants within and around these genes are associated with increased levels of TG and CAD in multiethnic GWAS and metanalysis studies including Sikhs [10,39]. To test their phenotypic effects and to evaluate metabolic consequences in vivo, we focused on three putative variants identified in the human GCKR gene by building four transgenic humanized ZF models. These variants were located near the fructose binding site or GCK binding sites at the sugar isomerase domains of the human GCKR gene. Evidently, the human GCKR is about 3 times larger than the ZF GCKR and it only shows 41% similarity with humans (S2-B Fig). Due to the absence of 386 amino acids in the ZF GCKR gene, our three functional variants fall outside the ZF GCKR protein.
Despite this dissimilarity, ZF are a well-suited model for studies involving human energy metabolism because the pathways of lipid storage and transport are conserved across species Targeted sequencing of candidate genes of dyslipidemia in Sikhs [52]. Further, the dietary studies performed in a ZF model for developing atherosclerosis and hepatic steatosis in response to a high-cholesterol diet revealed the potential strength of this model for analyzing diet-induced phenotypes [53]. In this study, the ZF larvae exposed to HFD and normal diet revealed a 2 to 3 fold increase in the fat accumulation in hepatocytes in response to HFD both in TAB-5, and control transgenic fish (with normal human GCKR) with no apoptosis. However, the observed 4+fold increase in liver fat accumulation with at least 1 apoptotic cell every hundred hepatic cells, even in the absence of a HFD in transgenic GCKR mut , suggests the impaired function of GCKR due to mutations, which may impair GCKR to act promptly in response to the increased concentration of fructose 6-phosphate. This consequently would lead to uninterrupted release of GCK in the liver, resulting in increased uptake of glucose and eventually leading to de novo lipogenesis [54]. Alternatively, studies suggest that GCKR stabilizes and protects GCK from degradation. Thus, the increase expression with impaired function of GCKR may result in reduced GCK activity or function, which would give rise to impaired glucose tolerance and hepatic fat accumulation [55]. Evidently, from these studies it appears that the GCKR could be a thrifty gene and the functionally disrupted variants in the GCKR in Punjabi Sikhs may enhance ectopic fat storage defects even in the absence of HFD, as revealed in the transgenic ZF. Not only did most hepatic cells contain vacuoles of fat, but the structure of hepatocytes was disorganized due to fatty metamorphosis and severe disorganization of the hepatic structure and hepatocyte nuclei with possible steatosis in the absence of HFD in GCKR mut compared to GCKR wt or WT (TAB-5) Fig 5C. Also, there was abnormal accumulation of lipids (phospholipids) other than the neutral fat in mutant transgenic in the absence of HFD (Fig 6C). Phospholipid accumulation in hepatic cells are often seen in people with metabolic disorders.
A previously known functional variant (Proline to Leucine) at position 446 of the GCKR (rs1260326) was identified as a novel locus for TG metabolism in Caucasian GWAS, and has since been robustly replicated in multiple genome-wide studies of plasma TG [39]. The same variant has also been shown to influence fatty liver disease in children and adults [56]. Of note, the minor risk allele frequency differed significantly between Sikhs (0.27) and other South Asians [e.g. Gujarati Indians (GIH 0.19) and South Asians from Pakistan (ExAC) 0.20], also European Caucasians (0.36). Although our study confirmed the association of this SNP rs1260326 with TG in Sikhs (β 0.09, ± 0.02, p = 3.42x10 -5 ), the genetic variance explained by this variant was <2% in Sikhs. Whereas, the rs774930016 (representing codon 105) explained 38% of HTG. Both rs760427565 for codon 297 and rs75537970 for codon 553 explained~25% of HTG genetic variance among carriers. The association of codon 105 with TG remained statistically significant even after controlling for BMI, age, gender and T2D (Table 3), suggesting its functional role in increasing HTG risk independent of T2D. Notably, these variants were only restricted to South Asian populations; indeed the rs755537970 (of codon 553) appears to be confined to Punjabi Sikhs (Table 2).
We and others have shown that Asian Indian populations may possess a different physiology of obesity [17, [57][58][59]. South Asians generally have a non-obese BMI with lower muscle mass and increased visceral fat, which is also associated with their high rates of T2D in the absence of obesity [58,[60][61][62][63][64][65]. Even results of computed tomography (CT) scans show that Asian Indians have 30% more body fat than age-and BMI-matched African American men, and 21% more body fat than Swedish men [66][67][68]. Thus, Asian Indians are metabolically obese despite a nonobese BMI. The uneven distribution of fat in insulin sensitive organs like the liver or pancreas increases the risk for development of insulin resistance, T2D, and non-alcoholic fatty liver disease (NAFLD) [69], which are common in Indians. Based on the results of this study, carriers of these evolutionarily conserved variants (specifically S105N and R553W) will have a high risk of ectopic fat deposition and increased risk for NAFLD in the absence of overt obesity.
Overall, successful humanized transgenic GCKR mut -expressing D. rerio has provided a platform for our ongoing studies to define the precise mechanisms of metabolic derangement perhaps by modulating the GCKR-GCK complex leading to HTG and T2D in humans. Limitations of our study include the lack of data on GCK mRNA, GCKR/GCK protein quantification and GCK activity, which may provide more insight on the putative effects of functional mutations on regulation of GCKR and clarify the effects of exposure of HFD on the humanized ZF. Our results agree with the earlier reports of targeted improvement of GCK activity by liverspecific GCKR inhibition which may lower the risk of HTG [49] In summary, our study for the first time reports a causal role of rare disruptive variants in GCKR for increasing serum TG levels independent of T2D in Punjabi Sikhs. These results may also partly support the "non-obese-metabolic obese phenotype" of Asian Indians linked to increased risk for developing cardiovascular diseases.  Table. A-C. (S3 Table-A). Demographic and clinical traits of carriers of GCKR S105N (rs774930016) variant in subjects from the AIDHS (S3 Table-