Genetic Polymorphisms of the Human PNPLA3 Gene Are Strongly Associated with Severity of Non-Alcoholic Fatty Liver Disease in Japanese

Background Nonalcoholic fatty liver disease (NAFLD) includes a broad range of liver pathologies from simple steatosis to cirrhosis and fibrosis, in which a subtype accompanying hepatocyte degeneration and fibrosis is classified as nonalcoholic steatohepatitis (NASH). NASH accounts for approximately 10–30% of NAFLD and causes a higher frequency of liver-related death, and its progression of NASH has been considered to be complex involving multiple genetic factors interacting with the environment and lifestyle. Principal Findings To identify genetic factors related to NAFLD in the Japanese, we performed a genome-wide association study recruiting 529 histologically diagnosed NAFLD patients and 932 population controls. A significant association was observed for a cluster of SNPs in PNPLA3 on chromosome 22q13 with the strongest p-value of 1.4×10−10 (OR = 1.66, 95%CI: 1.43–1.94) for rs738409. Rs738409 also showed the strongest association (p = 3.6×10−6) with the histological classifications proposed by Matteoni and colleagues based on the degree of inflammation, ballooning degeneration, fibrosis and Mallory-Denk body. In addition, there were marked differences in rs738409 genotype distributions between type4 subgroup corresponding to NASH and the other three subgroups (p = 4.8×10−6, OR = 1.96, 95%CI: 1.47–2.62). Moreover, a subgroup analysis of NAFLD patients against controls showed a significant association of rs738409 with type4 (p = 1.7×10−16, OR = 2.18, 95%CI: 1.81–2.63) whereas no association was obtained for type1 to type3 (p = 0.41). Rs738409 also showed strong associations with three clinical traits related to the prognosis of NAFLD, namely, levels of hyaluronic acid (p = 4.6×10−4), HbA1c (p = 0.0011) and iron deposition in the liver (p = 5.6×10−4). Conclusions With these results we clearly demonstrated that Matteoni type4 NAFLD is both a genetically and clinically different subset from the other spectrums of the disease and that the PNPLA3 gene is strongly associated with the progression of NASH in Japanese population.


Introduction
Nonalcoholic fatty liver disease (NAFLD) includes a broad range of pathologies from fatty liver (simple steatosis), steatonecrosis, and steatohepatitis to cirrhosis [1][2][3]. NAFLD often accompanies other lifestyle-related pathologies of metabolic syndrome such as diabetes mellitus, hypertension and dyslipidemia, and the number of NAFLD patients is increasing worldwide along with the escalation in the incidence of metabolic syndrome [4]. Prevalence of NAFLD is considered as approximately 8% in Japanese and 6-35% in Europeans [4,5]. The majority of NAFLD shows simple steatosis with a good prognosis, but approximately 10-30% of NAFLD histologically diagnosed as nonalcoholic steatohepatitis (NASH) shows hepatocyte degeneration (ballooning hepatocyte), necrosis, inflammation and fibrosis, with a higher frequency of liver-related death both in Japanese and European populations [6,7]. Insulin resistance and oxidative stress are considered to be key players in the progression of NASH [8,9]. However, the progression of NASH has been considered to be complex involving multiple genetic factors interacting with the environment and lifestyle, because only a portion of NAFLD patients develops NASH.
The first Genome-wide association (GWA) study searching for such genetic factors identified the PNPLA3 gene as a major genetic determinant for the predisposition to NAFLD in Hispanic, African American and European American populations according to liver fat contents [10], which was subsequently confirmed in Europeans and Asians according to liver biopsy. Association of PNPLA3 with not only fatty liver and TG content, but also inflammation and fibrosis were shown in the subsequent studies, so PNPLA3 may be widely associated with the development of NAFLD [11][12][13]. More recently, another GWA study reported the association of four additional genes with NAFLD in Europeans [14]. Also, a candidate gene-based approach revealed the association between NAFLD and the apolipoprotein C3 gene in Indians [15]. However, the precise role of such genes in the development of NASH still remains to be elucidated. In addition, no GWA study has been reported for Asian populations to date although the genetic components and their relative contribution may be different between ethnicities.
The Japan NASH Study Group was founded in 2008 aiming at the identification of genetic determinants predisposing to NASH in the Japanese population. Here we report the first GWA study of NAFLD in the Japanese using DNA samples of patients with liver histology-based diagnoses recruited through this multi-institutional research network.

Genome-wide Association Analysis of NAFLD in Japanese
We conducted a GWA study using DNA samples of 543 patients with NAFLD and 942 controls. After quality controls of genotyping results (see materials and methods for details), a total of 529 patients consisting of four NAFLD subgroups according to Matteoni's classification [2] (type1; 100, type2; 73, type3; 29, type4; 327) and 932 controls were subjected to statistical analyses (Table 1). This index pathologically classifies NAFLD according to the degree of inflammation, hepatocyte degeneration, and the existence of fibrosis and Mallory-Denk body in the liver. Genome scan results of 932 DNA samples collected for other genetic studies were used as general Japanese population controls [16]. After standard quality control procedure as described in materials and methods, genotype distributions of 484,751 autosomal SNP markers were compared between the NAFLD cases and control subjects by exact trend test. A slight inflation of p-values was observed by genomic control method (l = 1.04) ( Figure S1).
We identified six SNP markers located at chromosome 22q13 showing genome-wide significance (p,1.04610 27 ) ( Figure 1). Among them, four SNPs, namely, rs2896019, rs926633, rs2076211 and rs1010023, located in the PNPLA3 gene and in strong linkage disequilibrium (LD) (r 2 .0.93), returned p-values smaller than 1610 29 (p = 1.5610 210 , 7.5610 210 , 1.4610 29 and 1.5610 29 , respectively) ( Table 2). Rs738407 and rs3810662 also located in PNPLA3 showed significant but weaker associations (p = 1.0610 27 and 1.0610 27 , respectively) than the above four SNP markers. Rs738491, rs2073082, rs3761472, rs2235776, rs2143571 and rs6006473 were in the neighboring SAMM50 gene which is outside of the linkage disequilibrium (LD) block where the top SNP markers were distributed ( Figure 2). These markers were in moderate LD with each other (r 2 .0.42) and showed p-values between 3.9610 26 and 6.4610 27 but did not reach genome-wide significance (Table S1). Rs738409, the SNP which showed the strongest association with NAFLD in the first GWA study [10], was not included in the SNP array used in our study. This SNP was therefore genotyped using Taqman technology in the same case and control samples that were used for genome scan. Rs738409 showed the strongest association with the disease (p = 1.4610 210 , OR = 1.66, 95%CI: 1.43-1.94) among all the SNP markers examined in this study. The association remained after the correction for population stratification with EIGEN-STRAT [17] (p = 2.3610 211 ). Although a peak consisting of a cluster of SNPs was observed at the HLA locus on chromosome 6 (minimal p-value of 4.10610 27 for rs9262639 located at the 39 of C6orf15 gene), the association disappeared when EIGENSTRAT was applied (p.1.6610 23 ). We consider this as a result of population stratification between the cases and controls.

Impact of PNPLA3 Polymorphisms to the Pathogenicity of NAFLD
We next examined whether or not the seven SNPs in the PNPLA3 gene were associated with the pathogenic status of NAFLD. The genotype distributions of these SNPs were compared by Jonckheere-Terpstra test among the four subgroups of NAFLD patients categorized by Matteoni's classification (type1 to type4). There was a significant increase in the frequency of the risk allele from Matteoni type1 to type4 for all of the seven SNPs (p-values ranging from 3.6610 26 to 0.0017) ( Table 2). Among them, rs738409 again showed the strongest association (p = 3.6610 26 ) as seen in the simple case/control analysis. On the other hand, there was no significant association between control and Matteoni type1 (p = 0.76).
In order to clarify how rs738409 influences the pathogenicity of NAFLD, we performed pairwise comparisons of genotype distributions in the four subgroups of NAFLD patients. There were marked differences in genotype distributions between type4 subgroup and the other three subgroups by multivariable logistic regression adjusted for age, sex and body mass index (BMI) (p = 2.0610 25 , OR = 2.18, 95%CI: 1.52-3.18 between type1 and type4; p = 1.4610 23 , OR = 1.81, 95%CI: 1.26-2.62 between type2 and type4; p = 0.027, OR = 1.85, 95%CI: 1.07-3.19 between type3 and type4) ( Figure 3). On the other hand, no significant associations were obtained for type1 to type3 in any combinations. When we performed the same analysis between type4 and the pooled genotypes of type1 to type3, we again obtained a significant difference (p = 4.8610 26 , OR = 1.96, 95%CI: 1.47-2.62).
We further examined the specific association of rs738409 with type4 subgroup by using the case/control association results of the initial genome scan. 529 NAFLD patients were divided into 202 patients with type1 to type3 and 327 patients with type4, and genotype distributions of rs738409 in each subgroup were compared with those of 932 control subjects. Exact trend test returned an extremely strong association of rs738409 with type4 subgroup (p = 1.7610 216 , OR = 2.18, 95%CI: 1.81-2.63) whereas no association was obtained for type1 to type3 subgroups (p = 0.41).

Association of rs738409 Genotypes with Clinical Traits
The quantitative effects of rs738409 genotypes to clinical traits were examined by multivariable regression adjusted for age, sex and BMI (statistical calculation 1, Table 3). Five categorical ordinals, namely, anti-nuclear antibody (ANA), Brunt grade, Brunt stage, fat deposition and iron deposition, were also tested by an ordinal logistic regression analysis. Potential associations (p,0.05) were obtained for 11 traits, namely, aspartate transaminase (AST), alanine aminotransferase (ALT), type IV collagen 7S, hyaluronic acid, hemoglobin A1c (HbA1c), fasting immunoreactive insulin (IRI), fasting plasma glucose (FPG), platelet count (PLT), Brunt grade, fat deposition and iron deposition (Table 3). When the results were further adjusted for Matteoni type (statistical calculation 2), AST, hyaluronic acid, HbA1c, FPG, PLT, Brunt grade and iron deposition showed p-values smaller than 0.05. The level of serum triglyceride was not significant in the initial analysis, but became significant after being adjusted for Matteoni's type (p = 0.013). Among them, only three traits, namely, hyaluronic acid, HbA1c and iron deposition, remained significant (p,0.0021) after Bonferroni's correction for multiple testing ( Table 3).

Associations of Previously Reported SNPs with NAFLD
Previous genetic studies identified four chromosomal loci, namely, LYPLAL1 at 1q41, GCKR at 2p23, NCAN at 19p12 and PPP1R3B at 8p23.1, associated with NAFLD in populations of European descent [14]. We examined whether or not the associations were reproduced in the Japanese population by extracting genotype information of SNP markers corresponding to these four loci. As shown in Table 4, the association of rs780094 in GCKR with NAFLD was at the border of significance (p = 0.011, OR = 0.82, 95%CI: 0.70-0.91) in the case/control analysis. However, the association was lost when examined between rs780094 genotypes and Matteoni types. There were no associations of rs2228603 in NCAN and rs12137855 in LYPLAL1 with either NAFLD or Matteoni types. Rs4240624 in PPP1R3B was not in the SNP array used for this study, and this marker was not polymorphic or at a very low frequency in the Japanese (0 in 90

Discussion
NASH is a type of hepatic steatosis in NAFLD with poor prognosis accompanying liver fibrosis, and subsequent liver cirrhosis and hepatocellular carcinoma [18]. Despite the extensive biochemical and histological investigation of NAFLD, whether or not NASH forms a distinct disease entity in NAFLD still remains unclear. The principle aim of this study was to identify the genetic factors related to the pathogenic status of NAFLD by collecting DNA samples of Japanese NAFLD patients with critically diagnosed disease status by liver biopsy. To our knowledge, this is the first GWA study of NAFLD using patients with known histology-based Matteoni type. In the initial association study using pooled genotyping results of all the cases, we found a significant association of the PNPLA3 gene at chromosome 22q13.31 with NAFLD in the Japanese. Rs738409 which showed the strongest association with NAFLD in the GWA study of Caucasians was also genotyped and its strongest association with NAFLD was confirmed. These results were in agreement with the former GWA analyses in populations of European descent and in Hispanics, giving strong evidence of the involvement of PNPLA3 in NAFLD beyond ethnicities. Rs738409 is located in exon3 of the PNPLA3 gene which is expressed in the liver and adipose tissue. This SNP introduces an amino acid substitution from isoleucine to methionine (I148M), and biological studies demonstrated that its risk allele (G) abolishes the triglyceride hydrolysis activity of PNPLA3 [19]. These observations strongly suggest rs738409 to be a causative genetic variation for NAFLD. However, future genomic analyses by fine mapping or extensive sequencing may identify additional genetic determinants within the PNPLA3 locus.
In the current study we did not find other genetic loci showing genome-wide significance (p,1.0610 27 ). However, two additional chromosomal loci with p-values being smaller than 1610 25 were identified on chromosome 1p (rs11206226) and chromosome 4p (rs1390096) neither of which has been reported as being associated with NAFLD in Caucasians (Table S1). Statistical calculation by taking their allele frequencies and effect sizes into account showed that approximately three times as many case and control samples are required to obtain sufficient statistical power (.0.8) for genome wide significance. Hence, further confirmation is required using a larger collection of patients and controls although they may be potential candidates of low-penetrance genes for susceptibility to NAFLD in Japanese.
Subsequent analyses through comparison of genotype distribution among four subgroups of NAFLD (type1 to type4) categorized by Matteoni's classification revealed that the seven NAFLDassociated SNPs in the PNPLA3 gene were also significantly associated with the pathogenic status of NAFLD. There were also marked differences in genotype distribution of rs738409 between type4 subgroup and the other three groups (p = 4.8610 26 , OR = 1.96, 95%CI: 1.47-2.62 between type4 and pooled genotypes of type1 to type3). Moreover, a case/control analysis of rs738409 between Matteoni type4 and controls returned a surprisingly strong association (p = 1.7610 216 ) which was much stronger than the initial analysis using all NAFLD cases (p = 1.4610 210 ), whereas the analysis using Matteoni type1 to type3 as cases didn't show significance (p = 0.41). There were differences in the score of HOMA-IR and hs-CRP, indicators of insulin resistance and inflammation, respectively, between Matteoni type1 to type3 and type4 subgroups (Table 1). Our results provide compelling evidence that NASH corresponding to Matteoni type4 is both a clinically and genetically different disease subset from other spectrums of NAFLD. Previous studies showed association between PNPLA3 and fatty liver, inflammation, fibrosis grade and NASH [13]. In our result, strong association between rs738409 and fatty liver was not observed by comparing control and Matteoni type1. In addition, strong association between rs738409 and lobular inflammation was not observed by comparing Matteoni Type1 and Type2. In contrast, a strong association between rs738409 and NASH was observed. Although we could not observe the strong association between rs738409 and fibrosis stage, strong association between rs738409 and Hyaluronic acid suggests that an association exists between PNPLA3 and fibrosis.
We have also undertaken association analyses of rs738409 and clinical traits in the patients. The multivariable regression analysis adjusted for age, sex, BMI and Matteoni type followed by the correction for multiple testing revealed hyaluronic acid and HbA1c as being significantly associated with rs738409. Hyaluronic acid is one of the principle components of the extracellular matrix and its involvement in fibrosis has been previously suggested [20]. This may indicate another possible functional involvement of PNPLA3 in the progression of liver fibrosis by influencing the circulating hyaluronic acid levels. A weak association of rs738409 and HbA1c levels was observed in our study population. However, there are no reports to date indicating such an association, and confirmation with different sample sets is needed for definitive conclusion. Also, the association between rs738409 and iron deposition was demonstrated by an ordinal logistic regression analysis. Since the association still remained after the results were adjusted with Matteoni type, rs738409 may play a functional role in the oxidative stress through iron absorption in the liver.
Recently, a genetic analysis of Japanese NAFLD patients was reported demonstrating a significant association in the increase of AST, ALT, ferritin levels and fibrosis stage (Brunt stage) and in the decrease of serum triglyceride with the risk allele (G) of rs738409 [12]. In our study, the association of rs738409 with AST (p = 1.2610 24 ) and ALT (p = 0.0016) was reproduced and that of AST still remained after the results were adjusted for Matteoni type (p = 0.038). No association was observed for ferritin level. Brunt stage was available for Matteoni type4 patients only in our study. Although the odds ratio was slightly high (OR = 1.28, 95%CI: 0.95-1.72), it was not possible to examine the association. In addition, the inverse association of the risk allele of rs738409 with decrease of serum triglyceride was confirmed in our study (p = 0.013 after being adjusted for Matteoni type). For all of these biomarkers, however, the significance was lost after the correction for multiple testing.
A replication analysis of other genetic loci that had been reported for their association with NAFLD in East coast white Americans [14] was performed in our sample collection. We confirmed the association of rs780094 in GCKR with NAFLD in a case/control analysis but at a much weaker level (p = 0.011, OR = 0.82, 95%CI: 0.70-0.95) than that shown for the populations of European-descent. No associations were found for LYPLAL1 and NCAN loci in our study. There are several reasons to explain such differences, such as the insufficient statistical power with a limited number of study subjects in our study due to the difficulty in the collection of a larger number of histologically diagnosed NAFLD patients. The difference in genetic background between the Japanese and Europeans is also conceivable. Indeed, the risk allele frequency of rs12137855 in LYPLAL1 was 0.944 in our control subjects but approximately 0.79 in the European populations [14]. Similarly, there was a difference in the risk allele frequency of rs2228603 in NCAN (0.049 in Japanese and 0.08 in Europeans). Rs4240624 in PPP1R3B was not polymorphic in the Japanese while its risk allele frequency was 0.91 in Europeans.

Ethics Statement
In compliance with the Declaration of Helsinki, ethical approval for this study was given by the respective Institutional Review Board and subject written informed consent were obtained for all subjects (Ethical committee of Nara

Study Population
A total of 543 patients histologically diagnosed for NAFLD in 2007-2009 were recruited through the Japan study of Nonalcoholic Fatty Liver Disease. Biopsy specimens were stained with H&E and Masson's trichrome for morphological review and assessment of fibrosis. Perl's Prussian blue was performed to evaluate iron load. Biopsy specimens were reviewed by a hepatopathologist (T.O). NAFLD patients were classified into four categories by liver histology according to the classification by Matteoni et al [2] as follows; type1: fatty liver alone, type2: fat accumulation and lobular inflammation, type3: fat accumulation and ballooning degeneration, type4: fat accumulation, ballooning degeneration, and either Mallory-Denk body or fibrosis. With these criteria, the 543 patients were classified as type1; 102, type2; 75, type3; 31 and type4; 335. The histological grade and fibrosis stage were also evaluated by the classification of Brunt et al [21] for advanced NAFLD cases (type3 and type4) as follows; grade 1: steatosis involving up to 66% of biopsy, occasional ballooned zone 3 hepatocytes and absence or mild portal chronic inflammation, grade2: steatosis, ballooning hepatocytes mild to moderate chronic inflammation, grade3: panacinar steatosis, ballooning and disarray obvious and mild or portal mild to moderate inflammation, stage1: perivenular and/or perisinusoidal fibrosis in zone3, stage2: combined pericellular portal fibrosis, stage3: septal/bridging fibrosis, stage4: cirrhosis. The degree of fat deposition was evaluated by amount of fat droplets as observed under the microscope as follows; 0: ,5%, 1: 5-,10%, 2: 10-,34%, 3: 34-,67%, 4: .67%. The degree of iron deposition was categorized by the presence of granules of free iron observed under the microscope as follows; 0: absence by x400, 1: easily identifiable by x400 and rarely identifiable by x250, 2: identifiable by x100, 3: identifiable by x25, 4: identifiable at lower than x25.
Inclusion criteria for NAFLD patients were as follows; (i) no history of alcoholism, (ii) no history for HBV/HCV/HIV infection, (iii) diagnosed by liver biopsy, (iv) information regarding age and BMI available. The sex of two samples was unknown, and was imputed from the results of the genome scan. As general Japanese population controls, the genome scan results of 942 healthy Japanese volunteers from Aichi Cancer Center Hospital and Research Institute were used [22].

Anthropometric and Laboratory Evaluation
We employed conventional methods for the measurement of anthropometry (height, weight, amount of visceral fat and abdominal circumscript). BMI was calculated from the measurements. The following biochemical/hematological/immunological traits were also measured by conventional methods; aspartate aminotransferase (AST), alanine aminotransferase (ALT), cglutamyl transpeptidase (GGT), albumin, total bilirubin, cholinesterase, type IV collagen 7S, hyaluronic acid, triglyceride, total cholesterol, hemoglobin A1c (HbA1c), fasting immunoreactive insulin (IRI), fasting plasma glucose (FBS), high sensitive CRP (hs-CRP), adiponectin, leptin, ferritin, uric acid, and platelet (PLT) count. Anti nuclear antibody (ANA) was measured by ELISA and categorized by the detection limit in a serial dilution as follows; 0: ,40x, 1: 40-80x, 2: 81-160x, 3: 160x, 4: .320x. Homeostasis model assessment-insulin resistance (HOMA-IR) was calculated from the measurements. Patients were assigned a diagnosis of diabetes mellitus (DM) when they had documented use of oral hypoglycemic medication, a random glucose level .200 mg/dl, or FPG .126 mg/dl. Hyperlipidemia was diagnosed with the cholesterol level being .200 mg/dl and/or triglyceride level being .160 mg/dl. Hypertension was diagnosed when the patient was taking antihypertensive medication and/or had a resting recumbent blood pressure §140/90 mmHg on at least two occasions.

DNA Preparation
Genomic DNA was extracted from peripheral blood mononuclear cells by standard phenol-chloroform extraction and resuspended in TE buffer. DNA concentration and purity were measured with Nanodrop 1000 spectrophotometer (Thermo Scientific, Waltham, MA, USA). The samples were stored at 220uC until use.

Genome-wide Genotyping and Quality Control
Genome scan was conducted for 543 patients with NAFLD and 942 healthy subjects using Illumina Human 610-Quad Bead Chip on a Bead Station 500G Genotyping System (Illumina, Inc., San Diego, CA, USA) and subjected to the following quality controls. Initially, ten patients and six control subjects were removed due to low call rates (,0.99). Regarding the SNP markers, 85,472 SNPs with minor allele frequency of smaller than 0.01 in either case or control group, 6,479 SNPs with lower success rates (,0.98) and 35 SNPs with distorted Hardy-Weinberg equilibrium (p,10 27 ) were removed, resulting in 484,751 SNP markers being used for analysis. Principal component analysis by EIGENSOFT [17] including phase II HapMap (http://hapmap.ncbi.nlm.nih.gov/) samples identified no samples that were deviated from the Japanese population. Subsequently, the degree of kinship between individuals was examined by pi-hat in PLINK 1.07 (http://pngu. mgh.harvard.edu/purcell/plink/) [23]. Of the eight pairs of samples (four case pairs and four control pairs) showing high degrees of kinship (PI-HAT.0.4), the sample with the lower call rate in each pair was removed. After these steps, 529 case and 932 controls were used for the analysis.

Statistical Analysis
A case/control association analysis was performed by exact trend test between NAFLD patients and control subjects [24]. The correction of obtained p-values for population stratification was performed using EIGENSTRAT [17]. In addition, an association between Matteoni classification (type1 to type4) and additive model of genotype for each SNP was examined using Jonckheere-Terpstra test for NAFLD patients. Assessment of population stratification of inflation of p-value was carried out by the genomic control method for asymptotic trend test [25]. Association between each quantitative trait and the genotype of significant SNPs in NAFLD patients were calculated by multivariable linear regression or multivariable ordinal regression adjusted for age, sex and BMI. Each quantitative trait was transformed as follows; natural log for ALT, AST, HOMA-IR, HbA1c, IRI, triglyceride, total bilirubin, adiponectin, hs-CRP, hyaluronic acid, leptin, reciprocal number for albumin, cholinesterase, type IV collagen 7S and square root for uric acid and ferritin. The values of FPG, PLT, total cholesterol, amount of visceral fat, and abdominal circumscript were not transformed. For each trait, values that were within only 4 S.D. were included for analysis. LD indices were calculated by default setting of Haploview [26] and the LD block was defined manually. Figure S1 QQ plot of the GWA study comparing distribution of the observed and expected p-values. Upper box is expressed in antilog scale and the lower box is expressed in -log 10 scale. The X-and Y-axis correspond to expected and observed p-values. Blue and red dots denote before and after correction by genomic control method (l = 1.04), respectively. (DOC)