Multi-SNP Analysis of GWAS Data Identifies Pathways Associated with Nonalcoholic Fatty Liver Disease

Nonalcoholic fatty liver disease (NAFLD) is a common liver disease whose histological spectrum ranges from steatosis to steatohepatitis. Nonalcoholic steatohepatitis (NASH) often leads to cirrhosis and the development of hepatocellular carcinoma. To better understand the pathogenesis of NAFLD, we performed the pathway of distinction analysis (PoDA) on a genome-wide association study dataset of 250 non-Hispanic white female adult patients with NAFLD who were enrolled in the NASH Clinical Research Network (CRN) Database Study. The aim was to investigate whether biologic process variation, measured through genomic variation of genes within these pathways, was related to the development of steatohepatitis or cirrhosis. Pathways such as Recycling of eIF2:GDP, Biosynthesis of steroids, Terpenoid biosynthesis and Cholesterol biosynthesis were found to be significantly associated with NASH. SNP variants in Terpenoid biosynthesis, Cholesterol biosynthesis and Biosynthesis of steroids were associated with lobular inflammation and cytologic ballooning, while those in Terpenoid biosynthesis were also associated with fibrosis and cirrhosis. These pathways were also related to the NAFLD activity score (NAS), which is derived from the histological severity of steatosis, inflammation and ballooning degeneration. SNP variants related to Eukaryotic protein translation and Recycling of eIF2:GDP were associated with ballooning, steatohepatitis and cirrhosis. IL2 signaling events mediated by PI3K, Mitotic metaphase/anaphase transition, and Prostanoid ligand receptors were also significantly associated with cirrhosis. Taken together, the results provide evidence for additional ways, beyond the effects of single SNPs, by which genetic factors might contribute to the susceptibility to develop a particular phenotype of NAFLD and then progress to cirrhosis. Further studies are warranted to clarify the potentially important genetic roles of these biological processes in NAFLD.


Introduction
Nonalcoholic fatty liver disease (NAFLD) affects almost a third of the adult population in North America [1]. The clinical-histologic spectrum of NAFLD ranges from nonalcoholic fatty liver (NAFL) to nonalcoholic steatohepatitis (NASH) [2]. While NAFL progresses to cirrhosis in less than 5% of cases, NASH can progress to cirrhosis in 15-20% of cases [3][4]. NAFLD is also a risk factor for the development of hepatocellular cancer, which can develop with or without cirrhosis [5]. It is therefore a public health priority to better understand the pathogenesis of the disease as well as the factors that drive disease progression.
Recently, a single variant in PNPLA3 (rs738409; I148M) has been shown to be strongly associated with increased hepatic fat levels, inflammation and fibrosis [6][7]. Since the discovery of the association between the PNPLA3 variant and steatosis and steatohepatitis, several additional single nucleotide polymorphisms (SNPs) have been identified as associated with NASH [7]. However, despite these individual SNP associations, the biologic mechanisms that distinguish alternative clinical outcomes or disease progression remain largely unknown.
Genetic analysis of biologic processes, as opposed to analysis of individual SNPs, may provide more insight into pathogenesis. The pathway of distinction analysis (PoDA) is a recently developed computational technique that tests for association between a given phenotype and the variation within multiple genes involved in a defined biologic pathway [8]. This method may thus be used to investigate whether collections of constitutional genome variability within biologic processes determine the predisposition to develop steatohepatitis rather than steatosis, or drive progression to cirrhosis and hepatocellular cancer. Importantly, it can identify combinations of SNPs that together drive a specific phenotype, even when the individual SNPs are not significantly related to that phenotype.
In this analysis, PoDA was performed on a genome-wide association study dataset obtained from the NIDDK NASH Clinical Research Network (CRN) on 250 highly characterized adult female subjects with varying phenotypes of NAFLD [9]. The specific objective of the study was to determine whether biologic process variation, measured through genomic variation of genes within these networks, was related to the development of steatohepatitis or cirrhosis. We further evaluated the relationships of variation within these biological pathways with the severity of the individual histologic features of NAFLD. The results demonstrate a potential relationship between genomic variability within key biologic pathways and the individual histologic features of NAFLD, the presence of steatohepatitis, and progression to cirrhosis.

The Population Studied
The genome-wide association study (GWAS) was conducted on 250 highly characterized adult subjects with varying phenotypes of NAFLD, a subset of the patients enrolled in the NAFLD Database Study of the NASH CRN [9]. The Database Study was an observational cohort in which no therapeutic interventions were undertaken. From this cohort, non-Hispanic white female adults were selected for the GWAS pilot study in order to reduce heterogeneity. The median age was 53 years (interquartile range: 46-60 years). The nature and clinical features of subjects in this cohort have been published previously [2,9].
Single-SNP associations with NAFLD phenotypes from this GWAS have already been published [9]. This report represents an independent analysis of the GWAS dataset to identify genetic variations in biologic pathways associated with cirrhosis, NASH and the severity of the individual histologic features of NAFLD. The underlying assumption of the analysis was that inherited variations in genes within biologic pathways may determine the functional status of the network and thus the disease phenotype. The NAFLD Database Study was approved by the IRB of each of the participating institutions of the NASH CRN. The GWAS was approved by the NASH CRN Steering Committee and by the IRB at Cedars-Sinai Medical Center, where the GWAS was performed. For each subject included in this analysis, detailed clinical information and data related to the liver histology were available. The histology was analyzed by the pathology committee of the NASH CRN and categorized using the NASH CRN scoring system as described previously [10].

Genotyping
Genotyping was performed with Illumina HumanCNV370-Quadv3 BeadChips as described previously [9]. Eight of the 250 samples were identified as outliers by principal component analysis (PCA) and were removed, leaving 242 samples for the analysis. Additional filters eliminated SNPs that deviated from Hardy-Weinberg equilibrium (P < 1e-8) or had a minor allele frequency < 0.02, resulting in a total of 324,623 SNPs for the analysis.
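The two SNP-level filters above can be sketched as follows. This is a minimal illustration that assumes genotype counts per SNP are already tabulated. The text does not state which Hardy-Weinberg test was used, so a 1-df chi-square goodness-of-fit test is swapped in here; an exact test is a common alternative. The function names and constants are hypothetical.

```python
import math

# Hypothetical QC thresholds mirroring the text: drop SNPs with HWE P < 1e-8
# or minor allele frequency (MAF) < 0.02.
HWE_P_MIN = 1e-8
MAF_MIN = 0.02


def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """MAF from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    return min(p, 1 - p)


def hwe_chisq_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """1-df chi-square goodness-of-fit test for Hardy-Weinberg equilibrium (assumed test)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_aa, n_ab, n_bb]
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: test undefined, left to the MAF filter
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(n_aa: int, n_ab: int, n_bb: int) -> bool:
    """True if the SNP is kept by both the HWE and the MAF filter."""
    return (hwe_chisq_p(n_aa, n_ab, n_bb) >= HWE_P_MIN
            and minor_allele_frequency(n_aa, n_ab, n_bb) >= MAF_MIN)
```

For example, with 242 samples a SNP with counts (60, 121, 61) is kept, whereas (121, 0, 121) fails the HWE filter and (238, 4, 0) fails the MAF filter.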

Data Analysis
The Pathway of Distinction Analysis (PoDA) [8] was applied to the NAFLD genotype data for the following histologic phenotypes: steatohepatitis, NAFLD activity score (NAS) and its histologic components (steatosis, cytologic ballooning and lobular inflammation), fibrosis stage, and cirrhosis. Each of these phenotypes was analyzed as a qualitative binary trait, as described below: (1) Steatohepatitis: definite steatohepatitis (n = 56) vs. controls (steatohepatitis absent, n = 131). (2) … control (n = 204). The number of samples analyzed for each phenotype is smaller than the full sample set because incomplete clinical data for some subjects did not allow their inclusion; "n" reflects the number of subjects for whom detailed clinical, histological and GWAS data were all available.
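A minimal sketch of how a histologic phenotype might be coded as a binary trait while dropping subjects without usable data. The category labels (e.g. "definite", "absent", "borderline") and the function name are assumptions for illustration. The text states only that cases were compared with controls and that subjects with incomplete data were not analyzed.

```python
from typing import Dict, Optional


def code_binary_trait(records: Dict[str, Optional[str]],
                      case_label: str, control_label: str) -> Dict[str, int]:
    """Map subject -> 1 (case) or 0 (control), dropping everyone else.

    Hypothetical helper: subjects whose value is missing (None) or falls in
    neither category (for instance an assumed "borderline" diagnosis) are
    excluded, which is why the analysed n can be smaller than the genotyped set.
    """
    coded: Dict[str, int] = {}
    for subject, value in records.items():
        if value == case_label:
            coded[subject] = 1
        elif value == control_label:
            coded[subject] = 0
    return coded
```

For example, coding `{"s1": "definite", "s2": "absent", "s3": "borderline", "s4": None}` with `"definite"` as cases and `"absent"` as controls keeps only `s1` (1) and `s2` (0).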
To assess whether population stratification could generate non-disease-related associations, population structure was examined using all SNPs included in the analysis. Stratification analysis was performed for each of the 7 phenotypes using principal component analysis, implemented with the singular value decomposition algorithm in R. No evidence of stratification was found for any of the 7 phenotypes used in the study.
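The stratification check relies on principal components of the genotype matrix. The sketch below assumes a small samples × SNPs matrix of 0/1/2 allele counts and computes scores on the first principal component. To keep to the Python standard library it uses power iteration rather than R's SVD routine; for a distinct leading eigenvalue both give the same component up to sign. One would then check whether cases and controls separate along the leading components.

```python
import random
from typing import List


def first_principal_component(genotypes: List[List[float]], n_iter: int = 200,
                              seed: int = 0) -> List[float]:
    """Per-sample scores on PC1 of a samples x SNPs genotype matrix.

    Columns are mean-centred; the leading right singular vector v of the
    centred matrix X is found by power iteration on X^T X, and the scores are X v.
    """
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in genotypes]

    rng = random.Random(seed)
    v = [rng.random() for _ in range(m)]  # random start vector
    for _ in range(n_iter):
        xv = [sum(r[j] * v[j] for j in range(m)) for r in x]            # X v
        w = [sum(x[i][j] * xv[i] for i in range(n)) for j in range(m)]  # X^T X v
        norm = sum(c * c for c in w) ** 0.5
        if norm == 0:  # all columns constant: no variation to summarise
            break
        v = [c / norm for c in w]
    return [sum(r[j] * v[j] for j in range(m)) for r in x]
```

On a toy matrix with two groups of samples that differ in allele counts, the PC1 scores of the two groups fall on opposite sides of zero. Such a split between cases and controls would be a sign of stratification.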
The PoDA analysis was run systematically on the pathways represented in the NCI/Nature Pathway Interaction Database (PID) [11]. Genes were mapped to SNPs using dbSNP build 129. A total of 95,924 SNPs in the data could be mapped to at least one pathway, representing 4,849 unique genes. For each analysis, the SNP showing the greatest magnitude of association with the given phenotype was selected to represent each gene. A total of 893 pathways, each containing a minimum of 5 genes, were covered in the dataset. PoDA tests for differences in variation between cases and controls by computing genetic distances based on the variation observed within and between each group. For each pathway, a distance score (S) was computed for each sample, measuring that sample's distance to the remaining cases relative to its distance to the remaining controls over the collection of gene-based SNPs that constitute the pathway. The distinction score (DS), which quantifies the difference between the distributions of distance scores in cases and controls, was then computed for each pathway. Significance [p(DS)] was assessed by resampling "dummy" pathways of the same length and computing the fraction with a greater DS, as described previously [8]. Odds ratios [O.R.] were obtained by fitting a logistic regression model of case status as a function of S, which measures each sample's relative distance to the remaining samples. The resulting p-values were then adjusted for the multiplicity of pathways using false discovery rate (FDR) adjustment [q(O.R.)].
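To make the distance-score logic concrete, the following is a simplified, hypothetical re-implementation and not the published PoDA code [8]. It makes four assumptions. Genotypes are coded 0/1/2 for one representative SNP per gene. The pairwise distance is the mean absolute allele-count difference. S is the mean distance to the remaining controls minus the mean distance to the remaining cases, so larger S means "closer to cases", as in the figure captions. DS is replaced by the Mann-Whitney AUC of S in cases versus controls.

The resampling step draws random same-size "dummy" pathways from all SNPs. It reports the fraction with DS at least as large as the observed one, using an add-one correction. Each group needs at least two samples.

```python
import random
from typing import List, Sequence

Genotypes = List[List[int]]  # samples x SNPs, minor-allele counts 0/1/2


def _avg(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Hypothetical genetic distance: mean absolute allele-count difference."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def distance_scores(g: Genotypes, is_case: Sequence[bool]) -> List[float]:
    """Leave-one-out S_i = mean distance to other controls - mean distance to other cases."""
    n = len(g)
    scores = []
    for i in range(n):
        to_cases = [distance(g[i], g[j]) for j in range(n) if j != i and is_case[j]]
        to_ctrls = [distance(g[i], g[j]) for j in range(n) if j != i and not is_case[j]]
        scores.append(_avg(to_ctrls) - _avg(to_cases))  # > 0: closer to cases
    return scores


def distinction_score(s: Sequence[float], is_case: Sequence[bool]) -> float:
    """Stand-in DS: Mann-Whitney AUC, P(S_case > S_control) with ties counted as 1/2."""
    cases = [x for x, c in zip(s, is_case) if c]
    ctrls = [x for x, c in zip(s, is_case) if not c]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in cases for b in ctrls)
    return wins / (len(cases) * len(ctrls))


def pathway_ds(g_all: Genotypes, snp_idx: Sequence[int], is_case: Sequence[bool]) -> float:
    """DS for the pathway formed by the given SNP columns."""
    sub = [[row[k] for k in snp_idx] for row in g_all]
    return distinction_score(distance_scores(sub, is_case), is_case)


def resampling_p(g_all: Genotypes, pathway_snps: Sequence[int], is_case: Sequence[bool],
                 n_dummy: int = 200, seed: int = 0) -> float:
    """p(DS): share of random same-size dummy pathways with DS >= observed (add-one)."""
    rng = random.Random(seed)
    observed = pathway_ds(g_all, pathway_snps, is_case)
    pool = range(len(g_all[0]))
    hits = sum(
        pathway_ds(g_all, rng.sample(pool, len(pathway_snps)), is_case) >= observed
        for _ in range(n_dummy)
    )
    return (hits + 1) / (n_dummy + 1)
```

Because S is computed leave-one-out, each sample is compared only with the other samples. Trying a different distance or DS statistic only requires changing `distance` or `distinction_score`.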

Results
The PoDA analysis was performed on the samples and SNPs that remained after quality control processing of the original dataset from 250 subjects (see Materials and Methods for details). The collections of SNPs in genes contained in the Pathway Interaction Database (PID) [11] were examined for association with the following histological features of NAFLD recorded by Central Pathology Review: a diagnosis of definite steatohepatitis, presence of cirrhosis, stage of fibrosis, grades of steatosis, ballooning and inflammation, and the NAFLD activity score (NAS), which is derived from the grade scores for ballooning, lobular inflammation and steatosis. Each histological parameter was categorized as absent or present; scored features were dichotomized into a binary trait according to severity, as described in Materials and Methods.

Biologic Pathways Associated with Definite Steatohepatitis
Several pathways contained collections of SNPs that, when examined simultaneously within the pathway, were significantly associated with the histologic diagnosis of "definite steatohepatitis" (Table 1). The top two pathways were "Viral messenger RNA synthesis" and "Recycling of eIF2:GDP" (p(DS) < 0.001 and FDR-adjusted odds ratio p-values < 0.01). Several biosynthesis pathways, such as "Terpenoid biosynthesis", "Cholesterol biosynthesis", "Pyrimidine biosynthesis", "Biosynthesis of steroids", "O-Glycan biosynthesis" and "Bile acid biosynthesis", were also significantly associated with the diagnosis of definite steatohepatitis. The SNP collections in the cell cycle and p53 signaling pathways were also associated with the diagnosis of steatohepatitis. Scatter plots of distance scores for two representative pathways are provided in Figure 1. The gene-based variations in two pathways previously associated with HCC in a large Korean cohort [8], "Gamma-carboxylation, transport, and amino-terminal cleavage of proteins" and "Antigen processing and presentation", were also associated with the diagnosis of definite steatohepatitis (p(DS) = 0.034 and 0.037, respectively, both with FDR-adjusted odds ratio p-values < 0.01).
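The FDR-adjusted odds ratio p-values reported above start from a logistic regression of case status on S. Below is a minimal single-covariate sketch. It assumes plain Newton-Raphson fitting and a two-sided Wald test for the slope, since the text does not say how the p-values were obtained. `logistic_or` is a hypothetical helper. The fit diverges if S perfectly separates cases from controls, and it fails if all S values are identical.

```python
import math
from typing import Sequence, Tuple


def _sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def _score_and_information(b0: float, b1: float, s: Sequence[float], y: Sequence[int]):
    """Log-likelihood gradient and the 2x2 Fisher information at (b0, b1)."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, yi in zip(s, y):
        p = _sigmoid(b0 + b1 * x)
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return (g0, g1), (h00, h01, h11)


def logistic_or(s: Sequence[float], y: Sequence[int],
                n_iter: int = 50, tol: float = 1e-10) -> Tuple[float, float]:
    """Fit logit P(case) = b0 + b1*S; return (odds ratio per unit S, Wald p-value)."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        (g0, g1), (h00, h01, h11) = _score_and_information(b0, b1, s, y)
        det = h00 * h11 - h01 * h01
        # Newton-Raphson step: beta <- beta + I(beta)^-1 * gradient
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    _, (h00, h01, h11) = _score_and_information(b0, b1, s, y)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))  # sqrt of [I^-1] for the slope
    z = b1 / se_b1
    return math.exp(b1), math.erfc(abs(z) / math.sqrt(2))  # two-sided Wald p
```

An odds ratio above 1 means that samples lying closer to the other cases (larger S) are more likely to be cases.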

Biologic Pathways Associated with Histologic Activity of NAFLD
We next investigated the relationship of pathway-based SNPs with the NAS and its individual components: steatosis, lobular inflammation and ballooning [10]. For this analysis, NAS ≥ 5 was used to identify high disease activity. A high NAS was associated with SNP variants in "Glycoprotein hormones" … Of note, these pathways were not significantly associated with the diagnosis of definite steatohepatitis. Four pathways previously described as associated with HCC [8], "Vibrio cholerae infection", "Antigen processing and presentation", "no2-dependent il-12 pathway in nk cells" and "ErbB signaling pathway", were also associated with a high NAS.
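The NAS threshold used here can be written out explicitly. The sketch assumes the standard NASH CRN grade ranges [10]: steatosis 0-3, lobular inflammation 0-3 and ballooning 0-2, so the NAS runs from 0 to 8. The function names are hypothetical.

```python
def nas(steatosis: int, lobular_inflammation: int, ballooning: int) -> int:
    """NAFLD activity score: unweighted sum of the three grades (range 0-8)."""
    # Assumed NASH CRN grade ranges: steatosis 0-3, lobular inflammation 0-3, ballooning 0-2
    for name, value, top in (("steatosis", steatosis, 3),
                             ("lobular inflammation", lobular_inflammation, 3),
                             ("ballooning", ballooning, 2)):
        if not 0 <= value <= top:
            raise ValueError(f"{name} grade {value} is outside 0-{top}")
    return steatosis + lobular_inflammation + ballooning


def high_activity(steatosis: int, lobular_inflammation: int, ballooning: int,
                  cutoff: int = 5) -> int:
    """Binary trait for the NAS analysis: 1 if NAS >= cutoff, otherwise 0."""
    return int(nas(steatosis, lobular_inflammation, ballooning) >= cutoff)
```

For instance, grades of 2, 2 and 1 give NAS = 5 and are coded as high activity, whereas 2, 1 and 1 (NAS = 4) are not.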
The association of the individual components of NAS with biologic pathways was next examined using a cutoff of p(DS) < 0.001. "Glycoprotein hormones" and "il12 and stat4 dependent signaling pathway in th1 development" were associated with steatosis (Table S2). Of the pathways previously associated with HCC, only "growth hormone signaling pathway" was associated with steatosis (p(DS) = 0.002, FDR-adjusted odds ratio p-value < 0.01). The "Hormone ligand-binding receptors" pathway was associated with lobular inflammation (Table S3), and "Terpenoid biosynthesis" was associated with ballooning (Table S4).
Extension of this analysis to cirrhosis (stage 4 fibrosis) versus absence of cirrhosis identified several pathways associated with cirrhosis (Table 2). Three pathways were associated with cirrhosis with p(DS) < 0.001 and FDR-adjusted odds ratio p-value < 0.01: "Il2 signaling events mediated by PI3K", "Mitotic metaphase/anaphase transition" and "Prostanoid ligand receptors". Scatter plots of distance scores for two representative pathways are provided in Figure 2. Of note, these pathways were associated with the presence of cirrhosis but not with fibrosis alone. Two pathways identified in the previous HCC analysis [8], "lectin induced complement pathway" and "signaling events mediated by stem cell factor (c-kit)", were also associated with cirrhosis (p(DS) = 0.004 and 0.037, respectively, both with FDR-adjusted odds ratio p-values < 0.01).
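Pathway selection for cirrhosis combined two criteria: p(DS) below one cutoff and an FDR-adjusted odds ratio p-value below another. The sketch below applies those cutoffs. It assumes that the FDR adjustment is the Benjamini-Hochberg step-up procedure, because the text says only "FDR adjustment". The function names are hypothetical.

```python
from typing import Dict, List, Sequence


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # step up from the largest p-value
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def select_pathways(p_ds: Dict[str, float], p_or: Dict[str, float],
                    ds_cut: float = 0.001, q_cut: float = 0.01) -> List[str]:
    """Pathways with p(DS) < ds_cut and FDR-adjusted odds ratio p-value < q_cut."""
    names = list(p_or)
    q = dict(zip(names, benjamini_hochberg([p_or[n] for n in names])))
    return sorted(n for n in names if p_ds[n] < ds_cut and q[n] < q_cut)
```

The running minimum keeps the q-values monotone in the raw p-values. This is why a pathway with a small raw p-value is never assigned a larger q-value than a pathway with a bigger raw p-value.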

Pathways Associated with Multiple Features of NAFLD
NAFLD covers a wide clinical-histologic spectrum, ranging from steatosis to NASH of varying grades of activity and stages of fibrosis. To facilitate the identification of components that contribute to composite phenotypes, and to identify common underlying pathways across different histologic manifestations of NAFLD, the pathways significantly associated with 2 or more histologic features (p(DS) < 0.05) are listed in Table 3, with their genes listed in Table S6. Not surprisingly, many pathways associated with steatosis, lobular inflammation or ballooning were also associated with NAS ≥ 5. With NAS ≥ 5 included, 4, 7 and 51 pathways were associated with 4, 3 and 2 histological findings/scores, respectively. SNP variants in "Biosynthesis of steroids" and "Cholesterol biosynthesis" were associated with definite NASH, lobular inflammation, ballooning and the NAS; "Eukaryotic protein translation" was associated with definite NASH, ballooning, the NAS and cirrhosis; and "Terpenoid biosynthesis" was associated with definite NASH, lobular inflammation, ballooning and fibrosis. In addition, "Antigen processing and presentation" was associated with definite NASH and the NAS, while "Recycling of eIF2:GDP" was associated with definite NASH, ballooning and cirrhosis.
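Tallying pathways by the number of histologic features they are associated with is a simple counting step. This sketch assumes that each phenotype's significant pathways (p(DS) < 0.05) are available as a set. The function name and any pathway names used with it are hypothetical placeholders, not results from this study.

```python
from collections import Counter
from typing import Dict, Set


def pathways_by_phenotype_count(hits: Dict[str, Set[str]],
                                min_count: int = 2) -> Dict[int, Set[str]]:
    """Group pathways by how many phenotypes they were associated with.

    hits maps phenotype -> set of pathways passing the threshold. Returns
    {k: pathways associated with exactly k phenotypes} for every k >= min_count.
    """
    # Sets guarantee a pathway is counted at most once per phenotype
    counts = Counter(p for pathways in hits.values() for p in pathways)
    grouped: Dict[int, Set[str]] = {}
    for pathway, k in counts.items():
        if k >= min_count:
            grouped.setdefault(k, set()).add(pathway)
    return grouped
```

Applied to per-phenotype result sets, the sizes of `grouped[4]`, `grouped[3]` and `grouped[2]` give counts of the kind reported above.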

Discussion
One approach to the genetic analysis of complex diseases such as NAFLD is to stratify the histologic characteristics and identify the biologic mechanisms underpinning each. While recognizing that NAFLD represents an interaction between environmental, … [7,12]. However, the original publication of the dataset used in the current study failed to identify any relationship between PNPLA3 genetic variability and NAFLD, which may reflect the modest sample size and the highly selected group of patients (non-Hispanic white female adults) [9]. In the current study, PNPLA3 is a member of the pathway "1- and 2-Methylnaphthalene degradation", which was significantly associated with steatohepatitis and ballooning. Importantly, these studies have been able to de-link the relationship between insulin resistance and the hepatic histologic "phenotype" of NAFLD, indicating that genetic factors play an important role in determining the ultimate disease phenotype and outcomes of NAFLD.
The analysis performed here extends the genetic analysis concept by finding biologic pathways whose constitutional variations have the potential to stratify disease categories, underpin key subcomponents and determine alternative clinical outcomes. Pragmatically, the PoDA method accomplishes this by incorporating the influence of SNPs that may not be significantly related to a disease phenotype individually but that, in combination with other such SNPs within a biologic network, may determine disease phenotype and outcomes [8]. Our study identifies several novel pathways, and implicitly gene-based SNPs, that appear to be associated with the presence and severity of several features of steatohepatitis. Because of the sample selection for this pilot GWAS, these data apply primarily to non-Hispanic white females, and the generalizability of the conclusions remains an open question. Some of the key pathways linked to the presence of steatohepatitis and disease progression include those related to cholesterol synthesis and protein translation. These may be particularly germane given the central role of the liver in cholesterol homeostasis and the fundamental importance of regulated protein translation for cell viability. NASH is associated with accumulation of free cholesterol without a corresponding increase in cholesterol esters [13]. This has recently been shown to be due to SREBP-2-driven transcriptional upregulation of HMG-CoA reductase, the rate-limiting enzyme of cholesterol synthesis [14]. The upstream components of cholesterol synthesis are also components of the mevalonate pathway, the only part of the terpenoid synthesis pathway that exists in humans [15]. Several subcomponents of the terpenoid and cholesterol biosynthetic pathways, e.g., farnesyl and geranyl pyrophosphates, affect cell proliferation and apoptosis [16]. The current studies provide additional evidence that genetic factors that may modulate the activity of these pathways may affect activation of cell injury and apoptotic pathways, which drive the development of steatohepatitis.
Another cellular process implicated in the development and progression of NASH is the unfolded protein response (UPR) [17]. Inhibition of protein translation via phosphorylation of eIF-2α is a key step that relieves endoplasmic reticulum stress, and increased eIF-2α phosphorylation has been observed in subjects with NASH [17]. In addition, numerous microRNAs that are differentially expressed in NASH target eIF-2α [18]. The identification of the protein translation pathway, of which eIF-2α is a key component, further corroborates the relevance of the protein translation machinery in the development of steatohepatitis and its progression to cirrhosis. It also suggests that susceptibility may be attributable to genetic variation within this pathway. Whether this occurs by altering miRNA expression and function, eIF structure and function, or other mechanisms requires experimental elucidation.
It is also noteworthy that collections of SNPs in several cancer-related biological pathways were identified as related to disease activity. These pathways share several components, including K-Ras, Wnt/β-catenin and multiple kinases involved in pro-inflammatory and cell proliferative pathways [19][20]. These findings underscore the importance of the molecular pathways involved in cell proliferation and inflammation in defining the histologic activity of NAFLD, and the susceptibility of these pathways to the genetic background of the individual. It is well known that cirrhosis is a risk factor for hepatocellular cancer, and NASH-related cirrhosis is no exception [21][22]. Recently, hepatocellular cancer has been identified even in the absence of cirrhosis in subjects with NAFLD [5]. Our findings provide a rationale to further investigate the role of genetics in the development of HCC in such cases.

Conclusion
NAFLD is a complex biological state with multiple histological phenotypes and varied progression to cirrhosis. While several genes have been identified as associated with these phenotypes, this study identified additional biologic processes whose genetic variation may underpin alternative phenotypes and determine outcome. It identifies genetic variation in genes within pathways that, while not significantly related to disease phenotype individually, are in combination related to the development of steatohepatitis (the aggressive form of NAFLD), to disease activity as defined by liver histology, and to progression to cirrhosis. It also identifies potential key cellular pathways that may define genetically susceptible individuals. Several of these pathways are already closely related to the pathogenesis of NASH and disease progression. Taken together, the results provide evidence for additional ways, beyond the effects of single SNPs, by which genetic factors might contribute to the susceptibility to develop a particular phenotype of NAFLD and then progress to cirrhosis. Further studies are warranted to clarify the potentially important genetic roles of these biological processes in NAFLD.

Figure 1. Two representative significant pathways in steatohepatitis. Scatter plots of distance score S for each pathway, overlaid with boxplots, are given in the left panel; higher values of S indicate that the sample is closer to other cases than to other controls. Distributions of S for cases (red) and controls (black) are given on the right. A. "Viral messenger RNA synthesis" - Reactome. B. "mRNA splicing - Major pathway" - Reactome. doi:10.1371/journal.pone.0065982.g001

Figure 2. Two representative significant pathways in cirrhosis. Scatter plots of distance score S for each pathway, overlaid with boxplots, are given in the left panel; higher values of S indicate that the sample is closer to other cases than to other controls. Distributions of S for cases (red) and controls (black) are given on the right. A. "IL2 signaling events mediated by PI3K" - NCI-Nature. B. "Lectin induced complement pathway" - BioCarta. doi:10.1371/journal.pone.0065982.g002

Table 1. Pathways significantly associated with NASH diagnosis.

Table 2. Pathways significantly associated with cirrhosis.

Table 3. Pathways significantly associated with 2 or more phenotypes.