Genome-Wide Analysis of Copy Number Variation Identifies Candidate Gene Loci Associated with the Progression of Non-Alcoholic Fatty Liver Disease

Between 10 and 25% of individuals with non-alcoholic fatty liver disease (NAFLD) develop hepatic fibrosis leading to cirrhosis and hepatocellular carcinoma (HCC). To investigate the molecular basis of disease progression, we performed a genome-wide analysis of copy number variation (CNV) in a total of 49 patients with NAFLD [10 simple steatosis and 39 non-alcoholic steatohepatitis (NASH)] and 49 matched controls using high-density comparative genomic hybridization (CGH) microarrays. A total of 11 CNVs were found to be unique to individuals with simple steatosis, whilst 22 were common between simple steatosis and NASH, and 224 were unique to NASH. We postulated that these CNVs could be involved in the pathogenesis of NAFLD progression. After stringent filtering, we identified four rare and/or novel CNVs that may influence the pathogenesis of NASH. Two of these CNVs, located at 13q12.11 and 12q13.2 respectively, harbour the exportin 4 (XPO4) and phosphodiesterase 1B (PDE1B) genes which are already known to be involved in the etiology of liver cirrhosis and HCC. Cross-comparison of the genes located at these four CNV loci with genes already known to be associated with NAFLD yielded a set of genes associated with shared biological processes including cell death, the key process involved in ‘second hit’ hepatic injury. To our knowledge, this pilot study is the first to provide CNV information of potential relevance to the NAFLD spectrum. These data could prove invaluable in predicting patients at risk of developing NAFLD and more importantly, those who will subsequently progress to NASH.


Introduction
Non-alcoholic fatty liver disease (NAFLD) has emerged as a silent epidemic, with its worldwide prevalence continuing to increase with the growing incidence of obesity [1]. NAFLD comprises a spectrum of diseases ranging from simple steatosis, which is essentially benign fatty infiltration of the liver, to its inflammatory counterpart non-alcoholic steatohepatitis (NASH) [2]. The pathogenesis of NAFLD is based on the 'two hit hypothesis' [3]. The 'first hit' is the development of steatosis and involves the accumulation of triglycerides in the liver due to insulin resistance. Insulin resistance prepares the hepatocytes for the second insult. The 'second hit' is often due to adipocytokines and oxidative stress, which further damage the liver thereby promoting progression to steatohepatitis and fibrosis. A significant proportion of individuals with NAFLD develop hepatic fibrosis, a key feature of the condition which is associated with progression of the disease to cirrhosis and its related complications, including hepatic failure and hepatocellular carcinoma [4]. The fibrotic progression of NAFLD is identified histologically by the presence of NASH. A high prevalence of NASH is found among those with insulin resistance-related comorbidities such as obesity and type 2 diabetes [5]. The mortality rate among NASH patients has been found to be much higher than for patients with simple fatty liver (simple steatosis) [6].
In addition to environmental factors such as high calorific food intake and a sedentary lifestyle, there is mounting evidence of a genetic component to the complex etiology of NAFLD [7]. This is reflected by marked differences in the prevalence of NAFLD across diverse populations [8][9]. The high heritability of NAFLD was evident in a familial aggregation study, with estimates of 59% in siblings and 78% in parents with NAFLD [10]. Until recently, genome-wide association studies (GWAS) and the candidate gene approach have both utilised single nucleotide polymorphisms (SNPs) to explain the genetic component of NAFLD [7][11][12].
The wide distribution of copy number variants (CNVs) in the human genome has underscored the importance of CNVs in relation to genetic diversity, phenotypic variability and disease susceptibility [13][14]. It has been estimated that approximately 12% of the human genome is copy number variable [15] with over 1000 genes having been mapped within or close to regions that are affected by structural variation [16]. A global increase in CNV burden has also been observed in polygenic traits such as schizophrenia [17], autism [18] and attention deficit hyperactivity disorder [19]. Given these findings, the sheer scale of CNVs means that they are likely to make a significant contribution to the 'missing heritability' of some of these conditions [20]. However, despite some success in identifying CNVs responsible for metabolic phenotypes including obesity and diabetes mellitus [21][22], there are as yet no data available to suggest whether or not CNVs might be involved in the etiology of the NAFLD spectrum.
Here, we describe a pilot study designed to detect rare or novel CNVs associated with NAFLD and/or NASH. Predicting NASH non-invasively is very important since this condition is potentially progressive and liver biopsy is currently the gold standard for the diagnosis of NASH. We interrogated the CNVs associated with NASH and ascertained the biological processes associated with those genes covered by the CNVs in order to assess their possible role in the progression of the disease. To this end, we used a high-resolution Agilent aCGH platform to perform genome-wide copy number analysis in patients with both simple steatosis and NASH, which are representative of the clinical spectrum of NAFLD.

Ethics Statement
The study protocol was approved by the Medical Ethics Committee of UMMC and all subjects provided their written informed consent to participate.

Subjects
Genome-wide copy number profiling was performed using array comparative genomic hybridization (aCGH) on a total of 49 NAFLD patients (39 with NASH and 10 with simple steatosis) and 49 fatty liver-free controls that were matched both for age and gender. All subjects were, as far as we could ascertain, genetically unrelated to each other. All NAFLD patients were consecutively recruited from the University of Malaya Medical Centre (UMMC). NAFLD was confirmed through liver histology and evaluated according to the NASH Clinical Research Network criteria [23][24]. All liver biopsy specimens were on average 1.5 cm long and contained at least six portal tracts. Subjects were excluded if they met any of the following criteria: (i) alcohol consumption >10 g/day [25]; (ii) hepatitis B or C infection; (iii) autoimmune hepatitis; (iv) exposure to drugs known to cause steatosis or (v) Wilson's disease. The controls were genetically unrelated healthy subjects with a body mass index (BMI) <25 kg/m², a fasting plasma glucose of <110 mg/dL, a normal lipid profile and normal liver enzymes. NAFLD was actively excluded in the controls by ultrasonography according to the absence of the following criteria: (i) slight diffuse increase in bright homogeneous echoes in the liver parenchyma with normal visualization of the diaphragm and portal and hepatic vein borders, and normal hepatorenal echogenicity contrast; (ii) diffuse increase in bright echoes in the liver parenchyma with slightly impaired visualization of the peripheral portal and hepatic vein borders; (iii) marked increase in bright echoes at a shallow depth with deep attenuation, impaired visualization of the diaphragm and marked vascular blurring [26]. Subsequent magnetic resonance imaging (MRI) was performed to further confirm the fatty liver-free status.

Array CGH
Array-CGH was performed according to the protocol established by the manufacturer (Oxford Gene Technology, Begbroke, UK). It was carried out using the SurePrint G3 Human CGH 2×400K array (Agilent Technologies, Santa Clara, CA, USA) for genome-wide identification of putative disease-associated CNVs. Each oligonucleotide-based microarray slide contained 410,739 probes that enabled the profiling of molecular genomic imbalances with a mean resolution of 5.3 kb. Probes on the array were 60-mers and covered both coding and non-coding regions of the human genome. A total of 1.0 µg genomic DNA from patients and controls was labeled with Cy3 and Cy5 dyes respectively using the CytoSure Genomic DNA labeling kit (Oxford Gene Technology). Probes were then purified using Microcon Centrifugation Filters, Ultracel YM-30 (Millipore, Billerica, MA, USA) and mixed thoroughly. This was followed by denaturation and preannealing with 50 µg human Cot-1 DNA (Invitrogen, California). Hybridization of the mixture to the array slide was executed at a constant rotation at 65 °C for 40 hours. The slide was then washed with Agilent wash buffers 1 and 2, and scanned immediately using an Agilent Microarray scanner (Agilent Technologies, Santa Clara, CA, USA). Data were extracted from scanned images using Feature Extraction Software, version 10.7.3.1 (Agilent Technologies, USA). The raw data obtained thereafter were uploaded into the CytoSure Interpret software version 4.2.5 (Oxford Gene Technology), normalized and converted into .cgh files. Data normalization software was used to correct for inconsistencies in dye incorporation. The data were segmented using a modified Circular Binary Segmentation (CBS) algorithm [27]. Genomic aberrations were identified by applying log2 intensity ratios of sample to reference (Cy3/Cy5: log2-ratios above 0.3 for duplications and below −0.6 for deletions). 
Chromosomal aberrations were reported in accordance with the human genome sequence assembly Build 37, hg 19 (http://www.ncbi.nlm.nih.gov). The microarray data have been deposited in the Gene Expression Omnibus (GEO) database (accession number 55645).
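The thresholding step described above can be sketched as follows. This is a minimal illustration and not the CytoSure Interpret implementation: it assumes that segmentation has already produced one mean log2 ratio per segment and reads the deletion cut-off as a log2 ratio of −0.6. The `Segment` type and the `classify_segment` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass

GAIN_THRESHOLD = 0.3    # log2 ratio above this value: duplication
LOSS_THRESHOLD = -0.6   # log2 ratio below this value: deletion (assumed reading)


@dataclass
class Segment:
    """A segment produced by a segmentation step (hypothetical structure)."""
    chrom: str
    start: int
    end: int
    mean_log2: float  # mean log2(Cy3/Cy5) over the probes in the segment


def classify_segment(seg: Segment) -> str:
    """Label a segment as 'gain', 'loss' or 'neutral' from its mean log2 ratio."""
    if seg.mean_log2 > GAIN_THRESHOLD:
        return "gain"
    if seg.mean_log2 < LOSS_THRESHOLD:
        return "loss"
    return "neutral"


if __name__ == "__main__":
    # Illustrative coordinates and ratios only; not data from this study.
    for seg in (Segment("chr13", 1_000, 9_000, 0.45),
                Segment("chr11", 2_000, 8_000, -0.80),
                Segment("chr1", 3_000, 7_000, 0.10)):
        print(seg.chrom, classify_segment(seg))
```

The asymmetric thresholds mean that a segment must deviate further from zero to be called a deletion than to be called a duplication.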

CNV Calling and Functional Enrichment Analysis
CNVs were called for the segments with at least 5 consecutive probes. Rare CNVs were defined as those which overlapped by <50% with reported CNVs from the Database of Genomic Variants (DGV; http://dgv.tcag.ca/dgv/app/home). CNVs were deemed to be novel if they did not appear in the DGV database. Gene content within the identified CNVs was retrieved from the Homo sapiens (GRCh37) assembly using Biomart-Ensembl (http://www.ensembl.org). By default, the lists contained both gene and non-gene entities; the latter were removed through a process of cross-checking and verification of gene symbols using the HUGO Gene Nomenclature Committee (HGNC) database (http://www.genenames.org/). To investigate the functional impact of rare and/or novel CNVs, the Database for Annotation, Visualization and Integrated Discovery (DAVID; http://david.abcc.ncifcrf.gov/) was utilised to compare the Gene Ontology (GO; http://www.geneontology.org/) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway (http://www.genome.jp/kegg/) annotations between the genes encompassed by the rare and/or novel CNVs and the genes associated with NAFLD. The list of genes associated with NAFLD was identified using the MalaCards database (http://www.malacards.org/), an integrated searchable database of human disease states and their annotations, in association with the GeneCards relational database. Initially, a total of 200 genes associated with NAFLD were identified. Since the gene-disease association was based on a text mining algorithm, a manual verification of the biological processes associated with each of the 200 genes was performed. Only genes that had previously been described as being associated with NAFLD by either expression studies, genotyping or protein array work were selected, thereby lowering the number of genes implicated in NAFLD to 70 (see Table S1 for the complete list of genes).
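The rare/novel classification can be illustrated with a short sketch. It is an assumption of this example, not a statement of the exact procedure used, that DGV overlap is measured as the fraction of the CNV's length covered by the union of reported DGV intervals on the same chromosome. Intervals are half-open `(start, end)` pairs, and the function names `dgv_coverage` and `classify_cnv` are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open (start, end) on a single chromosome


def dgv_coverage(cnv: Interval, known: List[Interval]) -> float:
    """Fraction of the CNV covered by the union of known (e.g. DGV) intervals."""
    start, end = cnv
    # Keep only intervals that overlap the CNV, clipped to its boundaries.
    clipped = sorted((max(s, start), min(e, end))
                     for s, e in known if s < end and e > start)
    covered = 0
    cur_start, cur_end = None, None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Close the previous merged block and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / (end - start)


def classify_cnv(cnv: Interval, known: List[Interval]) -> str:
    """'novel' if no overlap, 'rare' if coverage is below 50%, else 'common'."""
    coverage = dgv_coverage(cnv, known)
    if coverage == 0:
        return "novel"
    if coverage < 0.5:
        return "rare"
    return "common"


if __name__ == "__main__":
    # Illustrative coordinates only; not taken from DGV or from this study.
    print(classify_cnv((100, 200), [(90, 120), (110, 130)]))
```

Merging the clipped intervals before summing avoids counting the same bases twice when several database entries overlap one another.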

Quantitative PCR Validation of CNV Calls
A duplex TaqMan real-time quantitative polymerase chain reaction (qPCR) was performed to validate the CNV regions using a Step One Plus (Applied Biosystems) on three of the samples from two selected regions (11q11: Assay Hs02799097_cn, and 13q12.11: Assay Hs03857719_cn). Each reaction (20 µL) contained 10 µL master mix, 1 µL TaqMan Copy Number Assay, 1 µL TaqMan Copy Number Reference Assay, 4 µL nuclease-free water, and 4 µL of 5 ng/µL genomic DNA, and was run in quadruplicate. The PCR cycling conditions consisted of 1 PCR cycle at 95 °C for 10 min, followed by 40 cycles at 95 °C for 15 sec and 60 °C for 1 min.
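The text does not state how copy number was derived from the qPCR data. A common approach for duplex copy number assays is the comparative Ct (ΔΔCt) method, sketched below under the assumptions that a calibrator sample carries two copies of the target and that amplification efficiency is close to 100%. The function name `estimate_copy_number` and all Ct values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def estimate_copy_number(target_ct: Sequence[float],
                         reference_ct: Sequence[float],
                         calibrator_target_ct: Sequence[float],
                         calibrator_reference_ct: Sequence[float],
                         calibrator_copies: int = 2) -> float:
    """Estimate copy number from replicate Ct values by the ddCt method.

    Each argument holds the replicate Ct values (e.g. four wells) for one
    assay in one sample; replicates are averaged before subtraction.
    """
    delta_sample = mean(target_ct) - mean(reference_ct)
    delta_calibrator = mean(calibrator_target_ct) - mean(calibrator_reference_ct)
    delta_delta = delta_sample - delta_calibrator
    # One extra cycle for the target corresponds to half as much template.
    return calibrator_copies * 2 ** (-delta_delta)


if __name__ == "__main__":
    # Illustrative Ct values only; not measurements from this study.
    loss = estimate_copy_number([26.0] * 4, [25.0] * 4, [25.0] * 4, [25.0] * 4)
    print(f"estimated copy number: {loss:.2f}")
```

Under these assumptions, an estimate near 2 corresponds to the wild-type state, values near 1 or 0 to a loss, and values near 3 or above to a gain.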

Results
In the aCGH method adopted, DNA samples pooled from multiple subjects (patients and controls) were cross-compared so as to remove/normalise any common copy number changes in the normal control sample. Since the principle of aCGH is to compare the DNA copy number from patient samples against those of normal controls, CNV calls were designed to be patient-specific.

Subjects and Identification of CNVs in the NASH Genome
All DNA samples passed quality control (QC) after a rigorous sample preparation process and a QC check during sample processing. Sets of 39 NASH samples and 39 matched controls were run in parallel on an array CGH platform that allowed the ratio of DNA copy number between a test (patient) and a reference (control) to be simultaneously assessed. From a total of 39 samples, 51.3% (n = 20) were females, 48.7% (n = 19) were males; the mean age of the 39 subjects was 50.4 years. The histopathological data are presented in Table 1. Seven percent of CNV calls were attributable to the sex chromosomes (with a frequency of at least 10%), but we opted to exclude these chromosomes from further analysis owing to the evolutionary biases due to small imbalances of the sex chromosomes [28]. Analysis of copy number variants, on the basis of log ratio and probe incidence filtering, yielded a total of 267 autosomal CNVs (the ratio of the fluorescence intensities between the patients and controls is a measure of the relative DNA copy number), amounting to an average of 6.84 autosomal CNVs per individual. The 267 CNVs detected spanned between 5.77 kb and 8.15 Mb in size, with a mean size of 194.94 kb and a median size of 38.33 kb, covering a total of 52.05 Mb or 1.63% of the genome (Fig. 1). Most chromosomal arms harboured both copy number gains and losses, but copy number gains were more commonly observed than losses (estimated ratio of 1.7:1). However, only 55 CNVs (20.6%) out of the 267 CNVs detected had a frequency of >10%.
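The frequency filter mentioned above can be sketched as a simple recurrence count. The example assumes that each sample's calls have already been reduced to a set of locus labels (how calls from different samples are matched to the same locus is not specified here) and applies a strict >10% cut-off. The names `cnv_frequencies` and `recurrent_cnvs`, and the sample data, are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def cnv_frequencies(calls_by_sample: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fraction of samples carrying each CNV locus."""
    n_samples = len(calls_by_sample)
    counts = Counter(locus
                     for loci in calls_by_sample.values()
                     for locus in loci)
    return {locus: count / n_samples for locus, count in counts.items()}


def recurrent_cnvs(frequencies: Dict[str, float],
                   min_frequency: float = 0.10) -> List[str]:
    """Loci whose frequency strictly exceeds the cut-off, sorted by label."""
    return sorted(locus for locus, freq in frequencies.items()
                  if freq > min_frequency)


if __name__ == "__main__":
    # Ten illustrative samples; locus labels are placeholders, not results.
    calls = {f"S{i}": set() for i in range(10)}
    calls["S0"] = {"locusA", "locusB"}
    calls["S1"] = {"locusA"}
    print(recurrent_cnvs(cnv_frequencies(calls)))
```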
Molecular genomic profiling identified 14q11.2 as the most frequently amplified region, which occurred in 53.8% of the NASH samples and contained a cluster of olfactory receptor (OR) family genes (Table 2; see Table S2 for the full list of OR genes). The most frequently deleted genomic region in the NASH samples, 12p13.2, is enriched in taste receptor (TASR) family genes (see Table S2 for the full list of TASR genes), and exhibited similar frequencies of losses and gains (38.5%), suggesting a generally unstable region. Several other frequently deleted regions were also observed, including one at 16q12.2 harbouring the carboxylesterase 1 (CES1) gene and one at 14q24.3 spanning the acyl-CoA thioesterase 1 (ACOT1) gene; importantly, both genes are known to promote hepatic steatosis through their regulation of hepatic lipid metabolism [29][30]. There were nine CNVs present in at least 33% of the samples whilst only one was present

Integrative Analysis of CNVs and Functional Enrichment to Identify Candidate Genes for Involvement in NASH
To identify unique CNVs in NASH patients that could be involved in the pathogenesis of this condition, we performed a cross-comparison with known CNVs from the DGV database. Conservative assessment of the overlap between reported CNVs from the DGV database with the CNVs identified in this study revealed four rare and/or novel CNVs (DGV coverage <50%) that were present in at least 10% of the NASH samples (Table 3). Two of these CNVs were classed as rare (DGV coverage <50%: 12q24.33 and 13q12.11), whereas the other two were novel (DGV coverage 0%: 21p11.1-11.2 and 12q13.2). A Chi-square test confirmed the significance of the association of these CNVs with NASH (P<0.05) as compared to simple steatosis. To further assess the likelihood of the involvement of these CNVs in NASH, the genes located within these regions were identified and their involvement in those biological processes shared with known NAFLD genes assessed. First, we profiled the genes within the chromosomal regions that are bounded by the four rare and/or novel CNVs, where genes such as exportin 4 (XPO4) and phosphodiesterase 1B (PDE1B) are located. A list of genes known to be associated with NAFLD was then obtained (see Table S1). Subsequently, we performed GO enrichment and KEGG pathway analysis using the DAVID gene annotation tool for the two sets of genes (genes within the four unique regions and known genes associated with NAFLD). We observed a number of shared biological processes (Table 4) between the two sets of genes including those that could be linked to NAFLD progression such as glucose metabolism, cell surface receptor-linked signal transduction and cell death [3].
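For readers unfamiliar with the test, a Pearson Chi-square test on a 2×2 table (CNV present or absent, in each of two groups) can be sketched as below. The layout of the table, the absence of a continuity correction, and the counts used are all assumptions of this illustration; the contingency tables analysed in the study are not given in the text. With one degree of freedom, the P value equals erfc(sqrt(χ²/2)).

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for the table

                    CNV present   CNV absent
        group 1          a             b
        group 2          c             d

    Returns (chi-square statistic, two-sided P value for 1 degree of freedom).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts chosen only to exercise the function.
    chi2, p = chi_square_2x2(15, 5, 5, 15)
    print(f"chi2 = {chi2:.2f}, P = {p:.4f}")
```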

Identification of CNVs in the Simple Steatosis Genome
Given the greater number of NASH samples (approximately 80%) and the progressive nature of NASH (about one third of NASH patients tend to develop cirrhosis over a 5-10 year period; by contrast, simple steatosis patients tend to be clinically stable over time) [31] in the disease spectrum, the main focus of this study was placed on NASH. However, we were also interested in understanding the progression of simple steatosis to NASH. Unfortunately, we were only able to obtain DNA samples from 10 simple steatosis patients and 10 fatty liver-free controls. Seven of the samples were male and the mean age (all samples) was 47.9 years. The histopathological data are shown in Table 1. A total of 56 CNVs (simple steatosis patient-specific) were identified, including three (5.4%) which were located on one of the sex chromosomes. All CNVs were present with a frequency of at least 10%. Fifty-three autosomal CNVs were selected for further analysis. Of these, 11 were unique to simple steatosis whereas 42 were found to be shared with NASH. The former 11 CNVs could conceivably play a role in the development of hepatic steatosis, whereas the latter 42 CNVs could be involved in progression to steatohepatitis. Intriguingly, the four rare and/or novel CNVs identified earlier in NASH patients were not found in simple steatosis patients, and remain unique to NASH.
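The division of CNVs into group-specific and shared sets amounts to simple set operations, sketched below. Matching CNVs between groups by an identical locus label is an assumption made for brevity; an overlap-based matching rule would be equally plausible. The function name `partition_cnvs` and the labels are hypothetical.

```python
from typing import Dict, Set


def partition_cnvs(steatosis: Set[str], nash: Set[str]) -> Dict[str, Set[str]]:
    """Split CNV loci into steatosis-only, shared and NASH-only sets."""
    return {
        "steatosis_only": steatosis - nash,
        "shared": steatosis & nash,
        "nash_only": nash - steatosis,
    }


if __name__ == "__main__":
    # Placeholder labels; not the loci reported in this study.
    groups = partition_cnvs({"locusA", "locusB", "locusC"},
                            {"locusB", "locusC", "locusD"})
    for name, loci in groups.items():
        print(name, sorted(loci))
```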
The top scoring regions in terms of copy number gains and losses in simple steatosis are listed in Table 5. The most commonly amplified region, 12p13.31 (50%), was also among the most highly amplified regions observed in NASH patients. A CNV at the 10q11.22 locus that occurred in 40% of the simple steatosis samples contains the neuropeptide Y receptor 4 (NPYR4) gene, which is known to be important in obesity through the regulation of appetite and energy metabolism [32]. Three CNVs (located at 4q13.2, 15q11.2 and 11q11) shared the most deleted region at a frequency of 40%, in which two of the CNVs (4q13.2 and 11q11) were also among the most highly deleted regions observed in NASH. These CNVs were enriched for OR genes (11q11) and immunoglobulin heavy chain (IGH) (15q11.2) family genes (Table 4; see Table S2 for the full list of OR and IGH genes). However, all CNVs identified in the simple steatosis patients were common (DGV coverage 100%).

qPCR Validation
We validated three samples (each a patient-control matched pair) for each of two selected CNV regions. The two regions were chosen to represent different types of copy number change (the CNV at 13q12.11 was a rare copy number gain, whereas the CNV at 11q11 was a copy number loss). All CNVs were confirmed through qPCR validation. Amplifications and deletions of the genomic regions were defined on the basis of differences between the patient's copy number and the wild-type copy number (i.e. a copy number around 2). Fig. 2 illustrates the qPCR results of the validated CNVs.

Discussion
Studies on CNVs are becoming increasingly important in studies of inherited disease, with growing evidence attesting to the substantial impact that they can have on human phenotypic variability and genetic susceptibility. Here, we present a pilot analysis of CNVs in a series of NAFLD patients. We identified four CNVs that are either rare or novel to NASH patients in our study that could potentially contribute to clinical outcome.
In patients with NASH, the most frequently amplified region was 14q11.2, which is enriched in OR family genes, while an abundance of TASR family genes were found at 12p13.2, the most frequently deleted region. Although the OR and TASR families play roles in the olfactory and gustatory systems respectively, a search of the database of Expressed Sequence Tags, NCBI dbEST, revealed OR and TASR gene expression in many tissues and organs, including the liver. Impairment of olfactory and gustatory function has been reported in chronic liver disease including cirrhosis; chemosensory function however improved after liver transplantation [33]. In the early 2000s, a comprehensive database of the human olfactory subgenome was completed using a highly automated data mining system [34]. Glusman et al. (2001) reported the presence of 906 potential coding regions for OR genes that cover almost all human chromosomes with the exception of chromosomes 20 and Y, in which 2/3 of the regions have not been reported. Subsequently, new databases termed respectively the Olfactory Receptor Microarray Database (ORMD) which includes microarray gene expression data from the ORs [35], and the Database of Chemosensory Receptor Gene Families (CRDB), were developed [36]. The size of these databases highlights the importance of OR and TASR gene families not only in the olfactory and gustatory systems, but also in tissues and organs throughout the body.
Table 5. Top regions of copy number gains and losses in simple steatosis.
A deletion CNV was noted at the 16q12.2 locus; it includes the CES1 gene, which is primarily important in the metabolism of fatty acids and cholesterol [37]. Expression of CES1 has been found to be higher in human NAFLD hepatic tissue as compared to non-NAFLD [38]. A role for CES1 in lipolysis was evidenced by a positive correlation between CES1 expression and triglyceride lipase activity as well as with adiposity [30]. On the other hand, CES1 knockout mice are characterized by a gain in weight, hepatic steatosis and hyperinsulinemia, thereby supporting a role for CES1 in the regulation of fatty acids [37]. Interestingly, the 16q12.2 locus is known to harbour genetic variants (SNPs) associated with BMI [39]. Although CES1 has been implicated in hepatic steatosis [37], a recent study has shown that CES1 may have potential as a biomarker to distinguish hepatocellular carcinoma (HCC) from cirrhosis [40]. Also notable among the highly deleted regions in NASH patients was a copy number loss at the 14q24.3 locus, where the acyl-CoA thioesterase 1 (ACOT1) gene resides. Acyl-CoA thioesterase 1 promotes the cellular balance between free fatty acids and acyl-CoAs to maintain cellular processes including lipid metabolism [29]. Compared to other ACOT subfamily genes, ACOT1 is unique in that it is highly expressed only in association with a high fat diet but not in association with a normal diet [41]. Although determining the CNV frequencies and their gene content is important, most of the CNVs detected here are considered to be common (DGV coverage 100%) and hence may have little or no impact on the pathogenesis of NASH. The definition of 'common' here is however debatable given that reported CNVs from the DGV are (i) from non-NAFLD studies and (ii) unlikely to be from the Malay population. 
Caution should therefore be exercised when offering functional interpretation of these CNVs until more comprehensive studies on larger numbers of patients are conducted. To achieve our main goal in this pilot study, which was to identify candidate CNV loci that could play a role in the etiology of NASH, we filtered out common CNVs and identified four CNVs (DGV coverage <50%) that have the potential to be involved in the pathogenesis of NASH; two of these are rare (12q24.33 and 13q12.11) whilst two are novel (12q13.2 and 21p11.1-11.2). We were able to establish the potential significance of these loci by performing a Chi-square test against other loci and validating the findings by qPCR; in this way, we were able to confirm that, despite the relatively small sample size, our analysis has the potential to yield biologically meaningful and reproducible results. We postulate that these CNVs could provide new insights into the biology of NASH. Of particular note was an aberration at the 13q12.11 locus that could serve as a potential copy number biomarker for NASH. This region contains the tumor suppressor gene exportin 4 (XPO4), the inactivation of which promotes HCC in mice [42]. On the other hand, increased expression of XPO4 in human HCC is associated with better prognosis and a better survival rate [43][44]. The phosphodiesterase 1B, calmodulin-dependent (PDE1B) gene spanning the 12q13.2 region is important in many signal transduction pathways, and has been found to be downregulated in cirrhotic liver [45]. The 12q13.2 locus was identified as a clear-cut amplification (no deletion event), thereby supporting its candidacy as a potential risk marker CNV associated with the disease. However, there are a limited number of published reports on XPO4 and PDE1B and their putative role in liver disease. Thus, additional comprehensive studies focussed on these two genes will be necessary to confirm or refute this finding.
To assess the plausibility of our results, it was important to verify the functional role of these CNVs (rare/novel) and their potential impact on NASH. In order to explore the possible association between these CNVs and NASH, we extended our analysis to GO functional enrichment and KEGG pathway analysis for genes residing at these CNV loci and known NAFLD genes. The results yielded several shared biological processes between the two sets of genes. Of primary importance are glucose metabolism and cell surface receptor-linked signal transduction and cell death, all of which have been shown to be important in the pathogenesis of NASH [4]. However, no related KEGG pathway was observed.
As for simple steatosis, the most frequently amplified region (12p13.31) also happened to be among the most highly amplified regions in NASH. The 10q11.22 region, which harbours a CNV that occurred in 40% of the simple steatosis samples, contains the NPYR4 gene. This gene is involved in the regulation of appetite and energy metabolism [32]. The pancreatic peptide, a high affinity ligand for the neuropeptide Y receptor 4 (Y4), has been suggested to have anti-obesity potential [46][47]. Long term antagonism of Y4 causes significant reduction in body weight and adiposity via effects on metabolic rate and energy distribution [48].
We readily acknowledge the small number of simple steatosis samples in the present study. This limitation was due to the lack of availability of simple steatosis patients from our previous study that comprised three major ethnic groups [49][50]. These patients were recruited from the UMMC, a tertiary referral center, which could explain the greater number of NASH patients as compared to those with simple steatosis. In order to minimise ethnicity as a potential confounding factor, we selected samples taken from only one specific ethnic group, namely the Malays, for both the NASH and the simple steatosis group. Under these conditions, the number of simple steatosis patients that we were able to obtain was only 10. Despite the limited numbers of patients available, several of our findings were statistically significant. Importantly, chromosome 11q11 which was one of the most frequently deleted CNVs in our study, was also frequently deleted from 10 hepatic steatosis patients from the study by Royo et al. [51]. It should be noted that the Royo et al. study did not include any NASH patients. This notwithstanding, our pilot study was designed to provide an initial screen of the structural genomic aberrations present in NAFLD samples. Simple steatosis patients mostly presented with either a copy number gain or a loss event at one locus, unlike the NASH group which tended to exhibit both events. In addition, a greater number of CNVs were identified in the NASH group as compared to the simple steatosis group. This could be explained by the complex pathogenesis of NAFLD especially at the NASH stage, involving not only the 'first hit' mechanism but also the 'second hit' [4]. In this study, we were mainly concerned with identifying CNVs that were common to both simple steatosis and NASH, particularly when the CNV frequency was higher in NASH than in simple steatosis (n = 2), as they could indicate involvement in the progression of the disease. 
Surprisingly, histological data from the samples harbouring these CNVs (12p13.2 and 11p15.4) showed a higher frequency (53.3% and 71.4% respectively) of fibrosis score ≥2, thereby supporting the disease progression model.
The ethical issue that precluded the use of liver biopsy for the classification of controls (non-NAFLD) required us to adopt a stringent definition of controls in order to rule out fatty liver in the control subjects; biochemical tests, ultrasonography and MRI evaluations were therefore used to minimise misclassification of our controls. To the best of our knowledge, this is the first study to investigate a genome-wide profile of copy number variation in the NAFLD spectrum; hence, determination of the CNV total number, frequency, genomic location and gene content is challenging. The use of aCGH technology allows CNV discovery at high resolution and hence allows confidence in CNV detection. The use of 60-mer probes provides high sensitivity and specificity to accurately detect both known and de novo CNVs as compared to shorter oligonucleotide probes [52]. The source of genes known to be related to NAFLD was MalaCards, which is known to use a text-mining approach [53]. Hence, a manual verification of the gene functions was performed that included only genes that have been shown to be associated with NAFLD in either expression studies, genotyping or protein array work. However, we cannot rule out the possibility that other genes could be of importance in NAFLD, as more comprehensive studies are still ongoing. Indeed, it was also difficult for us to assess the significance of such CNVs given that multiple genes often reside within the CNV intervals. We attempted to overcome this limitation by performing a functional enrichment analysis that covered all the genes residing within the CNV regions.
Taken together, the results of our whole genome copy number analysis have documented four rare and/or novel CNV loci that are unique to NASH, and to the best of our knowledge, have not previously been reported. This study nevertheless falls into the hypothesis generating category rather than the hypothesis testing category; hence, our results remain to be substantiated by additional studies on larger patient groups. Moreover, additional functional studies on the genes residing within these loci will be needed to fully characterize the function of the genes and their relationship, if any, to NASH.