Integrative Genomic Analysis Identifies That SERPINA6-rs1998056 Regulated by FOXA/ERα Is Associated with Female Hepatocellular Carcinoma

The human forkhead box A1 (FOXA1) and A2 (FOXA2) transcription factors have been found to control estrogen and androgen signaling through co-regulating target genes with sex hormone receptors. Here we used an integrative strategy to examine the hypothesis that genetic variants at FOXA1/2 binding elements may be associated with sexual dimorphism of hepatocellular carcinoma (HCC) risk. Firstly we extracted chromatin immunoprecipitation-sequencing (ChIP-seq) data of FOXA1, FOXA2 and estrogen receptor 1(ERα) from ENCODE database to obtain dual target regions of FOXA/ERα, and further intersected these regions with genes’ promoters. Then we used MATCH program to predict FOXA binding elements, in which genetic variants were retrieved by dbSNP database (NCBI, build 134). A total of 15 candidate variants were identified in this stage. Secondly we performed a case-control study with 1,081 HCC patients and 2,008 matched controls and found a significant association of SERPINA6-rs1998056 with female HCC risk under common genetic models (e.g. GG versus CC: OR = 2.03, 95% CI = 1.26–3.27, P = 0.004). Moreover, results from our real-time quantitative polymerase chain reaction (qPCR) using 72 normal liver tissues adjacent to the tumors showed that SERPINA6 expression was significantly different among different genotypes of this variant (GG versus CC: P = 0.032; Group test: P = 0.060). In summary, our study suggested that SERPINA6-rs1998056 regulated by FOXA/ERα might be associated with female HCC risk.


Introduction
Hepatocellular carcinoma (HCC) is one of the most frequently diagnosed malignancies. In the worldwide, it is the fifth common cancer in men and the seventh in women, with an estimated 748,300 new cases and 695,900 deaths in 2008 [1]. More than half of these suffers were considered to occur in China [2]. Previous epidemiology studies have revealed several environmental hazards, such as chronic hepatitis B or C virus infection, aflatoxin B1 intake, obesity and alcoholism [2][3][4][5][6]. But only a few exposed individuals actually develop HCC during their lifetimes, suggesting that genetics also play an important role in the pathogenesis of this disease. Interestingly, HCC demonstrates the significant difference of incidence between men and women [1]. Although different lifestyles and environmental exposures can explain a part of sexual dimorphism of HCC, biological inequality between men and women also derives from subtle regulations of genetic factors and demonstrates a variety of hormone-related conditions on health and disease.
The human forkhead box A 1 (FOXA1) and forkhead box A2 (FOXA2) are two members of the FOX gene family that encode a set of transcription factors playing important roles in controlling cell growth, proliferation and differentiation [7]. Accumulative studies have indicated that FOXA1 and FOXA2, as well as their orthologous members Foxa1 and Foxa2 in mice, are associated with several tumors, especially with the hormone-dependent cancers [8][9][10][11][12][13]. It is generally considered that the recruitment of estrogen receptor 1 (ERa) or androgen receptor (AR) to their target genes depend on the modulation of FOXA1 in breast or prostate cancer, respectively, and FOXA factors may be essential in estrogen and androgen signaling in hormone-dependent malignancies [14][15][16][17]. HCC is a sexually dimorphic disease related to sex hormone too. Recently, Li Z et al. identified the similar and central role of Foxa1 and Foxa2 in regulating estrogen and androgen signaling through recruitment of ERa and AR to their targets in the carcinogenesis of liver, thereby providing an explicit explanation for sexual dimorphism of HCC [18]. Nevertheless, the association between genetic variants at FOXA binding elements and the development of HCC was implied by their preliminary results in several human samples [18]. To further investigate the genetics of sexual dimorphism contributing to HCC risk, large-scale association studies are warranted.
Informatively, we used an integrative genomic strategy to examine the hypothesis that genetic variants at FOXA1/2 binding elements likely relate to sexual dimorphism of HCC risk. In the stage of bioinformatic data mining, we extracted chromatin immunoprecipitation-sequencing (ChIP-seq) data of FOXA1, FOXA2 and ERa from online databases to obtain dual target regions of FOXA/ERa, and further intersected these regions with genes' promoters. Then we used MATCH program to predict FOXA binding elements, in which candidate variants were retrieved by dbSNP database (NCBI, build 134). In the stage of case-control study, 1,081 patients with newly diagnosed HCC and 2,008 matched tumor-free controls were enrolled to examine the effects of selected candidate variants on HCC risk. Finally in the stage of functional experiment, a real-time quantitative polymerase chain reaction (qPCR) was carried out to evaluate the effect of rs1998056 on SERPINA6 expression using 72 normal liver tissues adjacent to the tumors with different genotypes.

Identification of candidate variants
Two import results from Li Z et al. [18] were applied to our study: (1) Foxa1 and Foxa2 conduct the redundant regulation in liver and there are no significant differences of either Foxa1 or Foxa2 binding between males and females. That was the theoretical basis for us to combine ChIP-seq data of FOXA1 and FOXA2 from HepG2 cell line. (2) The vast majority of the dual targets (76% of Foxa/ERa and 64% of Foxa/AR) are overlapped between males and females. So we not only examined candidate variants in total population, but also detected their effects in both men and women.
The schematic overview of identifying candidate variants at FOXA1/2 binding elements is shown in Figure 1. First, ChIP-seq data of FOXA1, FOXA2 and ERa were downloaded from the Encyclopedia of DNA Elements (ENCODE) Project (http:// genome.ucsc.edu/ENCODE/downloads.html) to extract corresponding sites of these transcription factors (TFBSs). Specifically, genome-wide TFBSs of FOXA1and FOXA2 were extracted from HepG2 cell line; genome-wide TFBSs of ERa were extracted from ECC-1 and T47-D cell lines due to lacking of ERa ChIP-seq data of HepG2 cell line. For the replicated experiments in the same cell line, overlapping peak locations were selected; for the independent experiments in different cell lines, peak locations were put together for further analysis. In consideration of the redundant regulation of FOXA1 and FOXA2 in liver [18], we combined the ChIP-Seq results of FOXA1 and FOXA2 together. Thus we got two TFBS datasets of FOXA (71,395 peaks) and ERa (28,828 peaks). Second, dual target regions of FOXA/ERa were identified as follows. Each TFBS genome location of FOXA was compared with that of ERa. If the distance between FOXA TFBS and ERa TFBS was less than 250 bp, we defined the corresponding regions as dual target regions. In this step, we obtained a total of 8,012 dual target regions of FOXA/ERa. Because the TFBSs within or near gene's promoter regions are generally considered more likely to be real binding sites, the third step was to intersect dual target regions of FOXA/ERa with gene's promoter regions to find the potentially functional dual target regions. We downloaded all genes' genomic locations from the University of California Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu/) and then defined their upstream 5 kb regions as genes' promoter regions. By mapping the locations of these TFBSs to genes' promoters, a total of 640 potentially functional dual target regions of FOXA/ERa were identified. Specially, these 640 dual target regions fully or partially overlapped with promoter regions. To further narrow down the dual target regions, we further fine mapped the TFBSs of FOXA and ERa in their dual target regions by MATCH program, which can finely predict TFBSs based on the positional weight matrices from the TRANSFAC database [19]. The TFBSs predicted by MATCH program were defined as transcriptional factor binding elements (TFBEs) in our study. Thus we got 45 FOXA TFBEs in the dual target regions overlapping genes' promoters. Finally, candidate variants were searched in these regions by retrieving human reference SNP database (downloaded from dbSNP database of NCBI, build 134). Ultimately, we identified a total of 15 genetic variants, which are co-regulated by FOXA/ERa, located at FOXA TFBEs near or within genes' promoter regions (Table S1).
All bioinformatic processes above were conducted by in-house Perl scripts. Of 15 candidate variants, there were 3 common variants with minor allele frequencies (MAFs) more than 0.05. They were rs111570813 (in TFBE of gene inter-alpha-trypsin inhibitor heavy chain 2, ITIH2), rs1998056 (in TFBE of gene serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 6, SERPINA6) and rs35335725 (in TFBE of gene keratin associated protein 2-3, KRTAP2-3). Because research about KRTAP2-3 is rare, we selected ITIH2-rs111570813 and SERPINA6-rs1998056 for population association analysis. However, the custom probe for rs111570813 is failed due to the sequence around this locus with high GC content. Thus, only SERPINA6-rs1998056 of TaqMan SNP Genotyping Assay was successfully synthesized for the following validation.

Study subjects
This study consisted of 1,081 incident patients with HCC and 2,008 cancer-free controls. All subjects were unrelated ethnic Han Chinese. HCC patients were newly diagnosed and histologically confirmed, without treatment of chemotherapy or radiotherapy prior to surgery. Among 1,081 patients included, 1,009 cases providing whole blood samples were consecutively recruited between January 2010 and March 2013 at Tongji Hospital of Huazhong University of Science and Technology (HUST), Wuhan, Central China, and 72 cases providing normal liver tissues adjacent to the tumors were enrolled from Peking Union Medical College Hospital (PUMCH), Beijing, China. These liver tissues were obtained from surgically removed specimens of individual patients. They were immediately placed in liquid nitrogen after resection, and then transferred to a 280uC freezer for storage until analysis. Controls providing whole blood samples were randomly selected from the healthy examination population in the Wuhan area during the same time period as the cases were enrolled, part of which also included in our previous case-control studies [20][21][22]. The inclusion criteria for controls contained no individual history of cancer, and frequency matched to cases on the basis of age (65 years) and gender. At recruitment, written informed consent was obtained from each study subject, and demographic data including gender, age, smoking and drinking habits were collected. This study was conducted under the approval of the institutional review boards of Tongji Medical College of HUST.

Genotyping assay
Genomic DNA (gDNA) from 5 ml anticoagulant peripheral blood was extracted with the RelaxGene Blood DNA System DP319-02 (TIANGEN, Beijing, China). For each sample of frozen normal liver tissues adjacent to the tumors, we first pulverized it manually in liquid nitrogen and then isolated DNA using the standard phenol-chloroform method.
Candidate variant rs1998056 was genotyped using TaqMan Polymerase chain reaction (PCR) Assay (Applied Biosystems, Foster city, CA). The PCR program was heating for 10 minutes at 95uC; followed by 50 cycles of 15 seconds at 92uC and 90 seconds at 60uC. After PCR amplification, the ABI Prism 7900HT Sequence Detection System (Applied Biosystems, Foster city, CA) was applied to read the reacted plates and analyze the endpoint fluorescence to make allelic discrimination. Quality control was monitored by including 5% duplicates and negative controls, with the 100% concurrence rate of the duplicate sets. The genotyping call rate for rs1998056 was 99.4%.

Analysis of SERPINA6 mRNA levels
Total RNA was extracted with TRIzol reagent (Invitrogen Corp., Carlsbad, CA) and then complementary DNA (cDNA) was prepared using PrimeScript 1st Strand cDNA Synthesis Kit (Takara Biotechnology (Dalian) Co., Ltd.). qPCR was carried out to quantify the relative gene expression of SERPINA6, using ABI Prism 7900HT Sequence Detection System (Applied Biosystems, Foster city, CA) based on the SYBR-green method. All quantifications were performed using ACTB as an internal reference gene. The primers used for SERPINA6 were 59-CTTCTATGTGGACGAGACAACTG-39 (forward primer) and 59-CCCACGTAGTTCATCTGCAC-39 (reverse primer); and for ACTB were 59-CATGTACGTTGCTATCCAGGC-39 (forward primer) and 59-CTCCTTAATGTCACGCACGAT-39 (reverse primer). All these sequences of primers were downloaded from PrimerBank (http://pga.mgh.harvard.edu/primerbank/index. html). In a volume of 10 ul qPCR reaction, 1 ul cDNA template was mixed with 5 ul 26 Power SYBR Green PCR Master Mix (Applied Biosystems, Foster city, CA), 200 nM of paired primers and distilled water. qPCR amplification included enzyme activation at 95uC for 10 minutes, and 40 cycles of denaturing at 95uC for 15 seconds and annealing at 60uC for 1 minute. Then a dissociation stage was used to analyze the melting curves. Each sample for a given gene was analyzed in duplicate and any sample with coefficient of variance .5% were re-analyzed.

Statistical analysis
Hardy-Weinberg Equilibrium (HWE) for genotype of rs1998056 was evaluated by using a goodness-of-fit x 2 test in the control group (P = 0.965). Either the independent-samples t test or the Pearson's x 2 test was applied to assess demographic differences between cases and controls in the distribution of age, gender, smoking and drinking status. The odds ratio (OR) and 95% confidence interval (95% CI) were estimated by logistic regression (LR) model to investigate the association between SERPINA6-rs1998056 and HCC risk. Because we aimed to examine the potential effects of candidate variant in the sexual dimorphism of HCC risk, both overall analyses and subgroup analyses by gender were conducted by univariate and multivariate LR analyses. Besides, relative gene expression of SERPINA6 was calculated by 2 2DCt method [23] where DC t = C t SERPINA6 -C t ACTB . Kruskal-Wallis one-way ANOVA test or Mann-Whitney test was performed to examine the difference between SERPINA6 expression levels and rs1998056 genotype carriers. All the statistical analyses were conducted by PASW Statistics 18.0 software (IBM Corporation, Somers, New York) and all P values were two-sided with the statistical significance criteria of P,0.05. In addition, we computed the statistic power using the Power V3.0 (http://dceg.cancer.gov/tools/design/POWER) and got a power of 0.94 to detect an OR of 1.30 in our sample size.

Variants selection and subject characteristics
By using a computational strategy shown in Figure 1, we identified a total of 15 genetic variants at FOXA1/2 TFBEs. These candidate variants are co-regulated by FOXA/ERa, and near or within genes' promoter regions. They are involved in 8 genes (LINC00862, HES1, ITIH2, BATF, DTD2, SERPINA6, LOC100507217 and KRTAP2-3), and details were presented in Table S1. There are 3 common variants with MAF.0.05, ITIH2-rs111570813, SERPINA6-rs1998056 and KRTAP2-3-rs35335725. Because of rare relevant report for KRTAP2-3 and probe designed failure for ITIH2-rs111570813, only SERPINA6-rs1998056 was further analyzed in the following study.
In the subsequent case-control study, demographic characteristics of 1,081 HCC patients and 2,008 cancer-free controls are presented in Table 1. There was no significant difference between patients and controls in the distribution of age (P = 0.149) and gender (P = 0.937). However, significantly more smokers were presented among cases than among controls (59.9% versus 48.9%, P = 1.611610 29 ). Similar result was also observed in drinking status (52.9% versus 44.7%, P = 2.583610 26 ).
Association between SERPINA6-rs1998056 and HCC risk A total of 3,089 unrelated ethnic Han Chinese people were genotyped to examine the association between SERPINA6-rs1998056 and HCC risk (1,081 cases and 2,008 controls). Results  Table 2. There was no statistical relation of this variant with HCC risk in total participants in genotype, recessive, dominant and additive models by using LR analyses. Similar results were also observed in males. But in females, both univariate and multivariate analyses strongly suggested a significant association between SERPINA6-rs1998056 and the susceptibility to HCC in all genetic models. After adjusting for age, smoking and drinking status, women with GG genotype suffered 2-fold HCC risk than those with CC genotype (OR = 2.03, 95% CI = 1.

SERPINA6 mRNA levels in liver tissues from different genotype carriers
To examine the effect of rs1998056 on SERPINA6 expression in the target tissues, the levels of SERPINA6 mRNA in normal liver tissues adjacent to the tumors from 72 individuals were quantitated by qPCR (Figure 2). We observed a borderline significance of SERPINA6 expression among the CC, CG and GG genotype carriers (Kruskal-Wallis one-way ANOVA test: P = 0.060). Specially, the GG genotype carriers had significantly higher SERPINA6 mRNA levels than the CC genotype carriers (Mann-Whitney test: P = 0.032). We also compared the SER-PINA6 mRNA levels between men and women, but did not find any difference (P = 0.937).

Discussion
In this study, we used an integrative genomic strategy to demonstrated that 1) a simple but effective approach can be applied to identify new HCC-related genes and variants by integrating ChIP-seq data, genome mapping and dbSNP database; 2) SERPINA6-rs1998056 is significantly associated with female HCC risk suggested by the subsequent case-control study; and 3) results of qPCR using target tissues further supported the functional effect of this variant.
Nowadays, more and more studies highlight that genetic variants at TFBS have the potential to affect the protein (transcriptional factors)-DNA (cis-regulatory elements) interaction and then modify the expression of target genes, thus involving in the development of complex diseases including cancers [18,[24][25][26][27]. On the basis of previous studies and findings from Li Z et al. [18], we successfully applied a bioinformatics analysis to identify 15 genetic variants at FOXA1/2 binding elements. These candidate variants are near or within genes' promoter regions and co-regulated by FOXA/ERa, thus probably playing a role in the development of HCC, especially in women. Our subsequent case-control study further found that SERPINA6-rs1998056 was associated with female HCC risk. This identified gene SERPINA6 is known to be regulated by ERa [28][29][30], which suggested that our approach was reliable.
SERPINA6-rs1998056 is a C/G variation mapped to chr14: 94789495 (GRCh37), which is located in the first intron of SERPINA6 gene. Besides evidences in our study, many authoritative databases online have also provided strong evidences to support the potentially functional role of this variant. F-SNP (http://compbio.cs.queensu.ca/F-SNP/) and RegulomeDB (http://regulome.stanford.edu/index) all show that SERPINA6-rs1998056 likely to affect the transcriptional factors binding and thus regulating the expression of their target genes. Specifically in the HepG2 cell line, this variant area can be bound by other important transcriptional factors including EP300, HNF4A and CEBPB; epigenetic marks of H3k4me1, H3K4me3 and H3K9ac that highlight active gene promoters are also found in the region; this variant area also overlaps with DNase I hypersensitive site (DHS), which is a specific chromatin structure often indicating cisregulatory elements. Similar results are also obtained from HaploReg v2 (http://www.broadinstitute.org/mammals/ haploreg/haploreg.php) and UCSC Genome Browser (http:// genome.ucsc.edu/cgi-bin/hgGateway). All the evidences above strongly support that SERPINA6-rs1998056 is located at the active cis-regulatory element and probably modifies the transcriptional level of SERPINA6 in liver. Moreover, LiverAtlas database (http://liveratlas.hupo.org.cn/) shows that SERPINA6 is a HCC significant gene and Cancer Gene Expression Database (CGED) (http://lifesciencedb.jp/cged/) suggests that this gene significantly differently expresses in HCC comparing with non-tumor liver tissues, which suggest a potential association of SERPINA6 with hepatocarcinogenesis. SERPINA6 encodes the major transport protein for glucorticoids and progestins in the blood, not only regulating endopeptidase and proteolysis, but also playing a role in hormone-related pathways [28][29][30][31][32]. Current researches have suggested a potential role of SERPINA6 in liver diseases. Decreased circulating SERPINA6 concentrations were observed in patients with liver cirrhosis [33,34]. Changed expression of SERPINA6 could provide information about liver impairment [35]. Some studies have indicated an association between SERPINA6 and type 2 diabetes too [36,37]. All these diseases above are important risk factors during hepatocarcinogenesis. Based on previous studies and our results, we speculated that SERPINA6-rs1998056 was likely to influence the co-regulation of FOXA1/2 and ERa, modulate the expression of SERPINA6, then play a role in damage of hormone homeostasis and finally increase the risk of female HCC.
Several limitations of our study should be acknowledged. First, other candidate variants were not for population association examination due to several practical reasons including low MAFs or probe designed failure. Second, some other risk factors (family history, hepatitis virus infection, obesity, etc.) were not fully collected during investigation, thereby resulting in insufficient adjustment for ORs of HCC risk. Finally, data about hepatitis virus B (HBV) infection were missing in more than half of participants, so we could not analyze the HBV status in this study. Independent replication studies and biochemical assays are warranted to verify our results.
In conclusion, our study suggested that SERPINA6-rs1998056 regulated by FOXA/ERa might be associated with female HCC risk by using an integrative genomic strategy. Our findings not only presented an effective way to excavate predisposing variants and genes in complex diseases, but also advanced our understanding of the genetic etiology of HCC.