A 3′ UTR SNP in COL18A1 Is Associated with Susceptibility to HBV Related Hepatocellular Carcinoma in Chinese: Three Independent Case-Control Studies

Background Accumulating evidence indicates that single nucleotide polymorphisms (SNPs) in angiogenesis- and tumorigenesis-related genes are associated with the risk of hepatocellular carcinoma (HCC). COL18A1 encodes the precursor of endostatin, a broad-spectrum angiogenesis inhibitor, and we speculated that SNPs in COL18A1 may be associated with susceptibility to HCC. Methods and Findings We carried out a 2-stage association study in 3 independent case-control groups comprising a total of 1067 chronic hepatitis B (CHB) patients and 808 hepatitis B virus (HBV) related HCC patients, all Han Chinese. Four SNPs, which represent all potential functional SNPs with MAF>0.1 recorded in the HapMap database, were genotyped using the TaqMan method. Levels of total COL18A1 mRNA were also examined using quantitative real-time RT-PCR. We found rs7499, located in the 3′-UTR, to be strongly associated with HBV related HCC (Pcombined = 0.0000005, OR = 0.72, 95%CI = 0.63–0.82). COL18A1 mRNA expression decreased significantly as the disease progressed (P = 0.000026). Conclusion These findings indicate that COL18A1 rs7499 may contribute to the risk of HCC in Han Chinese.


Introduction
Hepatocellular carcinoma (HCC) is one of the most common malignancies and ranks fifth in men and eighth in women among causes of cancer mortality worldwide. An estimated 564,000 new cases of HCC are reported throughout the world each year [1]. Eastern Asia is the geographic area at highest risk of HCC [1]. HCC arises from a complex interplay between multiple genetic and environmental factors [2]. Hepatitis B virus (HBV) and hepatitis C virus (HCV) are the main viral factors in HCC, and in China HBV related HCC is the most frequent. In addition, accumulating evidence from molecular genetics indicates that single nucleotide polymorphisms (SNPs) in genes related to immune response, angiogenesis and tumorigenesis are associated with susceptibility to HCC [3,4,5,6]. Recent genome-wide association studies (GWAS) have also identified new susceptibility loci for HCC [7,8].
The collagen, type XVIII, alpha 1 (COL18A1) gene is located at 21q22.3. It has 42 exons and encodes a protein of 1336 amino acids. This protein is the precursor of endostatin, a 20-kDa protein derived from the carboxy-terminal proteolytic fragment of collagen XVIII [9]. Endostatin is a broad-spectrum angiogenesis inhibitor that interferes with growth factors such as VEGF [10] and can inhibit neovascularization and tumor growth [11]. Angiogenesis plays an important role in tumorigenesis. Tumours secrete a number of angiogenic growth factors such as VEGF. Furthermore, the expression of endogenous inhibitors, for instance endostatin and angiostatin, is downregulated [12]. Studies have reported that endostatin expression was significantly stronger in adjacent nontumor tissues than in tumors in HCC specimens [13], and that endostatin effectively inhibits the growth of HCC [14].
Based on the above evidence, we speculated that SNPs in COL18A1 may be associated with susceptibility to HCC. We used a candidate gene strategy and carried out a 2-stage association study to test this hypothesis.

Subjects
The subjects enrolled in the present study comprised 3 independent case-control groups with a total of 1067 chronic hepatitis B (CHB) patients and 808 HBV related HCC patients. Controls were CHB patients whose serum levels of alanine aminotransferase (ALT) and aspartate aminotransferase (AST) were continuously >40 IU/L; they had been HBsAg seropositive and HBeAg seropositive for 6 months; their serum HBV DNA was >2,000 copies/mL; and their status was confirmed by liver ultrasonography.
Cases were patients with pathologically confirmed HCC who were shown not to have other cancers. They were also confirmed by liver ultrasonography and/or computed tomography.
Subjects were considered smokers if they smoked up to 6 months before the date of cancer diagnosis for HCC cases or the date of interview for CHB controls. An alcohol drinker was defined as someone who consumed alcohol at least once per week for at least 6 months.
The subjects were excluded if: (1) there was evidence of past or current infection with other hepatitis viruses or hepatitis not caused by HBV; (2) they were not of Han ethnicity. The main features of the subjects included are summarized in Table 1 and Table S1. The study was carried out in accordance with the guidelines of the Helsinki Declaration after obtaining written informed consent from all the subjects and was approved by the ethics committee of the Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences.

SNP selection and genotyping
Genomic DNA was extracted from peripheral blood using a salting-out protocol [15]. Using the HapMap database (HapMap Data Rel 24/phase II Nov 08, on NCBI B36 assembly, dbSNP b126), potential functional SNPs (SNPs in the promoter region and mRNA sequence) with a minor allele frequency (MAF) greater than 0.10 in the Han Chinese Beijing population were selected from the entire gene region, from approximately 4000 bp upstream of the transcription start site to 2000 bp downstream of the 3′ untranslated region (3′ UTR). Five SNPs were found, namely rs2183589 (promoter), rs2230687, rs2230688, rs11702425 (synonymous coding) and rs7499 (3′ UTR). Among these SNPs, rs2230687 and rs2230688 were in complete linkage disequilibrium (LD) according to HapMap data, so we chose rs2230688 as a representative. The 4 selected SNPs were genotyped by the TaqMan method, with probes synthesized by Sangon BioTech Co Ltd (Shanghai, China). The primers and TaqMan probes used are listed in Table S2. All samples were successfully genotyped. For genotyping quality control, 5% of the samples were randomly selected and directly sequenced, and the results were 100% identical.
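The notion of complete LD used to pick one representative SNP from a pair can be illustrated with a small calculation. The following is a minimal sketch, assuming two biallelic loci described by one haplotype frequency and two allele frequencies; the function name `ld_stats` and the input frequencies are hypothetical and are not taken from HapMap or from this study.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B (each strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b
    # D' rescales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical frequencies in which allele A always occurs together with allele B.
d, d_prime, r2 = ld_stats(p_ab=0.3, p_a=0.3, p_b=0.3)
print(f"D = {d:.3f}, D' = {d_prime:.3f}, r^2 = {r2:.3f}")
```

When both D′ and r² equal 1, as in this hypothetical input, either SNP of the pair carries the same information, which is the usual rationale for genotyping only one of them.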

Quantitative Analysis of Gene Expression
Total RNA was extracted from peripheral blood using the RNAprep pure Blood Kit (TianGen, Beijing, China). cDNA was synthesized in a 20 μL reaction volume containing Oligo (dT)18 primers (MBI Fermentas, Lithuania) and RevertAid™ H Minus M-MuLV reverse transcriptase (MBI Fermentas, Lithuania). Expression of COL18A1 mRNA was measured by TaqMan relative quantitative analysis using the Bio-Rad iQ5 Real-Time PCR Detection system (Bio-Rad, Hercules, CA). Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) was used as an internal control gene. The primers and probes used for amplification of COL18A1 and GAPDH cDNA are listed in Table 2. COL18A1 and GAPDH cDNA from each sample were amplified in the same 96-well PCR plate. Each experiment was performed in triplicate. The comparative Ct (2^(−ΔΔCt)) method was employed to quantify COL18A1 expression as described previously [16].
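The comparative Ct calculation can be sketched as follows. This is an illustrative example only: the function name `relative_expression` and all Ct values are hypothetical, and the sketch assumes, as the comparative Ct method itself does, roughly equal and near-100% amplification efficiency for the target (COL18A1) and the reference gene (GAPDH).

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Fold change of a target gene relative to a calibrator by the 2^(-ddCt) method.

    ct_target, ct_reference: Ct of target and reference gene in the sample of interest
    ct_target_cal, ct_reference_cal: the same two Ct values in the calibrator sample
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical mean Ct values from triplicate wells (not data from this study).
fold = relative_expression(ct_target=25.0, ct_reference=18.0,
                           ct_target_cal=26.0, ct_reference_cal=18.0)
print(f"Relative COL18A1 expression: {fold:.2f}-fold")
```

Choosing one group as the calibrator fixes its relative expression at 1, so every other group is reported as a fold difference from it.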

Statistical analysis
Using the χ² test, we tested whether the genotype distributions of the studied SNPs were in Hardy-Weinberg equilibrium (HWE). We used 2×2 or 2×3 contingency tables to compare allele and genotype frequencies between groups. Differences in quantitative traits and COL18A1 mRNA expression between groups were tested using the Mann-Whitney U test or Kruskal-Wallis test for traits with non-normal distributions, or ANOVA for normally distributed traits. P<0.05 was the criterion for statistical significance. All statistical analyses were performed using the Statistical Package for the Social Sciences (SPSS), version 12.0. We obtained estimates of LD values (r², D′) and haplotype estimates using the SHEsis online software [17].
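A χ² goodness-of-fit test for HWE can be sketched in a few lines. The example below is a minimal illustration, not the SPSS procedure used in the study; the function name `hwe_chi_square` and the genotype counts are hypothetical, and the sketch assumes both alleles are observed so that all expected counts are positive.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test (1 degree of freedom) for HWE.

    n_aa, n_ab, n_bb: observed counts of the three genotypes at one SNP.
    Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical genotype counts (not data from this study).
chi2, p_value = hwe_chi_square(30, 40, 30)
print(f"chi2 = {chi2:.2f}, P = {p_value:.4f}")
```

A P value above the chosen threshold gives no evidence of departure from HWE at that SNP.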

Results
We first genotyped the 4 COL18A1 polymorphisms in the 302_Beijing group of samples. Genotype distributions of the studied SNPs were in HWE in both cases and controls. The genotype distributions and allele frequencies of the COL18A1 polymorphisms in CHB and HCC patients are presented in Table 2. The frequency of the C allele of rs7499 was 45.3% in HCC patients vs. 55.4% in CHB patients (P = 0.006, OR = 0.67, 95%CI = 0.50–0.89). The Cochran-Armitage trend test (assuming an additive model for the C allele) revealed an allele dose-dependent association of rs7499 with HCC (P = 0.005), with increased ORs of 2.08 and 2.38 for the CT and TT genotypes, respectively. We then used binary logistic regression under an additive model to adjust for confounding factors such as age, gender, smoking and drinking, and rs7499 remained independently associated with HCC (P = 0.009, OR = 0.66, 95%CI = 0.48–0.90). The 3 other SNPs, however, were not associated with HCC under any model. We analyzed the degree of LD among these 4 SNPs and found no apparent LD (D′ ≤ 0.291, r² ≤ 0.017). Table 3 shows the 5 common haplotypes constructed from these 4 SNPs. The C-T-T-T haplotype was most significantly associated with HCC (P = 0.0002). Notably, the C-T-T-C haplotype, which differs from the C-T-T-T haplotype only at the rs7499 locus, was also associated with HCC (P = 0.01), but in the opposite direction. These results were in accordance with the single-locus analysis, suggesting that rs7499 was the principal genetic factor in COL18A1.
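The Cochran-Armitage trend test under an additive model can be sketched as follows. This is a minimal illustration using the standard χ² form of the statistic with genotype scores 0, 1 and 2; the function name `cochran_armitage` and the genotype counts are hypothetical and are not the counts from Table 2.

```python
import math


def cochran_armitage(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2 x k table (chi-square form, 1 df).

    cases, controls: genotype counts ordered by number of copies of the tested allele
    scores: per-genotype weights; (0, 1, 2) corresponds to an additive model
    Returns (chi2, p_value).
    """
    col_totals = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls
    sum_wr = sum(w * r for w, r in zip(scores, cases))
    sum_wn = sum(w * c for w, c in zip(scores, col_totals))
    sum_w2n = sum(w * w * c for w, c in zip(scores, col_totals))
    numerator = n * (n * sum_wr - n_cases * sum_wn) ** 2
    denominator = n_cases * n_controls * (n * sum_w2n - sum_wn ** 2)
    chi2 = numerator / denominator
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical genotype counts, ordered as 0, 1 and 2 copies of the tested allele.
chi2, p_value = cochran_armitage(cases=[10, 20, 30], controls=[30, 20, 10])
print(f"chi2 = {chi2:.2f}, P = {p_value:.2e}")
```

Unlike a general 2×3 genotype test, this statistic has a single degree of freedom and is sensitive specifically to a monotonic change in risk with allele dose.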
We next replicated the genotyping of rs7499 in 2 independent case-control samples, from Beijing and Guangxi respectively. The results are presented in Table 4. In both groups, rs7499 was significantly associated with HCC, in the same direction as in the first group of samples. In group 2, however, the association disappeared when binary logistic regression was used to adjust for age, gender, smoking and drinking (P = 0.29). When the 3 groups were combined, for a total of 1067 CHB patients and 808 HCC patients, the frequency of the C allele of rs7499 was 42.5% in HCC patients vs. 50.8% in CHB patients (P = 0.0000005, OR = 0.72, 95%CI = 0.63–0.82). The Cochran-Armitage trend test and binary logistic regression also indicated that rs7499 was strongly associated with HCC (P = 0.0000004 and P = 0.00001, respectively).
To evaluate the association between rs7499 genotypes and phenotypes, we compared quantitative traits such as ALT, AST, total bilirubin (TBil), direct bilirubin (DBil) and HBV DNA levels among genotypes in CHB and HCC patients separately. The quantitative traits did not differ significantly among the three genotypes of rs7499 (data not shown).
We then examined levels of total COL18A1 mRNA using quantitative real-time RT-PCR. COL18A1 mRNA levels were measured in RNA from white blood cells (WBCs) of 20 HCC patients, 32 CHB patients and 43 HBsAg-negative healthy individuals. Expression levels of COL18A1 mRNA for the three genotypes of rs7499 were also compared in CHB patients and HBsAg-negative healthy individuals. Final abundance figures were adjusted to yield an arbitrary value of 1 for HCC patients (Figure 1A). HBsAg-negative healthy individuals had 2.33-fold higher and CHB patients had 1.79-fold higher COL18A1 mRNA expression than HCC patients (P = 0.000026). In healthy individuals, COL18A1 mRNA levels did not differ significantly among the three genotypes of rs7499 (CC: 1.81±0.75; CT: 2.39±1.23; TT: 2.46±1.22; P = 0.60). In CHB patients, however, TT carriers had significantly lower COL18A1 mRNA levels than non-TT carriers (1.18±0.67 vs. 2.03±1.08, P = 0.035), as shown in Figure 1B.
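The relation between the allele frequencies and the odds ratio in the combined sample can be made concrete with a short calculation. The sketch below is illustrative only: the allele counts are approximate reconstructions from the rounded percentages and sample sizes given in the text, not the genotype counts in Table 4, and the Woolf (log) method for the confidence interval is an assumption, since the text does not state how the interval was computed. The function name `allele_odds_ratio` is hypothetical.

```python
import math


def allele_odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-based) confidence interval for a 2 x 2 table.

    a, b: counts of the tested allele and the other allele in cases
    c, d: counts of the tested allele and the other allele in controls
    Returns (odds_ratio, ci_lower, ci_upper).
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or)


# Approximate allele counts rebuilt from the reported frequencies (an assumption):
# 808 HCC patients and 1067 CHB patients, two alleles per subject.
n_hcc_alleles = 2 * 808
n_chb_alleles = 2 * 1067
c_hcc = round(0.425 * n_hcc_alleles)  # C alleles in HCC patients
c_chb = round(0.508 * n_chb_alleles)  # C alleles in CHB patients

odds_ratio, ci_low, ci_high = allele_odds_ratio(
    c_hcc, n_hcc_alleles - c_hcc, c_chb, n_chb_alleles - c_chb
)
print(f"OR = {odds_ratio:.2f}, 95% CI = {ci_low:.2f}-{ci_high:.2f}")
```

Under these assumptions the estimate should land close to the reported OR and confidence interval; small differences would be expected from rounding of the published percentages.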

Discussion
As an anti-angiogenesis gene, COL18A1 plays an important role in tumorigenesis. Studies have reported that SNPs in COL18A1 are associated with numerous cancers, such as breast cancer [18,19], prostate carcinoma [20,21], colorectal adenocarcinoma [22] and lung cancer [23]. A nonsynonymous SNP, D104N, located in exon 42 of COL18A1, has been widely studied [18,19,20,21,22,23]. However, the MAF of this SNP is low in the Han Chinese in Beijing (HCB) population, about 0.047 according to HapMap data; because we selected SNPs with MAF>0.1, the D104N polymorphism was not included in the present study. Further studies are needed to elucidate the role of D104N in HCC.
The distribution of gene polymorphisms differs greatly across ethnicities and regions. In the present study, we collected samples from both the south and the north of China, which may capture such population differences. The genotype distribution of rs7499 differs between the Beijing and Guangxi populations (Table 2 and Table 4); however, the direction of the difference between the CHB and HCC groups is similar. The association between rs7499 and HCC was replicated in both the Beijing and Guangxi populations, which indicates that this polymorphism may have a genetic influence on the development of HCC.
rs7499 is located in the 3′-UTR. By querying NCBI dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/snp_gf.cgi), we found that rs7499 is in complete LD with two other 3′-UTR SNPs, rs8199 and rs7867. SNPs in the 3′-UTR may disrupt or create a microRNA binding site and thereby affect translational repression or mRNA stability. The "C" to "T" change of rs7499 may disrupt a binding site for the microRNA hsa-mir-328. This analysis was performed with miRBase, available at http://www.mirbase.org/index.shtml [24]. These data indicate that rs7499 may be functional itself, or in LD with other functional SNPs. Several recent studies have also genotyped rs7499 and examined its association with myopia [25] and ovarian cancer [26]. Interestingly, in these studies rs7499 was not associated with disease, but the distribution patterns of rs7499 genotypes were similar across populations.
We examined COL18A1 mRNA expression and found that HBsAg-negative healthy individuals and CHB patients had higher mRNA expression than HCC patients. The role of endostatin/collagen XVIII expression in HCC is controversial [27,28,29]. Evidence suggests that free, soluble endostatin inhibits angiogenesis, whereas the immobilized form supports the survival and migration of endothelial cells; so in some cases, endostatin/C18 may promote, rather than abolish, angiogenic processes [30,31]. The present results suggest that, since the expression of angiogenic factors is elevated during HCC tumorigenesis, it is reasonable that the expression of angiogenesis inhibitors such as endostatin/C18 is decreased in HCC patients. We also compared COL18A1 mRNA expression among the three genotypes of rs7499 in HBsAg-negative healthy individuals and CHB patients. In CHB patients, TT carriers had significantly lower COL18A1 mRNA levels than non-TT carriers, but in HBsAg-negative healthy individuals, COL18A1 mRNA levels did not differ significantly among the 3 genotypes. It is possible that rs7499 genotypes are not directly associated with COL18A1 mRNA expression, because the regulation of gene expression is complicated and, as noted above, endostatin may in some cases promote rather than abolish angiogenic processes. On the other hand, the sample sizes for the gene expression study were small, so a false negative is possible. The result should be verified in a larger sample set.
Several limitations of the present study need to be addressed. First, some clinical features did not match well between CHB and HCC patients. In particular, in the Youan_Beijing group of samples, the association between rs7499 and HCC disappeared after adjusting for age, gender, smoking and drinking. In future studies, clinical features should be well matched. Second, the selection of SNPs in our work was based on database searching. Although the 4 SNPs genotyped in our work represent all potential functional SNPs with MAF>0.1 recorded in the HapMap database, some SNPs not recorded in the database may have been omitted; sequencing the whole functional region of COL18A1 is needed in future studies.
In conclusion, we carried out a 2-stage association study and found rs7499, located in the 3′-UTR of the COL18A1 gene, to be strongly associated with HBV related HCC. Further studies in other ethnicities, and confirmation of the present finding in a larger sample set of Han Chinese, will be needed to clarify the role of this polymorphism.

Acknowledgments
We are thankful to all the subjects who participated in the present study.