Impact of COL6A4P2 gene polymorphisms on the risk of lung cancer: A case-control study

Lung cancer (LC) is a malignant tumor that poses the greatest threat to human health and life. Most studies suggest that the occurrence of LC is associated with environmental and genetic factors. We aimed to explore the association between COL6A4P2 single nucleotide polymorphisms (SNPs) and LC risk in the Chinese Southern Han population. Using a case-control design (510 cases and 495 controls), we conducted an association study between five candidate COL6A4P2 SNPs and the corresponding LC risk. Odds ratios (ORs) and 95% confidence intervals (CIs) were calculated by logistic regression to analyze LC susceptibility under different genetic models. The results showed that COL6A4P2 rs34445363 was significantly associated with LC risk under the allele model (OR = 1.26, 95%CI: 1.01–1.58, p = 0.038). In addition, rs34445363 was also significantly associated with LC risk under the log-additive model (OR = 1.26, 95%CI: 1.01–1.58, p = 0.041). The results of subgroup analysis showed that rs34445363 (OR = 1.42, 95%CI: 1.03–1.95, p = 0.033) and rs61733464 (OR = 0.72, 95%CI: 0.52–0.99, p = 0.048) were both significantly associated with LC risk in the log-additive model among participants who were ≤ 61 years old. We also found that variation in rs34445363 (GA vs. GG, OR = 1.73, 95%CI: 1.04–2.86, p = 0.034) and rs77941834 (TA vs. TT, OR = 1.88, 95%CI: 1.06–3.34, p = 0.032) was associated with LC risk in the codominant model among female participants. Our study is the first to find that COL6A4P2 gene polymorphism is associated with LC risk in the Chinese Han population. Our study provides a basic reference for individualized LC prevention.


Introduction
Lung cancer (LC) is the malignant tumor with the fastest-growing morbidity and mortality and poses the greatest threat to human health and life [1]. According to the Global Cancer Observatory database (http://gco.iarc.fr/) [2], there were 2,093,876 new cases of LC worldwide in 2018, accounting for 11.6% of all cancers, and 1,761,007 deaths from LC, accounting for 17.9% of all cancer deaths. Among them, the incidence and mortality of LC in women were 13.1% and 6.9%, respectively. LC has become the malignant tumor with the highest incidence and mortality [3][4][5]. In China, LC also has a high incidence and mortality, and its morbidity and mortality in men are more than twice those in women [6]. Most studies have suggested that the occurrence of LC is associated with environmental factors (smoking, occupational exposure, and air pollution) and genetic factors [7,8]. In particular, genetic factors play an essential role in the occurrence of LC. Li et al. [9] revealed that LC susceptibility in the Chinese Han population was associated with HOTAIR gene mutations. Dimitrakopoulos et al. [10] suggested that NF-kB2 gene mutation is significantly associated with LC risk. However, the association between COL6A4P2 gene polymorphisms and LC susceptibility has not been reported.
COL6A4P2 (collagen type VI alpha 4 pseudogene 2), also named COL6A4, is located on Chr.3q22 in humans. COL6A4 expresses type VI collagen (COL6), an extracellular matrix protein that plays a vital role in maintaining lung tissue integrity. Chiu et al. [11] showed by quantitative secretion cleavage that COL6 is a protein involved in tumor metastasis. Voiles et al. [12] demonstrated that the expression of the COL6 protein in LC is upregulated. Thus, we suspect that COL6A4 may be associated with LC.
It has been reported that COL6A4 is an unprocessed pseudogene due to the presence of multiple stop codons in the gene sequence [13]. Many studies have shown that pseudogenes play an essential role in cancer development. Cheng et al. [14] found that pseudogenes affect the occurrence and development of cancer by forming lncRNA-pseudogene-mRNA competitive triples. Lynn et al. [15] confirmed that polymorphisms in the MYLKP1 pseudogene are associated with an increased risk of colon cancer. Wei et al. [16] found that the pseudogene DUXAP10 promotes the invasiveness of LCs. Therefore, we speculated that the COL6A4P2 gene might play a role in cancer development.
In this study, we explored for the first time the association between COL6A4P2 gene polymorphisms and LC susceptibility in the Chinese Han population.

Study participants
Using a case-control design, 510 LC patients (mean age: 60.78 ± 9.96 years) and 495 controls (mean age: 61.94 ± 7.72 years) were enrolled in the study. All patients were recruited from Shaanxi Provincial Cancer Hospital. Patient inclusion criteria were as follows: 1) newly diagnosed LC, 2) histopathological LC diagnosed by an experienced pathologist, 3) no previous radiation therapy or chemotherapy, and 4) no history of cancer or metastatic carcinoma. Patients with asthma, bronchitis, pneumonia, lung abscess, tuberculosis, other lung diseases, autoimmune diseases, trauma, or other tumors were excluded from the study. We then collected information on the clinical indicators of LC patients, including sex, age, histological classification, tumor stage, and lymph node metastasis status.
The controls were healthy volunteers recruited from the Shaanxi Provincial Cancer Hospital during the same period. The inclusion criterion for the control group was no medical or family history of cancer or any pulmonary disease. At the time of recruitment, trained personnel interviewed the participants using a structured questionnaire to obtain information regarding their demographic characteristics.

Data collection
This study was approved by the Shaanxi Provincial Cancer Hospital ethics committee and conformed to the ethical principles for medical research involving humans of the World Medical Association Declaration of Helsinki. All participants signed informed consent forms before participating in the study. Subsequently, a sample of approximately 5 mL of venous blood was obtained from each participant and collected into tubes containing ethylenediaminetetraacetic acid for anticoagulation. Genomic DNA was extracted from peripheral blood samples using a whole-blood genomic DNA extraction kit (GOLDMAG, Xi'an, China) according to the manufacturer's instructions. The purity and concentration of the DNA samples were evaluated using a NanoDrop 2000C system (Thermo Scientific, Waltham, MA, USA). Isolated DNA was stored at −80 °C until analysis.

SNP genotyping
Five candidate SNPs in the COL6A4P2 gene were selected with a minor allele frequency (MAF) > 0.05 in the global population of the 1000 Genomes Project (http://www.internationalgenome.org/). We then used HaploReg v4.1 (https://pubs.broadinstitute.org/mammals/haploreg/haploreg.php) to predict the possible functions of the SNPs. The primers for amplification and single-base extension were designed using the Assay Design Suite, V2.0 (https://agenacx.com/online-tools/). Genotyping of the five SNPs was carried out on the MassARRAY iPLEX (Agena Bioscience, San Diego, CA, USA) platform using matrix-assisted laser desorption ionization-time of flight mass spectrometry [17]. Genotyping results were generated using Agena Bioscience TYPER software, version 4.0. Genotyping was performed by laboratory personnel in a double-blinded manner.

Analysis of COL6A4P2 and SNPs expression
Data regarding the expression of COL6A4P2 in LC were obtained from the UALCAN online database (http://ualcan.path.uab.edu/analysis.html), a web server that provides customizable functions. Tumors and normal samples in the UALCAN database were derived from The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) projects. The effect of COL6A4P2 gene expression on LC prognosis was predicted using the OncoLnc database (http://www.oncolnc.org/). We also predicted the expression of SNPs in the COL6A4P2 gene in normal lung tissues using the GTEx database (https://gtexportal.org/home/).

Statistical analyses
An independent-samples t-test was used to assess differences in the demographic and clinical characteristics of the study participants. Fisher's exact test was used to assess Hardy-Weinberg equilibrium (HWE) among the controls by comparing the observed and expected genotype frequencies. Pearson's χ² test was used to compare the allele and genotype frequencies of each SNP between LC patients and controls. Multiple genetic model analyses (codominant, dominant, recessive, and log-additive) were performed using PLINK software (http://zzz.bwh.harvard.edu/plink/ld.shtml) to assess the association between SNPs and LC risk. Furthermore, we performed analyses stratified by sex and age to account for possible confounders. Finally, we used Haploview software (version 4.2) to construct haplotypes and estimate pairwise linkage disequilibrium, and the SHEsis software platform (http://analysis.bio-x.cn/myAnalysis.php) to estimate the association between haplotypes and LC risk. Odds ratios (ORs) and 95% confidence intervals (CIs) were calculated using logistic regression analyses adjusted for sex and age [18], with the wild-type allele used as the reference. Statistical analyses were performed using SPSS software (version 21.0, IBM Corporation, Armonk, NY, USA). All p-values were two-sided, and p < 0.05 was considered indicative of statistical significance. We also conducted a false-positive report probability (FPRP) analysis to assess whether the significant findings were noteworthy or likely due to chance [19].
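As a rough illustration of two of the quantities described above, the following Python sketch computes an unadjusted allelic OR with a Wald-type 95% CI from a 2×2 table, and a χ² goodness-of-fit check for HWE. This is not the analysis pipeline of the study: the study used logistic regression adjusted for sex and age and Fisher's exact test for HWE, whereas this sketch substitutes simpler textbook approximations. The function names and all counts are hypothetical.

```python
import math


def allele_odds_ratio(case_minor, case_major, ctrl_minor, ctrl_major, z=1.96):
    """Unadjusted allelic OR with a Wald (log-scale) CI from a 2x2 allele table.

    Assumption: no covariate adjustment, unlike the adjusted logistic
    regression described in the text.
    """
    odds_ratio = (case_minor * ctrl_major) / (case_major * ctrl_minor)
    # Standard error of log(OR) from the four cell counts
    se_log_or = math.sqrt(
        1 / case_minor + 1 / case_major + 1 / ctrl_minor + 1 / ctrl_major
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test (1 df) for Hardy-Weinberg equilibrium.

    Assumption: a chi-square approximation stands in for the exact test.
    Both alleles must be present, otherwise expected counts are zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical allele counts, for illustration only
    print(allele_odds_ratio(20, 80, 10, 90))
    # Hypothetical genotype counts among controls
    print(hwe_chi_square(25, 50, 25))
```

An HWE p-value above 0.05 in controls is the usual criterion for keeping a SNP in the analysis, which matches the threshold used in this study.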
Basic information and allele frequencies of COL6A4P2 gene polymorphisms are presented in Table 2. The genotype distribution of all SNPs in the control subjects met the HWE (p > 0.05). HaploReg function annotation results revealed that SNPs associated with LC risk were successfully predicted to have biological functions. The association between COL6A4P2 polymorphisms and LC risk under the allele model is also shown in Table 2; rs34445363 was associated with an increased LC risk (OR = 1.26, 95%CI: 1.01–1.58, p = 0.038).

Association between the COL6A4P2 gene and the risk of LC
Genetic models (codominant, dominant, recessive, and log-additive) and genotype frequencies were used to identify any associations between the SNPs and the risk of LC. The results showed that rs34445363 in the COL6A4P2 gene significantly increased the risk of LC in the log-additive model (adjusted for age and sex, OR = 1.26, 95%CI: 1.01-1.58, p = 0.041, Table 3), and no significant difference was found for the other SNPs between cases and controls (all p > 0.05).
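The four genetic models differ only in how a genotype is coded as a predictor before it enters the logistic regression. The sketch below shows one common coding, using the G/A alleles of rs34445363 as an example with A as the variant allele. The function name and the string representation of genotypes are assumptions made for illustration; they are not taken from PLINK or from the study's analysis scripts.

```python
def encode_genotype(genotype, minor_allele, model):
    """Code a two-letter genotype such as "GA" under a given genetic model.

    Hypothetical helper: returns the predictor value(s) that would enter a
    logistic regression for the chosen model.
    """
    copies = genotype.count(minor_allele)  # 0, 1, or 2 copies of the minor allele
    if model == "codominant":
        # Two indicators (heterozygote, minor homozygote); 0 copies is the reference
        return (int(copies == 1), int(copies == 2))
    if model == "dominant":
        # Carriers of at least one minor allele vs. non-carriers
        return int(copies >= 1)
    if model == "recessive":
        # Minor-allele homozygotes vs. everyone else
        return int(copies == 2)
    if model == "log-additive":
        # Each extra copy multiplies the odds by the same per-allele OR
        return copies
    raise ValueError(f"unknown genetic model: {model}")


if __name__ == "__main__":
    for g in ("GG", "GA", "AA"):
        row = {m: encode_genotype(g, "A", m)
               for m in ("codominant", "dominant", "recessive", "log-additive")}
        print(g, row)
```

Under the log-additive coding, the exponentiated regression coefficient is the OR per additional copy of the variant allele, which is how an estimate such as OR = 1.26 in that model is read.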

Association between COL6A4P2 polymorphism and clinicopathological features
To evaluate the association of COL6A4P2 SNPs with various clinicopathological features, we segregated patients according to clinical stage (I-II vs. III-IV) and lymph node metastasis (LNM) status (positive vs. negative). There was no significant association between LNM status and COL6A4P2 polymorphism variation (S1 Table).

Stratification analysis of age and gender
Multiple inheritance model analysis showed that age and sex significantly affected the association between COL6A4P2 SNPs and LC risk. We found that rs34445363 was associated with a higher incidence of LC in people aged ≤ 61 years with the AA genotype in the  (Table 5).
In addition, we found that sex significantly affected the association between SNPs of the COL6A4P2 gene and LC risk (

FPRP analysis
The results of the FPRP analysis (S2 Table) showed that the association between COL6A4P2 rs34445363 and LC in people aged ≤ 61 years (p = 0.049) was not noteworthy at a prior probability of 0.25 and an FPRP threshold of 0.2 (FPRP = 0.338). The FPRP values of the remaining significant results were all less than 0.2, which means that these positive results were noteworthy.
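The noteworthiness decision can be sketched with the commonly used FPRP formulation, FPRP = α(1 − π) / [α(1 − π) + (1 − β)π], where α is the observed p-value, π is the prior probability of a true association, and 1 − β is the statistical power. The sketch below assumes that this is the formulation applied here and takes the power as an input rather than deriving it, because the alternative OR used for the power calculation is not stated in the text. The function names and the power value are hypothetical.

```python
def fprp(p_value, power, prior):
    """False-positive report probability for one significant finding.

    p_value: observed significance level (alpha)
    power:   statistical power (1 - beta) to detect the assumed true OR
    prior:   prior probability that the association is real
    """
    false_positive_mass = p_value * (1 - prior)
    true_positive_mass = power * prior
    return false_positive_mass / (false_positive_mass + true_positive_mass)


def is_noteworthy(p_value, power, prior, threshold=0.2):
    """A finding is called noteworthy when its FPRP falls below the threshold."""
    return fprp(p_value, power, prior) < threshold


if __name__ == "__main__":
    # Hypothetical power of 0.8; prior 0.25 and threshold 0.2 follow the text
    print(fprp(0.05, 0.8, 0.25), is_noteworthy(0.05, 0.8, 0.25))
```

Because power sits in the denominator, a borderline p-value combined with low power can push the FPRP above the threshold even when p < 0.05.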

Association of COL6A4P2 haplotypes with the risk of LC
SNPs in the current study were in linkage disequilibrium for the study population (Fig 1). Unfortunately, there was no statistically significant difference among the COL6A4P2 haplotype frequencies in the cases and controls (S3 Table).
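For readers unfamiliar with linkage disequilibrium measures, the following sketch computes D, |D'|, and r² for one pair of biallelic SNPs from a haplotype frequency and two allele frequencies. It uses the standard textbook definitions and is not a reproduction of Haploview's internal estimation, which also has to infer haplotype frequencies from unphased genotypes. The function name and all frequencies are hypothetical.

```python
def ld_measures(p_ab, p_a, p_b):
    """Pairwise LD between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    Returns (D, |D'|, r^2).
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b
    # The largest |D| attainable for these allele frequencies depends on the sign of D
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


if __name__ == "__main__":
    # Hypothetical frequencies, for illustration only
    print(ld_measures(0.30, 0.40, 0.50))
```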

Discussion
In this study, we analyzed the association of COL6A4P2 gene polymorphisms with susceptibility to LC. We found that rs34445363 in COL6A4P2 was associated with an increased risk of LC. Our results also suggested that variation at the rs34445363 site increases the risk of lung adenocarcinoma (LUAD), whereas variation at rs61733464 significantly decreases LUAD risk. These results suggest an association between genetic polymorphisms of COL6A4P2 and susceptibility to LC. Numerous studies have shown that collagen levels play an essential role in the development of LC [20,21]. Naveen et al. [22] identified collagen VI as a potential biomarker for the early diagnosis of LC by proteomic analysis, suggesting that LC is associated with collagen-encoding genes. The COL6A4P2 gene is a pseudogene formed by the chromosomal break of the collagen-encoding gene COL6A4 [13,23]; therefore, we speculated that the COL6A4P2 gene may be associated with LC. Our results suggest that the rs34445363 variant in the COL6A4P2 gene significantly increases the risk of LC, which supports our conjecture and is consistent with previous studies.
Our results also found that the association between COL6A4P2 gene polymorphism and LC risk was influenced by gender and age. A retrospective analysis by Oh et al. [24] assessed the crucial effects of sex and age in the development of LC. Aareleid et al. [25] revealed that LC has different incidence rates in different genders and ages. These studies are consistent with our results and enhance the credibility of our findings.
Furthermore, we predicted the differential expression of COL6A4P2 in normal lung tissues and LC tissues using a database. Voiles et al. [12] found that collagen VI protein levels increased in tumor lung tissue and speculated that the expression of the COL6A4P2 gene in tumor lung tissue is variable. This is consistent with our predictions. Fagerberg et al. [26] found that the COL6A4P2 gene is specifically expressed in human lung tissue by genome-wide

Conclusion
In conclusion, the present study is the first to investigate the association between COL6A4P2 and LC. Our findings indicate that COL6A4P2 gene polymorphism is associated with LC risk in the Chinese Han population. However, further studies in other ethnic groups and with larger sample sizes are needed to confirm our results. Our study provides a basic reference for individualized LC prevention.

Supporting information
S1 Table. Association between COL6A4P2 polymorphism and lymph node metastasis status in patients with lung cancer. (DOCX)