Statistical meta-analysis to investigate the association between the Interleukin-6 (IL-6) gene polymorphisms and cancer risk

Many genome-wide association studies (GWAS), including meta-analyses, have reported that single nucleotide polymorphisms (SNPs) of the IL-6 gene are significantly associated with the risk of various types of cancer, whereas other studies have reported no significant association. These contradictory results may be due to variations in sample size and/or deficiencies in statistical modeling. We therefore attempt to provide a more comprehensive understanding of the association between the IL-6 gene SNPs (rs1800795, rs1800796, rs1800797) and different cancer risks, giving weight to a large sample size, the inclusion of different cancer types and appropriate statistical modeling of the meta-dataset. To reach a more reliable consensus about the association between the IL-6 gene polymorphisms and different cancer risks, we performed a multi-case statistical meta-analysis based on information collected from 118 GWAS comprising 50053 cases and 65204 control samples. The results of this meta-analysis indicated a significant association (p-value < 0.05) of the IL-6 gene rs1800796 polymorphism with an overall increased cancer risk. The subgroup analysis by cancer type showed a significant association (p-value < 0.05) of the rs1800795 polymorphism with an overall increased risk of cervical, liver and prostate cancers; of the rs1800796 polymorphism with lung, prostate and stomach cancers; and of the rs1800797 polymorphism with cervical cancer. The subgroup analysis by ethnicity showed a significant association (p-value < 0.05) of overall cancer risk with the rs1800795 polymorphism in the African and Asian populations, with the rs1800796 polymorphism in the Asian population only, and with the rs1800797 polymorphism in the African population.
A comparative discussion showed that our multi-case meta-analysis received more support than any previously reported individual meta-analysis on the association between the IL-6 gene polymorphisms and cancer risks. The results of this study showed with greater confidence that the IL-6 gene SNPs (rs1800795, rs1800796 and rs1800797) in humans are associated with increased cancer risks. Therefore, these three polymorphisms of the IL-6 gene have the potential to be evaluated as population-based, rapid, low-cost PCR prognostic biomarkers for the diagnosis of, and research on, different types of cancer.


Introduction
Cancer is a leading cause of death worldwide. According to the World Health Organization (WHO), there were 9.6 million deaths in 2018 among 18.1 million cancer patients worldwide. It has been estimated that the incidence of cancer might increase by 50%, to 15 million new cases, by the year 2020 [1]. The GLOBOCAN database reported the mortality and incidence in 2018 for 36 types of cancer in 185 countries [2]. According to recent literature reviews, it is now evident that cancer is a multi-factorial progressive disorder that develops under the influence of genes and their interactions [2][3][4].
To overcome the ambiguity of GWAS findings, some authors performed meta-analyses based on only one of the three important SNPs (rs1800795, rs1800796 and rs1800797) of the IL-6 gene, or on only one type of cancer, in order to reach more reliable and valid conclusions [123][124][125][126][127][128][129]. Note that a meta-analysis requires complete coverage of all relevant studies, handling of heterogeneity, and an assessment of the robustness of the main findings through sensitivity analysis. Those meta-analyses reported that (i) the rs1800795 polymorphism of the IL-6 gene shows a significant association with cervical [123] and colorectal [124] cancers, but an insignificant association with stomach cancer [128,129]; (ii) the rs1800796 polymorphism shows a contradictory association with stomach cancer [111] and an insignificant association with lung cancer [126,127]; and (iii) the rs1800797 polymorphism shows an insignificant association with colorectal cancer [124], stomach cancer [129] and all types of risk [128]. Thus, those meta-analysis reports on the IL-6 gene were not consistent on the issues they have in common. Zhou et al. [131] performed a multi-case meta-analysis considering all three important SNPs of the IL-6 gene mentioned previously, three different ethnicities (Asian, African, Caucasian) and nine types of cancer, based on 49,408 cancer cases and 61,790 controls. They reported that the IL-6 gene is significantly associated with the overall cancer risk. Particularly, they reported

Statistical modeling for meta-analysis
Meta-analysis is a collection of statistical methods for combining the results of similar independent studies in order to reach an overall decision across them. This section introduces the statistical methods used in this paper to reach an overall decision about the relationship between the IL-6 gene polymorphisms and cancer risk. First, we checked the quality of the existing studies by testing for Hardy-Weinberg equilibrium (HWE). The HWE test is performed with the chi-square statistic under the null hypothesis that the genotype frequencies in the control population of each study are consistent with HWE. The chi-square statistic for this test is given by:
$$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i},$$
which follows the chi-square distribution with 1 degree of freedom. Here $O_i$ and $E_i$ are the observed and expected frequencies of the $i$th genotype, respectively. Let $p$ and $q$ be the probabilities of the C and G alleles, respectively, and let $O_i = \mathrm{obs}(i)$ be the observed frequency of the $i$th genotype among the three genotypes CC, CG and GG. Then $p$ is calculated as:
$$p = \frac{2\,\mathrm{obs}(CC) + \mathrm{obs}(CG)}{2n}, \qquad q = 1 - p.$$
The expected frequency of the $i$th genotype, $E_i = E(i)$, is defined as $E(CC) = p^2 n$, $E(CG) = 2pqn$ and $E(GG) = q^2 n$, where $n$ is the total number of observations.
The heterogeneity of the different studies was examined using Cochran's Q statistic and its extension, the Higgins and Thompson $I^2$ statistic [131,132]. Cochran's Q statistic is defined as:
$$Q = \sum_{k=1}^{K} w_k \left(\hat{y}_k - \bar{y}_w\right)^2, \qquad \bar{y}_w = \frac{\sum_{k} w_k \hat{y}_k}{\sum_{k} w_k},$$
which follows the chi-square distribution with $K-1$ degrees of freedom. Here $\hat{y}_k = \ln(OR_k)$ for the $k$th study, and $w_k = 1/\hat{s}_k^2$ is the weight of the $k$th study. The variance of the $k$th study is calculated as:
$$\hat{s}_k^2 = \frac{1}{m_{1k}} + \frac{1}{m_{2k}} + \frac{1}{m_{3k}} + \frac{1}{m_{4k}},$$
where $m_{1k}$ and $m_{2k}$ are the numbers of exposures, and $m_{3k}$ and $m_{4k}$ the numbers of non-exposures, in the case and control groups of the $k$th study, respectively (that is, for the genetic model C vs. G, the allele C is the exposure and G is the non-exposure).
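The HWE check described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation (the paper itself uses the 'meta' R-package); the function name `hwe_chi_square` and the example counts are hypothetical. The p-value uses the identity that, for 1 degree of freedom, the chi-square upper tail equals `erfc(sqrt(x / 2))`.

```python
import math

def hwe_chi_square(n_cc, n_cg, n_gg):
    """Chi-square HWE test on control genotype counts (CC, CG, GG).

    Returns (chi2, p_value). Assumes both alleles are present, so that
    every expected count is positive.
    """
    n = n_cc + n_cg + n_gg
    p = (2 * n_cc + n_cg) / (2 * n)      # frequency of allele C
    q = 1.0 - p                          # frequency of allele G
    observed = (n_cc, n_cg, n_gg)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical control group: 30 CC, 40 CG, 30 GG.
chi2, p_value = hwe_chi_square(30, 40, 30)
print(f"chi2 = {chi2:.3f}, p-value = {p_value:.4f}")
```

A study whose control group gives a small p-value here would be flagged as deviating from HWE.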
The Higgins and Thompson $I^2$ statistic is defined as:
$$I^2 = \max\left\{0, \frac{Q - (K-1)}{Q}\right\} \times 100\%.$$
Values of $I^2$ greater than 25%, 50% and 75% indicate low, moderate and high heterogeneity among the individual studies, respectively. The pooled odds ratio (OR) was used to check for a significant association between the IL-6 gene polymorphisms and cancer risk under different genetic models: dominant models [CC + CG vs. GG or AA + AG vs. GG], homozygote models [CC vs. GG or AA vs. GG], over-dominant models [CG vs. CC + GG or AG vs. AA + GG], recessive models [CC vs. CG + GG or AA vs. AG + GG], and allelic contrast models [C vs. G or A vs. G]. To calculate the pooled OR for each genetic combination, we used the random effect model if the Q-test suggested significant heterogeneity (p-value < 0.10) among the studies; otherwise, the fixed effect model was used. We also estimated the 95% confidence interval (CI) of the OR based on the Z-statistic [131,132]. The OR for the $k$th study is calculated as:
$$OR_k = \frac{m_{1k}\, m_{4k}}{m_{2k}\, m_{3k}}.$$
For the fixed effect model, the overall OR is calculated using the Mantel-Haenszel (M-H) method as follows:
$$\hat{y}_F = \ln\left(\frac{\sum_{k} m_{1k} m_{4k} / N_k}{\sum_{k} m_{2k} m_{3k} / N_k}\right),$$
where $N_k = m_{1k} + m_{2k} + m_{3k} + m_{4k}$. The 95% CI of the overall effect is defined from its variance as:
$$\hat{y}_F \pm 1.96\, se(\hat{y}_F), \qquad se(\hat{y}_F) = \sqrt{\mathrm{var}(\hat{y}_F)}.$$
For the random effect model, the overall OR is calculated using the inverse variance method as follows:
$$\hat{y}_R = \frac{\sum_{k} w_{kR}\, \hat{y}_k}{\sum_{k} w_{kR}}, \qquad w_{kR} = \frac{1}{\hat{s}_{kR}^2} = \frac{1}{\hat{s}_k^2 + \hat{\tau}^2},$$
where $\hat{\tau}^2$ is the estimated between-study variance. The 95% CI of the random effect $\theta_R$ is calculated as:
$$\hat{y}_R \pm 1.96\, se(\hat{y}_R), \qquad se(\hat{y}_R) = \sqrt{\mathrm{var}(\hat{y}_R)} = \sqrt{\frac{1}{\sum_{k} w_{kR}}}.$$
However, the Q-test alone cannot assure model adequacy. Therefore, we also performed three distinct goodness-of-fit (GoF) tests proposed by Chen et al. [133] to check the model adequacy.
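The heterogeneity statistics and the two pooling schemes described above can be sketched as follows. This is a hedged illustration, not the 'meta' R-package code used in the paper. The text does not say how the between-study variance is estimated, so the DerSimonian-Laird estimator used here is an assumption, and all function names and example tables are hypothetical. Each table is ordered as (exposed cases, exposed controls, non-exposed cases, non-exposed controls).

```python
import math

def log_or(m1, m2, m3, m4):
    """Log odds ratio of one study and its variance."""
    y = math.log((m1 * m4) / (m2 * m3))
    var = 1 / m1 + 1 / m2 + 1 / m3 + 1 / m4
    return y, var

def heterogeneity(ys, variances):
    """Cochran's Q and I^2 (in percent) with inverse-variance weights."""
    ws = [1 / v for v in variances]
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    q = sum(w * (y - y_bar) ** 2 for w, y in zip(ws, ys))
    i2 = max(0.0, (q - (len(ys) - 1)) / q) * 100 if q > 0 else 0.0
    return q, i2

def mantel_haenszel_or(tables):
    """Fixed effect pooled OR by the Mantel-Haenszel method."""
    num = sum(m1 * m4 / (m1 + m2 + m3 + m4) for m1, m2, m3, m4 in tables)
    den = sum(m2 * m3 / (m1 + m2 + m3 + m4) for m1, m2, m3, m4 in tables)
    return num / den

def random_effects_or(ys, variances):
    """Random effect pooled OR by inverse variance weighting.

    ASSUMPTION: tau^2 is estimated with the DerSimonian-Laird formula.
    Returns (pooled OR, 95% CI, tau^2).
    """
    ws = [1 / v for v in variances]
    q, _ = heterogeneity(ys, variances)
    c = sum(ws) - sum(w * w for w in ws) / sum(ws)
    tau2 = max(0.0, (q - (len(ys) - 1)) / c) if c > 0 else 0.0
    w_r = [1 / (v + tau2) for v in variances]
    y_r = sum(w * y for w, y in zip(w_r, ys)) / sum(w_r)
    se_r = math.sqrt(1 / sum(w_r))
    ci = (math.exp(y_r - 1.96 * se_r), math.exp(y_r + 1.96 * se_r))
    return math.exp(y_r), ci, tau2

# Hypothetical 2x2 tables for three studies.
tables = [(120, 100, 80, 100), (60, 45, 40, 55), (35, 50, 65, 50)]
ys, variances = zip(*(log_or(*t) for t in tables))
q, i2 = heterogeneity(ys, variances)
print("Q =", round(q, 3), "I2 =", round(i2, 1))
print("M-H fixed effect OR =", round(mantel_haenszel_or(tables), 3))
print("Random effect OR, CI, tau2 =", random_effects_or(ys, variances))
```

When the studies agree closely, the estimated between-study variance is zero and the random effect weights reduce to the inverse-variance weights.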
These three GoF tests are the Anderson-Darling (AD) test [134,135], the Cramer-von Mises (CvM) test [135][136][137] and the Shapiro-Wilk (SW) test [138], each testing the null hypothesis that the individual effects follow the normal distribution. If the normality of the individual effects is not rejected, the random effect model is used to estimate the combined effect; otherwise, the fixed effect model is used. The test statistics are defined as:
$$AD = -K - \frac{1}{K}\sum_{k=1}^{K} (2k-1)\left[\ln F(\hat{y}_k) + \ln\left(1 - F(\hat{y}_{K+1-k})\right)\right],$$
$$CvM = \frac{1}{12K} + \sum_{k=1}^{K}\left[\frac{2k-1}{2K} - F(\hat{y}_k)\right]^2,$$
$$SW = \frac{\left(\sum_{k=1}^{K} a_k \hat{y}_k\right)^2}{\sum_{k=1}^{K} \left(\hat{y}_k - \bar{y}\right)^2},$$
where $\hat{y}_k$ is the ordered data, $\bar{y}$ is the sample mean of $\hat{y}_k$, $K$ is the sample size, that is, the number of individual studies, $F(\hat{y}_k)$ is the cumulative distribution function of the normal distribution evaluated at the $k$th order statistic, and $a_k$ are constants generated from the means, variances and covariances of the order statistics. To perform these three tests, Chen et al. [133] proposed the following steps:
Step 1. Compute $ad_0$, $cvm_0$ and $sw_0$ from the AD, CvM and SW statistics, respectively, for [...]
Step 2. Resample $B = 10^5$ sub-samples from $MVN(0, \hat{S})$, where [...]. Then compute $ad_j$, $cvm_j$ and $sw_j$ using the AD, CvM and SW statistics, respectively, for each sample $j$ ($j = 1, 2, \ldots, B$).
Step 3. [...]
Then the respective z-score is calculated as follows:
$$z = \begin{cases} \dfrac{\sum_{k} w_k \hat{y}_k}{\sqrt{\sum_{k} w_k}}, & \text{for fixed effect model} \\ \dfrac{\sum_{k} w_{kR}\, \hat{y}_k}{\sqrt{\sum_{k} w_{kR}}}, & \text{for random effect model} \end{cases}$$
Subgroup analyses by ethnicity and cancer type were also carried out using the techniques described above.
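To make the resampling idea behind these normality checks concrete, the sketch below computes the AD and CvM statistics on standardized effects and calibrates them by Monte Carlo simulation. It is a simplified stand-in, not the procedure of Chen et al. [133]: it draws independent standard normal samples instead of resampling from $MVN(0, \hat{S})$, it uses far fewer replicates than $B = 10^5$, and it omits the SW statistic. All names and the example data are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev

_STD_NORMAL = NormalDist()

def _standardize(ys):
    """Sort the effects after centering and scaling them (needs K >= 2)."""
    m, s = mean(ys), stdev(ys)
    return sorted((y - m) / s for y in ys)

def anderson_darling(ys):
    z = _standardize(ys)
    k = len(z)
    total = 0.0
    for i in range(1, k + 1):
        lower = math.log(_STD_NORMAL.cdf(z[i - 1]))
        # log(1 - F(z)) written as log(F(-z)) for numerical stability.
        upper = math.log(_STD_NORMAL.cdf(-z[k - i]))
        total += (2 * i - 1) * (lower + upper)
    return -k - total / k

def cramer_von_mises(ys):
    z = _standardize(ys)
    k = len(z)
    return 1 / (12 * k) + sum(
        ((2 * i - 1) / (2 * k) - _STD_NORMAL.cdf(z[i - 1])) ** 2
        for i in range(1, k + 1)
    )

def monte_carlo_p(stat_fn, ys, n_rep=2000, seed=1):
    """Share of simulated normal samples whose statistic is at least as
    large as the observed one (large values indicate non-normality)."""
    rng = random.Random(seed)
    observed = stat_fn(ys)
    k = len(ys)
    exceed = sum(
        stat_fn([rng.gauss(0.0, 1.0) for _ in range(k)]) >= observed
        for _ in range(n_rep)
    )
    return observed, exceed / n_rep

# Hypothetical log odds ratios from eight studies.
log_ors = [0.12, -0.05, 0.30, 0.08, 0.21, -0.14, 0.02, 0.17]
print("AD :", monte_carlo_p(anderson_darling, log_ors))
print("CvM:", monte_carlo_p(cramer_von_mises, log_ors))
```

Under the rule stated in the text, a large p-value (normality not rejected) would point to the random effect model, and a small one to the fixed effect model.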
We performed the sensitivity analysis using the full data and the reduced data obtained by removing the studies that failed the HWE validation and the publication bias test. Publication bias was examined visually with a funnel plot and formally with the Egger regression test [139] and Begg's test [140]. The Egger regression test statistic is defined as:
$$t = \frac{\hat{\alpha}}{se(\hat{\alpha})},$$
which follows the t-distribution with $(K-2)$ degrees of freedom under the null hypothesis $H_0: \alpha = 0$ (no publication bias). Here $\hat{\alpha}$ is obtained by least squares estimation using one of the following models:
$$\hat{y}_k \sqrt{w_k} = \alpha + \mu \sqrt{w_k} + \varepsilon_k, \quad \text{for fixed effect model; and}$$
$$\hat{y}_k \sqrt{w_{kR}} = \alpha + \mu \sqrt{w_{kR}} + \varepsilon_k, \quad \text{for random effect model,}$$
where $\varepsilon_k \sim \text{iid } N(0, \sigma^2)$. The Begg's test statistic is defined as:
$$Z = \frac{C - D}{\sqrt{K(K-1)(2K+5)/18}},$$
which asymptotically follows $N(0,1)$ under the null hypothesis $H_0: \alpha = 0$ (no publication bias).
Here $C$ and $D$ are the numbers of concordant and discordant pairs, respectively, obtained from Kendall's ranking of $t_k^*$ and $\hat{s}_k^2$ (fixed effect model) or $\hat{s}_{kR}^2$ (random effect model). Here $t_k^*$ is the standardized form of $t_k = OR_k$, the OR of the $k$th study, computed with the weights $w_k$ for the fixed effect model and $w_{kR}$ for the random effect model [...]. We used the 'meta' R-package (http://meta-analysis-with-r.org/) to implement the above statistical methods for the meta-analysis.
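The two publication bias tests can be sketched as below for the fixed effect weights $w_k = 1/\hat{s}_k^2$. This is an illustration under stated assumptions, not the 'meta' R-package code: the effects are taken on the log OR scale, and the standardization $t_k^* = (t_k - \bar{t}) / \sqrt{\hat{s}_k^2 - 1/\sum_k w_k}$ follows the usual Begg and Mazumdar form, which is assumed because the exact expression is not recoverable from the text above. Function names and data are hypothetical; the Egger function returns the t statistic and its degrees of freedom only.

```python
import math
from statistics import NormalDist

def egger_test(ys, variances):
    """Least squares fit of y*sqrt(w) = alpha + mu*sqrt(w) + e.

    Returns (t statistic for alpha = 0, degrees of freedom K - 2).
    """
    x = [math.sqrt(1 / v) for v in variances]
    z = [y * xi for y, xi in zip(ys, x)]
    k = len(ys)
    x_bar, z_bar = sum(x) / k, sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
    alpha = z_bar - slope * x_bar
    sse = sum((zi - alpha - slope * xi) ** 2 for xi, zi in zip(x, z))
    se_alpha = math.sqrt(sse / (k - 2) * (1 / k + x_bar ** 2 / sxx))
    return alpha / se_alpha, k - 2

def begg_test(ys, variances):
    """Rank correlation test. Returns (Z, two-sided p-value, C, D)."""
    ws = [1 / v for v in variances]
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    # ASSUMPTION: Begg-Mazumdar standardization of the effects.
    t_star = [(y - y_bar) / math.sqrt(v - 1 / sum(ws))
              for y, v in zip(ys, variances)]
    k = len(ys)
    c = d = 0
    for i in range(k):
        for j in range(i + 1, k):
            s = (t_star[i] - t_star[j]) * (variances[i] - variances[j])
            if s > 0:
                c += 1      # concordant pair
            elif s < 0:
                d += 1      # discordant pair
    z = (c - d) / math.sqrt(k * (k - 1) * (2 * k + 5) / 18)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, c, d

# Hypothetical log odds ratios and variances for five studies.
ys = [0.40, 0.10, 0.25, -0.05, 0.15]
variances = [0.20, 0.05, 0.10, 0.02, 0.04]
print("Egger:", egger_test(ys, variances))
print("Begg :", begg_test(ys, variances))
```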

Study characteristics
In this meta-analysis, we first reviewed 580 articles that mentioned the IL-6 gene in their titles and abstracts. After removing duplicates, 477 articles remained. We then removed 337 articles because they lacked full text or were not case-control, cancer-related studies. Finally, 118 articles were selected for the final review after removing studies with incomplete information. The flow chart of the study selection process is shown in Fig 1. The finally selected articles included 103 studies for the rs1800795 SNP with 45238 cases and 57255 controls (Table 2), 27 studies for the rs1800796 SNP with 10733 cases and 13405 controls ([...] Table 2 and S1A-S1E Fig in S1 File).
The subgroup analysis by cancer type showed that the IL-6 -174G/C polymorphism played a significant protective role in liver cancer under four genetic models [...] (Table 2). The subgroup analysis by ethnicity showed that the IL-6 -174G/C polymorphism was not significantly associated with the cancer risk in the Caucasian and mixed populations (Table 2). Subgroup analyses by cancer type and ethnicity were performed to identify the sources of heterogeneity. The results suggested that the studies on breast, cervical, colon, lung, oral, prostate and stomach cancers, and those on the Asian, Caucasian and mixed populations, were the main sources of heterogeneity (S1 Table).
IL-6 rs1800796 SNP. The results of this meta-analysis showed that the IL-6 -572G/C polymorphism was significantly associated with the overall cancer risk under the over-dominant model [CG vs. CC + GG: OR = 1.12, 95% CI = 1.01-1.23, p-value = 0.0288] (Table 3 and S1F-S1J Fig in S1 File). However, it was not significantly associated with the overall cancer risk under the other four genetic models (allelic, dominant, recessive and homozygote).
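The five genetic models named in these results differ only in how the genotype counts of a study are collapsed into a 2x2 table before the OR is computed. The sketch below illustrates this for a G/C polymorphism. The function names and the genotype counts are hypothetical and are not taken from the studies analyzed here.

```python
def contrast_tables(case, control):
    """Collapse (CC, CG, GG) counts of cases and controls into one 2x2
    table per genetic model.

    Each table is (exposed cases, exposed controls,
                   non-exposed cases, non-exposed controls).
    """
    a_cc, a_cg, a_gg = case
    b_cc, b_cg, b_gg = control
    return {
        "dominant": (a_cc + a_cg, b_cc + b_cg, a_gg, b_gg),        # CC + CG vs. GG
        "homozygote": (a_cc, b_cc, a_gg, b_gg),                    # CC vs. GG
        "over_dominant": (a_cg, b_cg, a_cc + a_gg, b_cc + b_gg),   # CG vs. CC + GG
        "recessive": (a_cc, b_cc, a_cg + a_gg, b_cg + b_gg),       # CC vs. CG + GG
        "allelic": (2 * a_cc + a_cg, 2 * b_cc + b_cg,              # C vs. G,
                    2 * a_gg + a_cg, 2 * b_gg + b_cg),             # counted per allele
    }

def odds_ratio(m1, m2, m3, m4):
    return (m1 * m4) / (m2 * m3)

# Hypothetical genotype counts (CC, CG, GG).
tables = contrast_tables(case=(10, 20, 30), control=(5, 15, 40))
for model, table in tables.items():
    print(model, table, round(odds_ratio(*table), 3))
```

Note that the allelic model counts alleles rather than individuals, so each table total is twice the number of subjects.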
The subgroup analysis by cancer type showed that the IL-6 rs1800796 (-572G/C) polymorphism played a significant protective role in prostate cancer under four genetic models [...] (Table 3).
In the subgroup analysis by ethnicity, the results for the Asian population suggested that the IL-6 -572G/C polymorphism was significantly associated with an increased overall cancer risk under the over-dominant model [CG vs. CC + GG: OR = 1.13, 95% CI = 1.01-1.27, p-value = 0.0293]. The Caucasian and mixed ethnic groups showed an insignificant association of the IL-6 -572G/C polymorphism with the overall cancer risk (Table 3).
IL-6 rs1800797 SNP. Our analysis suggested that the IL-6 rs1800797 (-597G/A) polymorphism was not significantly associated with overall cancer risk under the genetic models [...] (Table 4 and S1K-S1O Fig in S1 File).
The subgroup analysis by cancer type showed that blood, breast, colon, prostate and stomach cancers were not significantly associated with the IL-6 -597G/A polymorphism (Table 4) [...] (Table 4).
Source of heterogeneity. We found insignificant heterogeneity among the studies in the analysis of the IL-6 -597G/A polymorphism for overall cancer risk under all genetic models. Subgroup analyses by cancer type and ethnic group were performed to identify the sources of heterogeneity. We found that blood cancer was the only main source of heterogeneity [A vs. G: Q = 6.56, df = 1, p-value = 0.0104, τ 2 = 0.7679, I 2 = 84.80%; AA + AG vs. GG: Q = 6.66, df = 1, p-value = 0.0099, τ 2 = 0.8110, I 2 = 85.00%] (S1 Table).
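As a worked check, the heterogeneity figures quoted above for blood cancer can be recomputed from the reported Q and df alone, using $I^2 = (Q - df)/Q \times 100\%$ and the chi-square upper tail for 1 degree of freedom. The helper names in this short sketch are hypothetical. Because the reported Q values are rounded to two decimals, the recomputed values should agree with the reported ones only to within rounding.

```python
import math

def i_squared(q, df):
    """Higgins and Thompson I^2 in percent."""
    return max(0.0, (q - df) / q) * 100

def chi2_upper_tail_1df(q):
    """P(X > q) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(q / 2))

# Q values as reported in the text for the two genetic models (df = 1).
reported_q = {"A vs. G": 6.56, "AA + AG vs. GG": 6.66}
for model, q in reported_q.items():
    print(model, "I2 =", round(i_squared(q, 1), 2),
          "p-value =", round(chi2_upper_tail_1df(q), 4))
```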

Publication bias
In this study, the funnel plot was used to check the publication bias of the IL-6 -174G/C and IL-6 -572G/C polymorphisms under the allelic model C versus G, and of the IL-6 -597G/A polymorphism under the allelic model A versus G. According to the funnel plot, the distribution of ORs in terms of standard errors (SEs) was symmetric for each of the three polymorphisms (-174G/C, -572G/C, -597G/A), and no publication bias was observed among the studies selected for this meta-analysis (Fig 2). Also, publication bias was checked through performing Begg's test and

Sensitivity analysis
The sensitivity analysis was conducted to increase the reliability of this meta-analysis. First, the meta-analysis was conducted using all studies. Then, the studies that did not pass the HWE test were removed and the meta-analysis was performed again on the reduced dataset for the respective genetic models. The results showed an insignificant change in the association, which suggests that the results of this meta-analysis are both stable and robust (see S2C-S2E Table in S2 File).
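The two-pass procedure described above (pool all studies, drop those failing HWE, pool again) can be sketched as follows. This is an illustration only. The 0.05 cut-off for the HWE test, the use of the allelic model, the function names and the three example studies are all assumptions made for the sketch, and the pooling uses the Mantel-Haenszel fixed effect estimator.

```python
import math

def hwe_p_value(n_cc, n_cg, n_gg):
    """Chi-square (1 df) HWE p-value for control genotype counts."""
    n = n_cc + n_cg + n_gg
    p = (2 * n_cc + n_cg) / (2 * n)
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    chi2 = sum((o - e) ** 2 / e
               for o, e in zip((n_cc, n_cg, n_gg), expected))
    return math.erfc(math.sqrt(chi2 / 2))

def allelic_table(case, control):
    """C vs. G table: (C in cases, C in controls, G in cases, G in controls)."""
    return (2 * case[0] + case[1], 2 * control[0] + control[1],
            2 * case[2] + case[1], 2 * control[2] + control[1])

def mantel_haenszel_or(tables):
    num = sum(m1 * m4 / (m1 + m2 + m3 + m4) for m1, m2, m3, m4 in tables)
    den = sum(m2 * m3 / (m1 + m2 + m3 + m4) for m1, m2, m3, m4 in tables)
    return num / den

def sensitivity(studies, alpha=0.05):
    """Return (pooled OR on all studies, pooled OR on HWE-passing studies,
    list of retained studies). ASSUMPTION: HWE is rejected when p < alpha."""
    kept = [s for s in studies if hwe_p_value(*s["control"]) >= alpha]

    def pool(group):
        return mantel_haenszel_or(
            [allelic_table(s["case"], s["control"]) for s in group])

    return pool(studies), pool(kept), kept

# Hypothetical studies with (CC, CG, GG) counts; the third deviates from HWE.
studies = [
    {"case": (36, 48, 16), "control": (25, 50, 25)},
    {"case": (36, 48, 16), "control": (25, 50, 25)},
    {"case": (16, 48, 36), "control": (40, 20, 40)},
]
full_or, reduced_or, kept = sensitivity(studies)
print("all studies:", round(full_or, 3),
      "| HWE-passing only:", round(reduced_or, 3),
      "| retained:", len(kept))
```

A small gap between the two pooled estimates is what the text reports as evidence of stability; a large gap would indicate that the HWE-violating studies drive the result.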

Discussion and conclusion
In this paper, we described in detail the statistical modeling for meta-analysis, incorporating goodness-of-fit tests to check model adequacy. A multi-case meta-analysis was then conducted to examine the association of cancer risk with each of the three SNPs (rs1800795, rs1800796, rs1800797) of the IL-6 gene. A total of 118 individual studies, comprising 50053 case and 65204 control samples across different cancers and ethnic groups, were included in this extensive meta-analysis. The results of this study suggested that the IL-6 rs1800795 polymorphism is insignificantly associated with the overall cancer risk, but significantly reduced the risk of liver cancer under four genetic models (CC vs. GG; CC vs. CG + GG; CC + CG vs. GG; CG vs. CC + GG; C vs. G), which is in line with the previously reported multi-case meta-analysis in [130]. This SNP also showed a significant association with an increased risk of cervical and prostate cancers; the result for cervical cancer is supported by the previous single-case meta-analysis in [123], but not by the multi-case meta-analysis in [130]. The results for the IL-6 rs1800796 polymorphism also showed a significant association with overall cancer risk under one genetic model. This polymorphism showed a significant association with prostate and stomach cancers under four genetic models (CC vs. GG; CC vs. CG + GG; CC + CG vs. GG; C vs. G); these results are supported by the previous multi-case meta-analysis in [130] and the single-case meta-analysis in [111], respectively. Moreover, the results of this meta-analysis indicated that the rs1800796 polymorphism is significantly associated with an increased risk of lung cancer. The analysis of the IL-6 rs1800797 polymorphism showed an insignificant association with cancer risk, which is supported by the previous single-case meta-analysis in [125].
The results of this study also showed a significant association of the IL-6 rs1800797 polymorphism with an increased risk of cervical cancer, an association that was insignificant in [130]. The ethnicity-based subgroup analysis showed a significant association between the rs1800795 polymorphism and the overall cancer risk in both the African population (under three genetic models) and the Asian population (under four genetic models: CC vs. GG; CC vs. CG + GG; CC + CG vs. GG; C vs. G). For the rs1800796 polymorphism, the results suggested a significant association with the cancer risk in Asian populations. The rs1800797 polymorphism was also significantly associated with the cancer risk in African ethnic groups. All the results of the subgroup analysis by ethnicity were supported by the previous multi-case meta-analysis in [130]. Thus, our multi-case meta-analysis results received more support from the single-case meta-analysis results in [123][124][125][126][127][128][129] than did the previous multi-case meta-analysis results in [130].
It should be noted again that none of the previous meta-analyses [123][124][125][126][127][128][129][130] checked the model adequacy through a goodness-of-fit test. To estimate the combined effects, all of them used fixed effect (FE) or random effect (RE) models chosen by Cochran's homogeneity test, even though the sample sizes were small for some individual studies. With small sample sizes, the individual effects may not follow the normal distribution, and Cochran's test may produce misleading results about the homogeneity of the individual effects. In our case, however, we used the GoF tests suggested by Chen et al. [133] to address the lack of model fit. We observed that some of our selected models contradicted those selected by Cochran's homogeneity test, with significant changes in the association between gene polymorphisms and cancer risks. In particular, we observed such changes in some overall and subgroup cases for all polymorphisms (rs1800795, rs1800796, rs1800797). Owing to the contradictory model selections, contradictory associations were also observed for three cases of the rs1800795 polymorphism (liver cancer: CG vs. CC + GG; Asian ethnicity: CC + CG vs. GG and C vs. G) and a single case of the rs1800796 polymorphism (stomach cancer: CC + CG vs. GG). This meta-analysis has some limitations. Heterogeneity factors such as age, sex, family history and levels of IL-6 expression were not considered, which might affect the association. The literature reviewed and selected for this study was in the English language only; therefore, publication bias could not be completely avoided and some selection bias might have occurred. Also, the small sample size may affect the results for some types of cancer.
In conclusion, the results of this study indicated that the IL-6 gene is significantly associated with the overall cancer risk. In particular, the subgroup analysis by cancer type showed a significant association of this gene with 5 types of cancer risk (liver, prostate, cervical, stomach and lung) and an insignificant association with 11 types of cancer risk (blood, breast, colon, neuroblastoma, oral, skin, thyroid, ovarian, pancreatic and renal cell carcinoma). A comparative discussion showed that our multi-case meta-analysis results received more support than any previous individual meta-analysis results on the association between the IL-6 gene SNPs (rs1800795, rs1800796 and rs1800797) and different types of cancer risk. Therefore, the results of this detailed systematic meta-analysis, based on a larger sample size for the IL-6 gene polymorphisms, provide more evidence for further exploring the IL-6 gene as a potent prognostic biomarker for the early detection of various types of cancer.