Meta-Analysis of Mismatch Repair Polymorphisms within the COGENT Consortium for Colorectal Cancer Susceptibility

In the last four years, Genome-Wide Association Studies (GWAS) have identified sixteen low-penetrance polymorphisms at fourteen different loci associated with colorectal cancer (CRC). Due to the low risks conferred by known common variants, most of the 35% broad-sense heritability estimated by twin studies remains unexplained. Recently our group performed a case-control study for eight Single Nucleotide Polymorphisms (SNPs) in 4 CRC genes. The present investigation is a follow-up of that study. We have genotyped six SNPs that showed a positive association and carried out a meta-analysis based on eight additional studies comprising in total more than 8000 cases and 6000 controls. The estimated recessive odds ratio for one of the SNPs, rs3219489 (MUTYH Q338H), decreased from 1.52 in the original Swedish study, to 1.18 in the Swedish replication, and to 1.08 in the initial meta-analysis. Since the corresponding summary p-value was 0.06, we decided to retrieve additional information for this polymorphism. The incorporation of six further studies resulted in around 13000 cases and 13000 controls. The newly updated OR was 1.03. The results from the present large, multicenter study illustrate the possibility of decreasing effect sizes with increasing sample sizes. Phenotypic heterogeneity, differential environmental exposures, and population-specific linkage disequilibrium patterns may explain the observed difference of genetic effects between Sweden and the other investigated cohorts.


Introduction
In recent years low-risk common alleles have attracted increasing attention in the search for the "missing heritability" in colorectal cancer (CRC). This term refers to the portion of the roughly 35% heritability estimated by twin studies [1] that cannot be explained by mutations in already known high-risk genes. Known high-penetrance germline mutations in CRC genes account for less than 6% of the observed cases [2]. Therefore, much of the remaining inherited variation in genetic susceptibility is probably due to multiple low-penetrance variants, both common and rare.
To date sixteen common variants have been identified through large multi-centre genome-wide association studies (GWAS) [3]. Taken together, however, they only explain a small proportion of familial CRC cases. Although the risk associated with each of these variants is modest, they contribute to the disease burden due to their high frequency in the population and the possibility of acting in concert with each other, which may increase the individual's risk of developing CRC [4].
Against this background, a few years ago we attempted to assess the role of eight SNPs in four already known CRC genes (APC, MLH1, MSH6 and MUTYH) through a case-control association study in the Swedish population [5]. These 8 SNPs had been previously studied, but their pathogenicity was unknown and they were assumed to constitute polymorphisms. In our first study several positive associations were detected but, due to limited sample size (1785 cases and 1722 controls) [5], the results needed to be validated in a follow-up study.
The present study was an initiative of the COGENT consortium [4,6], where different groups offered to extend the genotyping to other non-Swedish cohorts for SNPs showing statistically significant associations in at least one analysis of the original study. This restricted the analysis to six out of the original eight SNPs.

Ethics statement
Collection of blood samples and clinical information from patients and controls was obtained with informed consent in accordance with the tenets of the Declaration of Helsinki. All participants gave written informed consent to take part in the study. The study was undertaken in accordance with the Swedish legislation of ethical permission (2003:460) and approved by the Stockholm Regional Research Ethical Committee (Dnr 2002:489).

Subjects
Details regarding the number of cases and controls in all fourteen studies are summarized in Table S1. One SNP, rs459552 (APC D1822V), was genotyped in seven studies, for a total of 8654 cases and 7731 controls. Four SNPs, rs1799977 (MLH1 I219V), rs1800932 (MSH6 P92P), rs1800935 (MSH6 D180D) and rs3219484 (MUTYH V22M), were genotyped in 8 studies for a total of 8308 cases and 7434 controls. The remaining SNP, rs3219489 (MUTYH Q338H), was genotyped in 13 cohorts for a total of 12902 cases and 14602 controls.
For all the subjects genomic DNA was extracted from peripheral blood by standard procedures. Additional information regarding localization of the tumor, age at diagnosis, gender and ethnicity was retrieved whenever possible. Out of 5770 controls with ethnicity information, 5647 were of Caucasian origin, the rest being mostly African American.

Statistical analysis
Deviations of observed genotype frequencies in controls from those expected under Hardy-Weinberg equilibrium were assessed by χ² tests. Risks of CRC associated with genotypes were compared by odds ratios (ORs) with corresponding confidence intervals (CIs) based on logistic regression. Study heterogeneity was summarized using a Mantel-Haenszel test, but we assumed that the studies were random samples from a general population and used a random-effects model to summarize OR estimates under dominant, recessive and additive penetrance models in the meta-analyses. Results were represented by forest plots as follows: confidence intervals for each individual study were indicated by horizontal lines, single ORs by squares and summary estimates by diamonds with horizontal limits at confidence limits and width inversely proportional to the standard error. Meta-analyses were performed using the package rmeta in R, the free software environment for statistical computing.
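The random-effects pooling described above can be illustrated with a minimal sketch. It assumes the DerSimonian-Laird estimator and uses Cochran's Q as the heterogeneity statistic; the text does not state which rmeta function or which heterogeneity statistic was actually used, so both choices are assumptions, and the 2x2 counts below are hypothetical rather than taken from Tables S1 to S4.

```python
import math

def log_or_and_var(case_exp, case_unexp, ctrl_exp, ctrl_unexp):
    """Log odds ratio and its Woolf variance from one 2x2 table."""
    log_or = math.log((case_exp * ctrl_unexp) / (case_unexp * ctrl_exp))
    var = 1 / case_exp + 1 / case_unexp + 1 / ctrl_exp + 1 / ctrl_unexp
    return log_or, var

def dersimonian_laird(tables):
    """Pool 2x2 tables with the DerSimonian-Laird random-effects estimator
    (an assumed choice; the source only says 'random effect model')."""
    effects = [log_or_and_var(*t) for t in tables]
    y = [e[0] for e in effects]
    v = [e[1] for e in effects]
    w = [1 / vi for vi in v]  # inverse-variance (fixed-effect) weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(tables) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1 / (vi + tau2) for vi in v]  # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))
    return {
        "or": math.exp(y_re),
        "ci": (math.exp(y_re - 1.96 * se_re), math.exp(y_re + 1.96 * se_re)),
        "q": q,
        "df": df,
        "tau2": tau2,
    }

# Hypothetical counts per study:
# (cases with risk genotype, other cases, controls with risk genotype, other controls)
tables = [(60, 940, 50, 950), (130, 1870, 120, 1880), (45, 455, 30, 470)]
result = dersimonian_laird(tables)
print(result)
```

When the estimated between-study variance is zero, the random-effects weights reduce to the fixed-effect weights, so the two summary estimates coincide.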

Results
The distribution of the genotypes in controls did not deviate from Hardy-Weinberg equilibrium in any study. Mantel-Haenszel tests identified study heterogeneity for rs1800932 (MSH6 P92P) under recessive and additive penetrance, with p-values equal to 0.04 and 0.03, respectively (Table S2). This does not constitute a major issue since this SNP showed no differences between the genotype distributions of cases and controls either in single studies or in the global analysis. Study heterogeneity was not found for any other SNP. Genotyping results for the 6 SNPs based on studies 1-8 are presented in Table S2.
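As an illustration of the Hardy-Weinberg check reported above, the following sketch computes the one-degree-of-freedom χ² statistic from control genotype counts. The function name and the counts are hypothetical; this is not the authors' code, only a minimal version of the standard test.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for deviation from Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical control genotype counts (AA, AB, BB)
chi2, p_value = hwe_chi_square(560, 380, 60)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

A large p-value indicates no evidence of deviation, which is the outcome reported for the controls of every study.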
The only SNP that was marginally significant in the meta-analysis was rs3219489 (MUTYH Q338H), both under a recessive model (summary OR = 1.08, 95% CI 1.00 to 1.17; p = 0.05) and assuming additive allelic effects (summary OR = 1.07, 95% CI 1.00 to 1.14; p = 0.06). We ascribe the combined result mainly to the Swedish study, with individual ORs of 1.18 (95% CI = 1.01-1.38, recessive model) and 1.19 (95% CI = 1.05-1.35, additive model) (Table S2). The goodness of fit was slightly better for the recessive than for the additive model, and the recessive and additive models clearly outperformed the dominant model.
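To make the penetrance models concrete, the sketch below collapses three genotype counts into the 2x2 table implied by a recessive or a dominant model and computes an unadjusted OR with a Wald 95% CI. The genotype counts are hypothetical (controls were chosen to carry 5.6% risk homozygotes, the prevalence quoted in the Discussion), and the helper names are illustrative. The additive model requires a logistic regression on the 0/1/2 allele count and is not shown.

```python
import math

def collapse(genotypes, model):
    """Split (non-risk homozygote, heterozygote, risk homozygote) counts
    into (exposed, unexposed) under the given penetrance model."""
    ref, het, hom = genotypes
    if model == "recessive":  # risk homozygotes vs. all others
        return hom, ref + het
    if model == "dominant":   # any risk allele vs. non-carriers
        return het + hom, ref
    raise ValueError(f"unsupported model: {model}")

def odds_ratio_ci(cases, controls, model, z=1.96):
    """Unadjusted OR with Wald confidence interval on the log scale."""
    a, b = collapse(cases, model)
    c, d = collapse(controls, model)
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, odds_ratio * math.exp(-z * se), odds_ratio * math.exp(z * se)

# Hypothetical genotype counts: (non-risk homozygote, heterozygote, risk homozygote)
cases = (600, 330, 70)
controls = (620, 324, 56)
for model in ("recessive", "dominant"):
    or_, lo, hi = odds_ratio_ci(cases, controls, model)
    print(f"{model}: OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```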
In an attempt to validate the findings under recessive inheritance, we set up collaborations with additional groups and asked them to genotype rs3219489 in their cohorts. In the end, an additional 4234 cases and 6800 controls were included, adding up to a total of 12232 cases and 13380 controls (Table S3).
We updated the meta-analysis once more considering all samples regardless of tumor localization as well as stratifying them for colon and rectal tumors. As shown in Table S4, data were available for 4573 colon and 1774 rectal cancer cases. Results from the updated meta-analyses are presented in Figure 1. The new summary OR for colorectal cancer was 1.03 (95% CI 0.97 to 1.10, p-value 0.25) (Figure 1A). The summary OR was practically identical after adjustment for age and gender (OR = 1.03, 95% CI 0.93 to 1.13). No study heterogeneity was detected (P = 0.29, data not shown). The combined OR for colon cancer was 1.07 (95% CI 0.99 to 1.16; p-values 0.09 for OR = 1 and 0.37 for study homogeneity) (Figure 1B), and that for rectal cancer was 1.06 (95% CI 0.94 to 1.19; p-values 0.37 for OR = 1 and 0.31 for study homogeneity) (Figure 1C).

Discussion
In the present investigation we performed a case-control association study for six out of eight previously investigated SNPs [5]. For five of them, rs459552 (APC D1822V), rs1799977 (MLH1 I219V), rs1800932 (MSH6 P92P), rs1800935 (MSH6 D180D) and rs3219484 (MUTYH V22M) samples were retrieved from eight additional studies totaling 8308 cases and 7434 controls. For the sixth SNP, rs3219489 (MUTYH Q338H), which was selected based on promising results from two samples of Swedish origin (study 8 in the present manuscript and reference [5]), we set up an even larger replication dataset comprising 14 different studies with a total of 12232 cases and 13380 controls.
For all SNPs included in the analysis we were unable to confirm the associations with CRC risk found in the Swedish population. In particular, the recessive ORs of CRC for rs3219489 decreased from 1.52 in the original Swedish study to 1.18 in the Swedish replication cohort, to 1.08 (95% CI 1.00 to 1.17) in the first meta-analysis and to 1.03 (95% CI 0.97 to 1.10) in the updated meta-analysis (Table S2). The summary ORs in the extended meta-analyses were 1.07 (95% CI 0.99 to 1.16) for colon cancer and 1.06 (95% CI 0.94 to 1.19) for rectal cancer, in contrast with results based on Swedish samples. The updated meta-analysis had a statistical power of 99% to detect a recessive OR of 1.52 and of 89% to detect a recessive OR of 1.18 (Type I error rate 5% and prevalence of CC genotypes among controls 5.6%). The association was also biologically plausible. MUTYH Q338H is interesting because it represents a missense change in the MUTYH protein, which is involved in the base excision repair (BER) pathway. A common product of oxidative damage to 2'-deoxyguanosine is 7,8-dihydro-8-oxo-2'-deoxyguanosine (OG) [10,11]. In mammalian cells OG has been shown to be highly mutagenic and to increase the rate of G→T transversions, because its miscoding properties cause mispairing with adenine during DNA replication to form a stable OG:A mismatch [11,12]. The BER pathway plays an important role in repairing this type of DNA damage through the action of the mutY homolog MUTYH, in concert with OGG1 and MTH1 [11,13]. It is well established that biallelic mutations in the MUTYH gene introduce G:C to T:A transversions also in the adenomatous polyposis coli (APC) gene, leading to genomic instability and abnormal, dysregulated cell proliferation in the colonic epithelium [14,15]. Patients with two mutations in the MUTYH gene develop the MUTYH-associated polyposis (MAP) syndrome [13].
To date, 85 different MAP-associated mutations have been found [16], scattered throughout the entire length of the protein, but only 3 (including Q338H) map within putative protein interaction domains as revealed by the recently solved crystal structure of hMUTYH [17]. It is tempting to speculate that Q338H might affect this protein-protein interaction, but additional experimental support is warranted.
The contrasting results on rs3219489 and its association with CRC risk in the Swedish versus other populations might suggest that the effect of this variant is specific to the Swedish population or not large enough in the other populations to be detected with the present sample size. For example, the statistical power of the updated meta-analysis was only 43% to detect a recessive OR of 1.10 (Type I error rate 5% and prevalence of CC genotypes among controls 5.6%). A closer look at the data actually shows that one of the German cohorts (ESTHER) gave results in agreement with our Swedish cohorts, with OR = 1.36 (95% CI 1.00 to 1.86) for colorectal cancer (Figure 1A) and OR = 1.61 (95% CI 1.08 to 2.40) for rectal cancer (Figure 1C). This is likely a spurious result due to the small size of that cohort (318 cases and 365 controls).
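The power figures quoted in this Discussion can be approximated with a simple normal-approximation calculation. The sketch below treats the pooled data as a single unmatched case-control study of 12232 cases and 13380 controls and ignores between-study variance; both are simplifying assumptions, and the method the authors actually used is not stated. The function names are illustrative.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def power_recessive(odds_ratio, n_cases, n_controls, p_controls, z_alpha=1.959964):
    """Approximate two-sided power of a Wald test on the log odds ratio.

    p_controls is the prevalence of the risk genotype among controls.
    """
    odds_cases = odds_ratio * p_controls / (1 - p_controls)
    p_cases = odds_cases / (1 + odds_cases)
    # Variance of the log OR from the expected cell counts
    var = (
        1 / (n_cases * p_cases)
        + 1 / (n_cases * (1 - p_cases))
        + 1 / (n_controls * p_controls)
        + 1 / (n_controls * (1 - p_controls))
    )
    z = math.log(odds_ratio) / math.sqrt(var)
    return normal_cdf(z - z_alpha) + normal_cdf(-z - z_alpha)

# Sample sizes and genotype prevalence as quoted in the text
for target_or in (1.10, 1.18, 1.52):
    power = power_recessive(target_or, 12232, 13380, 0.056)
    print(f"OR = {target_or:.2f}: power = {power:.2f}")
```

Under these assumptions the approximation gives values close to the reported 43%, 89% and 99%, which suggests that the quoted figures follow from the sample size and genotype prevalence alone.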
On the other hand, in agreement with Swedish results, rs3219489 has also been shown to be associated with CRC risk in three independent studies in the Japanese population [18,19,20] and among African-Americans (Yuan et al., 2nd InSiGHT meeting, Yokohama, Japan, unpublished) even though all these studies have a limited sample size and the results need further validation.
It is also possible that rs3219489 represents a risk-associated variant in the Swedish population in combination with environmental factors in the broad sense. For example, screening programs for CRC in Sweden could result in a diagnosis earlier in life, thus inflating the ORs estimated in Sweden. Another alternative is that the polymorphism is in linkage disequilibrium with other unidentified causal variants. The marker and the causal variant could be located on the same risk haplotype in the Swedish population and on different haplotypes in other populations.
Whatever the reason for the replication failure, the results from the present study clearly illustrate the possibility of decreasing effect sizes with increasing collections of individuals, a phenomenon well known in the field of genetic epidemiology as the winner's curse [21]. It should be kept in mind that this outcome is to be expected in association studies, in particular those dealing with regionally heterogeneous complex diseases.
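The winner's curse can be illustrated with a small simulation. The sketch assumes many equally sized studies whose log OR estimates are normally distributed around one modest true effect, and it "reports" only those that reach one-sided significance in the risk-increasing direction. The true OR, standard error and number of studies are hypothetical values chosen for illustration and are not estimates from the present data.

```python
import math
import random

def simulate_winners_curse(true_or=1.05, se=0.15, n_studies=20000, seed=1):
    """Compare the average estimated OR across all simulated studies with
    the average among studies that reach significance (z > 1.96)."""
    rng = random.Random(seed)
    true_log_or = math.log(true_or)
    estimates = [rng.gauss(true_log_or, se) for _ in range(n_studies)]
    significant = [b for b in estimates if b / se > 1.96]
    mean_all = math.exp(sum(estimates) / len(estimates))
    mean_significant = math.exp(sum(significant) / len(significant))
    return mean_all, mean_significant, len(significant)

mean_all, mean_significant, n_significant = simulate_winners_curse()
print(f"mean OR, all studies:         {mean_all:.2f}")
print(f"mean OR, significant studies: {mean_significant:.2f} (n = {n_significant})")
```

Because only estimates above the significance threshold are selected, their average necessarily exceeds the true effect, and a larger replication sample then pulls the estimate back toward the true value.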