Association of Genes, Pathways, and Haplogroups of the Mitochondrial Genome with the Risk of Colorectal Cancer: The Multiethnic Cohort

The mitochondrial genome encodes 13 proteins that are essential for the oxidative phosphorylation (OXPHOS) system. Inherited variation in mitochondrial genes may influence cancer development through changes in mitochondrial proteins, altering the OXPHOS process, and promoting the production of reactive oxygen species. To investigate the role of the OXPHOS pathway and mitochondrial genes in colorectal cancer (CRC) risk, we tested 185 mitochondrial SNPs (mtSNPs), located in 13 genes that comprise four complexes of the OXPHOS pathway and mtSNP groupings for rRNA and tRNA, in 2,453 colorectal cancer cases and 11,930 controls from the Multiethnic Cohort Study. Using the sequence kernel association test, we examined the collective set of 185 mtSNPs, as well as subsets of mtSNPs grouped by mitochondrial pathways, complexes, and genes, adjusting for age, sex, principal components of global ancestry, and self-reported maternal race/ethnicity. We also tested for haplogroup associations using unconditional logistic regression, adjusting for the same covariates. Stratified analyses were conducted by self-reported maternal race/ethnicity. In European Americans, a global test of all genetic variants of the mitochondrial genome identified an association with CRC risk (P = 0.04). In mtSNP-subset analysis, the NADH dehydrogenase 2 (MT-ND2) gene in Complex I was associated with CRC risk at a P-value of 0.001 (q = 0.015). In addition, haplogroup T was associated with CRC risk (OR = 1.66, 95% CI: 1.19–2.33, P = 0.003). No significant mitochondrial pathway or gene associations were observed in the remaining four racial/ethnic groups: African Americans, Asian Americans, Latinos, and Native Hawaiians. In summary, our findings suggest that variation in the mitochondrial genome, particularly in the MT-ND2 gene, may play a role in CRC risk among European Americans, but not in other maternal racial/ethnic groups.
Further replication is warranted and future studies should evaluate the contribution of mitochondrial proteins encoded by both the nuclear and mitochondrial genomes to CRC risk.


Introduction
Colorectal cancer is the third most common cancer among men and women in the United States. In 2015, an estimated 220,800 new colorectal cancers (CRC) were diagnosed in the United States [1]. Approximately 35% of the risk of CRC is attributed to inherited factors [2]. Close to fifty risk loci for CRC have been identified by genome-wide association studies, which have focused on common variants of the nuclear genome [3][4][5][6][7]. However, these loci explain only a small proportion of the heritability of colorectal cancer and additional heritable factors remain to be discovered.
Over 70 years ago, Otto Warburg reported an altered metabolism among cancer cells characterized by an increase in glucose uptake and glycolysis despite an adequate oxygen supply for mitochondrial respiration, a phenomenon referred to as 'aerobic glycolysis' [8]. Warburg hypothesized that this shift towards 'aerobic glycolysis' signified a deficiency in mitochondrial respiration, representing a fundamental cause of cancer [9]. This observation has now been confirmed in many types of cancer cells that exhibit elevated levels of glucose transport and increased rates of glycolysis, referred to as the Warburg effect [10,11]. The mitochondrial genome is a double-stranded circular DNA molecule of 16,569 base pairs which is highly polymorphic and contains almost no intergenic regions [12,13]. The proteins it encodes serve essential functions in cellular metabolism and the regulation of cell death [14]. Thirty-seven genes are encoded by the mitochondrial DNA (mtDNA), of which 13 encode proteins involved in the oxidative phosphorylation (OXPHOS) machinery and 24 make up the RNA machinery (2 ribosomal RNAs and 22 transfer RNAs). The primary function of the mitochondrion is the production of the energy molecule, adenosine triphosphate (ATP), through the metabolic OXPHOS pathway.
Variations in mtDNA, including mitochondrial single nucleotide polymorphisms (mtSNPs), have the potential to modify mitochondrial function and lead to increased oxidative stress and cancer risk [15][16][17]. A Scottish study examined 132 mtSNPs in 2,854 cases and 2,822 controls and found no association with overall CRC risk [18]. To our knowledge, no study to date has comprehensively examined the relationship between mtDNA variants and CRC risk across different racial/ethnic populations. Furthermore, a pathway-based approach, which increases study efficiency for effects of modest size, may help to reveal associations between the mitochondrial genome and cancer risk.
Mitochondrial haplogroups are defined by unique sets of mtSNPs, reflecting specific ancestral populations as a result of the sequential accumulation of mitochondrial mutations through maternal lineages. Mitochondrial haplogroups have been associated with breast, prostate, and nasopharyngeal cancers [19][20][21][22]. Three studies have investigated the association between mitochondrial haplogroups and CRC risk in European and Asian populations with inconsistent results [18,21,23].
To comprehensively examine the role of the mitochondrial genome and CRC risk across multiple racial/ethnic groups, we genotyped a set of 185 mtSNPs to evaluate the association of genetic variation in the mitochondrial genome, pathways and genes, as well as of single mtSNPs and haplogroups, among 2,453 CRC cases and 11,930 controls of the Multiethnic Cohort (MEC) Study.

Study Subjects
The MEC is a large population-based cohort study of more than 215,000 men and women from Hawaii and California. The cohort is predominantly comprised of individuals from five racial/ethnic groups: African Americans, Asian Americans, European Americans, Latinos, and Native Hawaiians. Participants between the ages of 45 and 75 years were recruited from March 1993 through May 1996 and completed a 26-page self-administered questionnaire that included information regarding medical history, family history of cancer, diet, dietary supplements, medication use, and physical activity. Further details about this cohort are provided elsewhere [24].
Incident CRC cases were identified up to December 9, 2010 by cohort linkage to population-based Surveillance, Epidemiology and End Results (SEER) cancer registries covering Hawaii and California. Information on stage of disease at the time of diagnosis was also collected from the cancer registries. Blood samples were collected from incident colorectal, breast, and prostate cancer cases after their diagnosis, as well as a random sample of cohort members to serve as controls from 1996 through 2001, and prospectively from all willing surviving participants from 2002 through 2007. Informed consent was obtained at blood draw. Among the CRC cases used in this analysis, 70.4% had their blood drawn after diagnosis and 29.6% prior to diagnosis. Control subjects were men and women selected to serve as matched controls for nested case-control studies of colorectal, breast and prostate cancer. They were also selected to not have developed CRC before cohort entry or during follow-up as of December 9, 2010. This nested case-control study consisted of 2,453 CRC cases and 11,930 controls.
This study was approved by the institutional review board at the Cancer Prevention Institute of California.

mtSNP selection and genotyping
We abstracted mtSNP information from publicly deposited mtDNA sequencing data (Phylo-Tree mtDNA build 8, March 21, 2010) for 3,674 individuals comprising 599, 1,401, 1,118 and 556 subjects of African, European, Asian, and Latino ancestry, respectively. In addition, we sequenced the mtDNA of 160 Native Hawaiians using the Affymetrix resequencing array and identified 241 mtSNPs (MAF > 2%) in this population at a density of 1 mtSNP per 64 base pairs with an average call rate of 90.6% [25]. A total of 863 mtSNPs were selected, including 160 mtSNPs identified from the sequencing data, all missense mtSNPs (n = 230), and those previously associated with cancer (n = 37).
The genotyping of mtDNA was carried out in three phases using the Sequenom MassArray platform (Sequenom, San Diego). In phase I, quantitative allelotyping was performed on DNA pools from 75 samples to enable the rapid and affordable screening of the entire list of 863 putative mtSNPs. Allelotyping provides a quantitative estimate of allelic frequency in a mixture of DNA [26]; the goal of phase I was to eliminate those mtSNPs with an undetectable minor allele frequency (MAF). A total of 240 mtSNPs were eliminated in phase I. In phase II, 619 of the remaining 623 mtSNPs were genotyped in a multiethnic panel of 376 subjects using the Sequenom iPLEX platform, providing robust MAFs of these mtSNPs across all five major ethnicities. Of the 619 mtSNPs genotyped, 186 mtSNPs were identified to have a MAF greater than 0.02. In phase III, these 186 mtSNPs were genotyped in our nested CRC case-control study of 2,498 cases and 12,070 controls, using the Sequenom iPLEX platform. A total of 185 mtSNPs passed our quality control criteria of a 95% call rate and a MAF > 0.001. Stratifying on reported maternal race/ethnicity, 175, 168, 165, and 102 mtSNPs had a MAF > 0.001 in African Americans, European Americans, Latinos, and Asian Americans, respectively; and 50 mtSNPs had a MAF > 0.005 in Native Hawaiians (using a less stringent threshold due to the smaller sample size) (S1 Table). A total of 2,453 CRC cases and 11,930 controls were successfully genotyped with a call rate > 95%. The average individual call rate was 99.6% and the average concordance rate for the 8% of samples that were replicated was 99.7%.
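The marker-level quality control described above (call rate and MAF thresholds) can be illustrated with a minimal sketch. It assumes haploid mtSNP calls coded 0 (major allele), 1 (minor allele), or None (no call); the function names, the coding scheme, and the choice of an inclusive call-rate cutoff are illustrative assumptions, not the pipeline actually used in the study.

```python
from typing import Optional, Sequence


def call_rate(calls: Sequence[Optional[int]]) -> float:
    """Fraction of samples with a non-missing call for one mtSNP."""
    if not calls:
        return 0.0
    return sum(c is not None for c in calls) / len(calls)


def minor_allele_freq(calls: Sequence[Optional[int]]) -> float:
    """MAF for a haploid marker coded 0/1; missing calls are ignored."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    freq = sum(observed) / len(observed)
    # Report the rarer allele regardless of which one was coded as 1.
    return min(freq, 1.0 - freq)


def passes_qc(
    calls: Sequence[Optional[int]],
    min_call_rate: float = 0.95,  # threshold from the text; ">=" is assumed
    min_maf: float = 0.001,       # threshold from the text (MAF > 0.001)
) -> bool:
    """Apply both marker-level filters to one mtSNP."""
    return call_rate(calls) >= min_call_rate and minor_allele_freq(calls) > min_maf
```

For the Native Hawaiian stratum, the same function could be called with `min_maf=0.005` to mirror the threshold reported in the text.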

Statistical analysis
To evaluate the cumulative effect of all mtDNA variants, variants in the OXPHOS pathway, complexes, and genes, we used the sequence kernel association test (SKAT_commonrare) [27][28][29]. The SKAT_commonrare test is an omnibus procedure allowing for both rare and common variants to contribute to the overall test statistic [29]. To estimate haplogroups, we used the HaploGrep software (http://www.haplogrep.uibk.ac.at) based on Phylotree build 16 [30,31] and categorized individuals based on the major haplogroups. We conducted unconditional logistic regression to examine the association between major haplogroups and CRC risk, using the most common haplogroup as the reference category. To test for single mtSNP associations with CRC risk, we also conducted unconditional logistic regression estimating p-values using a 1-degree-of-freedom Wald test. The overall analysis was adjusted for age, sex, self-reported maternal race/ethnicity, and the first five principal components of global ancestry. Principal components of genetic ancestry were estimated from genotype data for a panel of 128 ancestry informative markers genotyped in the MEC [32,33]. Previous work in the Multiethnic Cohort has shown that modest population stratification within simulated nested case-control studies was readily corrected for by adjusting for race/ethnicity or the top principal components of ancestry [34]. Additional adjustment for family history of colorectal cancer, dietary intakes of fiber, calcium, folate, alcohol, vigorous physical activity, and smoking did not notably alter results. Thus, these covariates were not included in our final multivariate models. Moreover, we also tested all these associations stratifying on self-reported maternal race/ethnicity and anatomical subsite. All statistical tests presented are two-sided. A false discovery rate (FDR) was estimated to address p-value inflation due to multiple hypothesis testing and a q value<0.1 was used to determine statistical significance.
Single mtSNP analyses were done using PLINK software (version 1.9). mtSNP-set based analyses were done using the SKAT package in R (version 3.0.3). The Mitochondrial solar plot (Fig 1) was drawn using ggplot2 package in R.
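The text does not name the procedure used to estimate the false discovery rate. As one common possibility, the sketch below computes Benjamini-Hochberg adjusted p-values (often reported as q values) for a set of tests; the choice of Benjamini-Hochberg and the function name `bh_qvalues` are assumptions made for illustration, and the p-values in the usage note are hypothetical.

```python
from typing import List, Sequence


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues
```

For example, `bh_qvalues([0.01, 0.04, 0.03, 0.20])` could be compared against the q < 0.1 significance threshold stated in the text to decide which of four hypothetical tests would be declared significant.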

Study Characteristics
Study characteristics of the 14,383 study subjects (2,453 CRC cases; 11,930 controls) are presented in Table 1. Colorectal cancer cases were older and had a higher proportion of males than controls. The distribution of self-reported maternal race/ethnicity included Asian Americans (28.69%), African Americans (24.35%), European Americans (21.42%), Latinos (20.45%), and Native Hawaiians (4.90%). Cases were more likely to report a family history of colorectal cancer, a history of polyps, and a history of diabetes than controls. Approximately 76% of cases occurred in the colon and 50% of cases were localized stage.

Mitochondrial Genome, Pathway, and Gene Associations
A global test of all 168 mtSNPs in the mitochondrial genome (MAF > 0.001) showed a significant association with CRC risk in self-reported maternal European Americans (P = 0.04; Table 2), while no associations were seen in other maternal racial/ethnic groups or in the whole sample (S2 Table). For European Americans, when the mtSNP set was restricted to the OXPHOS pathway, comprising 133 mtSNPs, the association with CRC risk had a P = 0.029 (q = 0.054; Table 2). Within the OXPHOS pathway, complex I (80 mtSNPs; P = 0.025; q = 0.081) and complex III (15 mtSNPs; P = 0.027; q = 0.081) were associated with CRC risk. To further investigate the Complex I association, we analyzed missense and non-missense mtSNPs separately. Both missense (22 mtSNPs, P = 0.024) and non-missense mtSNPs (58 mtSNPs, P = 0.04) in Complex I were collectively associated with CRC risk at P < 0.05 (S3 Table).

mtSNP Associations
Overall, 14 of 185 mtSNPs were associated with CRC at P < 0.05 in the total study population (S5 Table). In stratified analysis by maternal race/ethnicity, 7 of 154 mtSNPs, 4 of 97 mtSNPs, 18 of 147 mtSNPs, and 22 of 156 mtSNPs were associated with CRC risk at P < 0.05 in African Americans, Asian Americans, European Americans, and Latinos, respectively (S5 Table). No mtSNP associations were observed in Native Hawaiians. Of the 14 mtSNPs associated with overall CRC risk, the most significant association was seen with the missense mtSNP mt4917, located in MT-ND2 (OR = 1.52; 95% CI: 1.16-2.01; P = 0.0029, q = 0.308). The frequency of the mt4917 minor allele (G) varies substantially across the five maternal racial/ethnic groups. Specifically, mt4917 was common in European Americans (MAF = 0.10), rare in African Americans (MAF = 0.005) and Latinos (MAF = 0.006), and monomorphic in Asian Americans and Native Hawaiians. In European Americans, three mtSNPs in the gene MT-ND2 were nominally associated with CRC risk at P < 0.05 (Table 3). The strongest association was observed with mt4917 (OR = 1.55; 95% CI: 1.15-2.10; P = 0.004, q = 0.16). Fig 1 presents the mitochondrial solar plot (given the circular nature of mtDNA) of mtSNP associations with CRC risk among European Americans and the correlation of each mtSNP with mt4917. There was a high correlation (r² > 0.75) between mt4917 and seven other mtSNPs across the mitochondrial genome.
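The pairwise correlation reported above can be illustrated with a short sketch of the squared correlation (r²) between two haploid markers. It assumes each mtSNP is coded 0/1 per individual with None for a missing call; the function name `haploid_r2` and the coding are illustrative assumptions, and the text does not state which software produced the reported r² values.

```python
import math
from typing import Optional, Sequence


def haploid_r2(
    snp_a: Sequence[Optional[int]],
    snp_b: Sequence[Optional[int]],
) -> float:
    """Squared correlation between two haploid 0/1 markers.

    Individuals missing either call are dropped. Returns NaN when
    either marker is monomorphic among the remaining individuals.
    """
    pairs = [(a, b) for a, b in zip(snp_a, snp_b) if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        return math.nan
    p_a = sum(a for a, _ in pairs) / n                            # allele frequency at A
    p_b = sum(b for _, b in pairs) / n                            # allele frequency at B
    p_ab = sum(1 for a, b in pairs if a == 1 and b == 1) / n      # joint frequency
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return math.nan
    d = p_ab - p_a * p_b                                          # disequilibrium coefficient
    return d * d / denom
```

Because mtDNA is inherited as a single non-recombining haploid unit, no phasing step is needed before computing this statistic, which is why the sketch works directly on per-individual allele calls.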

Haplogroup Associations
Haplogroup T, which was common in European ancestry populations (occurring at a frequency of 9.6% in controls) and absent in the other racial/ethnic groups, was significantly associated with CRC risk in European Americans (OR = 1.66, 95% CI: 1.19-2.33, P = 0.003, Pcorrection = 0.015; Table).
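To show how an odds ratio and its 95% confidence interval relate to carrier counts, the following sketch computes an unadjusted odds ratio with a Wald interval from a 2x2 table. This is a simplification: the estimates reported here come from unconditional logistic regression adjusted for covariates, so the sketch will not reproduce them. The function name `odds_ratio_wald` is hypothetical, and any counts passed to it are the user's own.

```python
import math
from typing import Tuple


def odds_ratio_wald(
    carrier_cases: int,
    noncarrier_cases: int,
    carrier_controls: int,
    noncarrier_controls: int,
    z: float = 1.96,  # normal quantile for a two-sided 95% interval
) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Wald confidence interval."""
    cells = (carrier_cases, noncarrier_cases, carrier_controls, noncarrier_controls)
    if min(cells) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (carrier_cases * noncarrier_controls) / (
        noncarrier_cases * carrier_controls
    )
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

The interval is symmetric on the log scale, which is why reported confidence limits such as those above are not equidistant from the point estimate on the original scale.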

Discussion
In this study of 14,383 CRC cases and controls, we comprehensively examined the contribution of the mitochondrial genome to CRC risk. To our knowledge, this is the first study to systematically evaluate the mitochondrial genome and its pathways, gene sets, and haplogroups in relation to CRC across multiple maternal racial/ethnic groups. Pathway analyses suggested that the mitochondrial genome and the oxidative phosphorylation pathway play a role in CRC risk among European Americans. In addition, an association between the MT-ND2 gene and CRC risk was observed among European Americans, with a stronger association seen in colon tumors. Haplogroup T was found to be associated with CRC risk among European Americans independent of global ancestry. Our analysis of the entire mitochondrial genome demonstrated evidence of an association with CRC risk in European Americans (P = 0.04), in which the OXPHOS pathway may play an important role (P = 0.029; q = 0.054). A byproduct of OXPHOS is the production of reactive oxygen species (ROS), which can generate free radicals and are involved in many cellular processes, including apoptosis, inflammation, and oxidative stress, that may contribute to aging, degenerative diseases, and cancer [15,35]. Our gene-based analysis further suggested that MT-ND2, a member of the OXPHOS pathway that encodes a subunit of NADH dehydrogenase, is associated with CRC risk in European Americans (P = 0.001; q = 0.015). A recent study reported overexpression of MT-ND2 in CRC tumors vs. normal tissue, which was correlated with lower methylation of the mtDNA D-loop and also significantly associated with stage of disease [36]. These findings support a role for MT-ND2 in CRC development.
The distribution of haplogroups in the MEC was consistent with previously published data on U.S. population-based samples [37]. The frequency of haplogroup T among our European American control subjects (9.57%) is consistent with the Mitomap database (varying between 8% and 11% from West to East Europeans) [38] and with non-Hispanic Whites in the National Health and Nutrition Examination Surveys (NHANES) (9.6%) [37]. Two studies have reported no associations between mtDNA haplogroups and CRC risk among Chinese and Scottish populations [18,21], while an association between haplogroup B4 and CRC risk was reported in a Korean population [23]. We identified an association between haplogroup T and CRC risk in European Americans independent of global ancestry. Haplogroup T is defined by nine polymorphisms [30,39], including five RNA variants (G709A, G1888A, T8697A, T10463C, G15928A), three synonymous polymorphisms (G13368A, G14905A, A15607G), and one nonsynonymous polymorphism (A4917G). The mtSNP A4917G is the diagnostic mtSNP for haplogroup T and a highly conserved polymorphism in the MT-ND2 gene [22,30,39]. The lack of an association with haplogroup T in the Scottish study [18] may be due to the use of different mtSNPs to define haplogroup T (T4217, G10399A and A12309G). Ruiz-Pesini et al. [22] hypothesized that mt4917 has been retained by adaptive selection and played an important role in human migration out of Africa into colder climates, with only the MT-ND2 lineage retained in haplogroup T due to selection pressures. This may explain the higher frequency of mt4917 in European Americans and its relative absence in African Americans, Latinos, Asian Americans, and Native Hawaiians. The Scottish study found no association between 132 mtSNPs and overall CRC risk, yet suggested that the variant A5657G in tRNA (MAF = 0.01) was associated with colon tumors (P = 0.002) [18].
While we did not genotype this mtSNP, which is located close to the MT-ND2 gene (145 base pair distance), we did observe an association between the MT-ND2 gene and colon tumors (P = 7.0 × 10⁻⁴), which may support the reported association. Given the wide spectrum of risk alleles among rare, low-frequency, and common genetic variants in the mitochondrial genome, our study is strengthened by using the SKAT common/rare approach to collectively test multiple risk alleles that may have modest effects [27,28]. This approach has improved power compared to single SNP tests in the presence of correlation between SNPs and overcomes the limitation of previous methods that upweight rare variants [29,40]. Using this approach, we were able to capture the role of the MT-ND2 gene in CRC risk. In addition, our study strengths include the investigation of multiple racial/ethnic populations, the examination of mtSNPs based on sequencing data for all five populations, and a comprehensive evaluation of the mitochondrial genome and CRC risk. Limitations of this study include the modest sample size for each population to detect weak genetic effects, particularly among Native Hawaiians. For a mtSNP with MAF = 0.10 and alpha = 0.05, our study has 80% power to detect a [...]. In addition, there is a possibility of false-positive results given the number of hypotheses tested, as our findings do not meet a stringent Bonferroni correction. In summary, our study suggests that variation in the mitochondrial genome may play a role in CRC risk among European Americans. The associations of genetic variants in MT-ND2 and of haplogroup T with CRC risk warrant replication in other European American populations. Future studies should examine the expression of MT-ND2 in colorectal tumors and test mitochondrial genes encoded by both the nuclear and mitochondrial genomes to fully examine their contribution to CRC risk.
Supporting Information S1