Widespread Inter-Chromosomal Epistasis Regulates Glucose Homeostasis and Gene Expression

The relative contributions of additive versus non-additive interactions in the regulation of complex traits remains controversial. This may be in part because large-scale epistasis has traditionally been difficult to detect in complex, multi-cellular organisms. We hypothesized that it would be easier to detect interactions using mouse chromosome substitution strains that simultaneously incorporate allelic variation in many genes on a controlled genetic background. Analyzing metabolic traits and gene expression levels in the offspring of a series of crosses between mouse chromosome substitution strains demonstrated that inter-chromosomal epistasis was a dominant feature of these complex traits. Epistasis typically accounted for a larger proportion of the heritable effects than those due solely to additive effects. These epistatic interactions typically resulted in trait values returning to the levels of the parental CSS host strain. Due to the large epistatic effects, analyses that did not account for interactions consistently underestimated the true effect sizes due to allelic variation or failed to detect the loci controlling trait variation. These studies demonstrate that epistatic interactions are a common feature of complex traits and thus identifying these interactions is key to understanding their genetic regulation.

. Quantitative trait loci (QTLs) were identified for both body weight and plasma glucose levels that were due to main effects and interaction effects. Due to the study design, only QTLs with dominant effects could be assessed.
Omnibus tests for main effects on body weight indicated that some of the chromosome substitutions individually influenced body weight (males p=0.0028; females p=0.0008; meta p=1.4e-05). Similarly, omnibus tests for main effects on plasma glucose levels demonstrated a significant effect of the chromosome substitutions (males p=0.0082; females p=0.00011; meta p=1.4e-05). QTLs with main effects on body weight were mapped to chromosomes 8 (Main Effect: 1.23g; Average Effect: 1.02g) and 17 (Main Effect: -1.13g; Average Effect: -1.11g) (S3 Table). Note that we define main effects as the effect of a chromosome substitution as estimated by a model which includes all pairwise interaction terms, thus taking into account context-dependent genetic background effects. In contrast, the average effect is estimated using a model that does not include any interaction terms; the latter is similar to the analyses performed in a typical GWAS study. QTLs with main effects on fasting glucose were mapped to chromosomes 3 (Main Effect: 25.0 mg/dL; Average Effect: 9.61 mg/dL), 5 (Main Effect: 15.6 mg/dL; Average Effect: 6.02 mg/dL), and 4 (Main Effect: 17.5 mg/dL; Average Effect: 6.61 mg/dL) (S3 Table).
Omnibus tests for interaction effects on body weight were not significant (males p= 0.19; females p= 0.83; meta p= 0.44), and therefore epistatic interactions on body weight were not further investigated. However, omnibus tests for interaction effects on plasma glucose demonstrated the importance of epistasis in regulating this trait (males p= 0.002; females p= 0.003; meta p= 8.99e-05). In fact, among the males and females respectively, epistasis accounted for 43% (95% confidence interval: 23%-75%) and 72% (95% confidence interval: 37%-97%) of the heritable effects on plasma glucose levels. The discrepant results for the contribution of interactions to body weight and plasma glucose are likely reflected in the difference between whether QTLs for these traits were detected using the main effect model or the average effect model (S3 Table). For plasma glucose, only 1 of the 3 QTLs identified using the main effect model was also identified using the average effect model, and no new QTLs were identified with the average effect model. In contrast, both of the QTLs for body weight identified using the main effect model were also identified using the average effect model, and 2 new QTLs were identified on chromosomes 6 and 10. This suggests that for a trait regulated by epistatic interactions, the ability to successfully identify QTLs is greatly enhanced by accounting for these interactions. However, for a trait regulated primarily by additive effects, a model incorporating interactions can be detrimental to QTL identification.
To identify specific epistatic interactions, we tested explicit hypotheses for interchromosomal pairwise interactions on plasma glucose levels. Among the 15 CSS crosses analyzed, 5 crosses demonstrated inter-chromosomal epistatic interactions that altered plasma glucose levels (Fig. 1,S3,S4 Figs.). Interestingly, in all 5 crosses demonstrating interactions, one chromosome substitution increased fasting glucose levels relative to the control B6 strain. These main effects raised plasma glucose levels by an average of 12.3 mg/dL in males and 17.8 mg/dL in females. However, in all 5 observed interactions the average plasma glucose levels in the double CSSs were closer to the control B6 strain than any single CSS was. Furthermore, in 4 of the 5 interactions, the plasma glucose levels in the double CSS did not differ statistically from the control strain B6 (p value > 0.1). Thus, the chromosome substitution driving the increase in plasma glucose on a B6 background had no effect on glucose levels when the genetic background was altered by the second chromosome substitution.

Regulation of gene expression by epistasis.
As hepatic gluconeogenesis is a key determinant of plasma glucose levels in healthy insulin-sensitive mice [26], the hepatic gene expression patterns of control and CSS mice were analyzed to better understand the molecular mechanisms underlying the epistatic regulation of plasma glucose. The RNA-Seq data was filtered for genes expressed in the liver, leaving 13,289 genes that were tested for differential expression associated with both main and interaction effects. A total of 6,101 main effect expression QTLs (meQTLs) were identified (FDR < 0.05) (Fig. 2, S4 Table). Those meQTL genes located on the substituted chromosome were classified as cis-meQTLs (Fig. 2, red) whereas the meQTL genes not located on the substituted chromosome were classified as trans-meQTLs (Fig. 2, blue). Among all possible genes regulated by a cis-meQTL, on average 11.48% of these genes in each strain had a cis-meQTL (range: 5.54% -22.09%) (S5 Table). Similarly, among all possible genes regulated by a trans-meQTL, on average 5.42% (range: 0.08% to 19.26%) of these genes were regulated by a trans-meQTL (S5 Table). The percentage of cis-and trans-meQTLs in each strain demonstrated a strong positive linear correlation (Spearman's r = 1.0) but the proportion of cis-eQTLs was always greater than the proportion of trans-eQTLs.
In addition to the meQTLs regulated by substitution of a single chromosome, the analysis of double CSSs enabled the detection of eQTLs with additive and interaction effects between the substituted chromosomes. The expression of Zkscan3 represents an example of additivity, with the substitution of A/J-derived chromosomes 8 and 17 each individually increasing the expression of Zkscan3 relative to control B6 mice (S7 Fig.). In the double CSS strain (B6.A17 x B6.A8)F1, the effects of each individual chromosome substitution are combined in an additive manner to result in yet higher expression than either of the single CSSs (S7 Fig.). The additive effects of the Zkscan3 meQTLs detected by RNA-Seq were confirmed by quantitative reverse transcription PCR (S7 Fig.), as were 4/5 additional meQTLs demonstrating additivity (S8 Fig.).
In addition to examples of additivity, interaction expression QTLs (ieQTLs) were identified that were jointly regulated by genetic variation on two substituted chromosomes. The ieQTLs, similar to the meQTLs, were divided into cis-ieQTLS and trans-ieQTLs, with cis-ieQTLs defined by differentially expressed genes located on either one of the two substituted chromosomes and trans-ieQTLs representing differentially expressed genes that are not located on either substituted chromosome. A total of 4,283 ieQTLs were identified (S9 Table). Among all possible genes regulated by a cis-ieQTL or trans-ieQTL, 2.01% and 2.16% of genes were regulated by a cis-or trans-ieQTL respectively ( Table 1). The combination of A/J-derived chromosomes 8 and 14 yielded the most ieQTLs (n=2,305) including cis-ieQTLs regulating the expression of 17.56% of all genes on chromosomes 8 or 14 and trans-ieQTLs regulating the expression of 17.32% of all genes throughout the remainder of the genome. Overall, the ieQTLs demonstrated a similar positive linear correlation as the meQTLs (Spearman's r = 0.92) (S8 Fig.), although there was no enrichment for cis-ieQTLs. Among the genes regulated by an ieQTL(s), 32.35% (945 out of 2921) were regulated by multiple ieQTLs (Range: 2-7) (S10, S11 Tables). For example, Agt expression is decreased in strain (B6.A8 x B6)F1 relative to control B6 mice; however, interactions between alleles on chromosome 8 and chromosomes 6, 3, 17, and 14 all result in expression levels of Agt that did not differ from the control strain (Fig. 3).
Context-dependent effects on gene expression. We next tested whether the interaction effects on gene expression were synergistic (positive epistasis) or antagonistic (negative epistasis) (S9 Fig.). Synergistic refers to an increased difference in gene expression levels between the double CSS and the control B6 strain beyond that expected based on an additive model, whereas antagonistic refers to a decreased difference. The regulation of Agxt was an example of an antagonistic interaction, with main effects from substituted chromosomes 6 and 8 each individually decreasing Agxt expression, whereas this effect was lost in the double chromosome substitution strain (Fig. 4A). In contrast, the regulation of Cyp3a16 represented an example of synergistic interaction with the detection of an ieQTL in the absence of a meQTLs (Fig. 4B). Among the ieQTLs, antagonistic interactions accounted for 96% (n=4101) while synergistic interactions accounted for 4% (n=182) (Table 1). Remarkably, for 80% of the antagonistic interactions (3285/4101), gene expression in one or both of the single CSSs differed from the control B6 strain (a meQTL), whereas expression in the double CSS reverted to control levels (p > 0.1 relative to strain B6). To again validate the RNA-Seq data using an independent method, RT-qPCR was performed for a subset of genes with antagonistic (n=13) and synergistic (n=10) interactions. Replication by RT-qPCR confirmed the detection of epistasis in 61% (p <0.05) of the genes tested (Antagonistic: 8/13; Synergistic: 6/10) (S8 Table).
Significant contribution of epistasis to trait heritability. Given that the ieQTLs regulated approximately 2% of all genes expressed in the liver (Table 1), we sought to quantify the contribution of genetic interactions to the heritable component of all genes.
First, an omnibus test identified 6,684 genes out of the 12,325 genes expressed in the liver for which there was evidence of genetic control within the population of CSSs. The average proportion of heritable variation attributable to interactions across these genes was 0.56 (1 st quartile: 0.43 -3 rd quartile: 0.68) (Fig. 5A). When the same analysis was restricted to only genes with a statistically significant contribution of interactions to gene expression levels (n=3,236 genes), the proportion of heritable variation attributable to interactions increased to 0.66 (1 st quartile: 0.56 -3 rd quartile: 0.74) (Fig. 5B).

Discussion
CSSs, which have a simplified and fixed genetic background, were used to identify widespread and likely concurrent epistatic interactions. This systematic analysis of mammalian double CSSs demonstrated that epistatic interactions controlled the majority of the heritable variation in both fasting plasma glucose levels and hepatic gene expression (Fig. 5). Among genes expressed in the liver, the expression level of 24% were regulated, at least in part, by epistasis (Fig. 5). This number is remarkable considering that only dominant effects were tested, only a single tissue and time point were examined, allelic variation from only two inbred strains of mice were included, and only 15 pairwise strain combinations of CSSs were tested out of a possible 462 combinations of double CSSs. The prevalence of epistatic interactions provides a potential molecular mechanism underlying the highly dependent nature of complex traits on genetic background [21,22,30,31]. Interpreting the effect of individual allelic variants will thus be severely limited by population-style analyses that fail to account for possible contextual effects. Nonetheless, progress is being made in this field, including in diseases such as multiple sclerosis (MS), which is a complex genetic disease whose risk is highly associated with family history [32]. For example, MS risk alleles in DDX39B (rs2523506) and IL7R (rs2523506A) together significantly increase MS risk considerably more than either variant independently [15]. Based on the considerable number of interactions detected in the CSS crosses, context-dependent interactions such as that between DDX39B and IL7R in MS are likely widespread and may therefore represent a significant source of missing heritability for complex traits and diseases [33,34].
Although epistasis was a dominant factor regulating fasting glucose levels, the same effect was not detected in the regulation of body weight. It is not clear if this is due to different genetic architectures between these two traits or whether this was due to the limited genetic variation between the B6 and A/J strains. The body weight studies were conducted in mice fed a standard rodent chow, whereas differences in body weight between strains B6 and A/J are significantly more pronounced when challenged with a high-fat diet [35,36]. Alternately, a recent meta-analysis of trait heritability in twin studies identified significant variation in the role of additive and non-additive variation among different traits, with suggestive evidence for non-additive effects in 31% of traits [37]. Among the traits analyzed, genetic regulation of neurological, cardiovascular, and ophthalmological traits were among the most consistent with solely additive effects, whereas traits related to reproduction and dermatology were more often consistent with non-additive interactions. Among the metabolic traits studied, 40% of the 464 traits studied were consistent with a contribution of non-additive interactions [37]. It is interesting to speculate whether some traits that may have a more direct effect on fitness (e.g. reproduction) are more likely to involve multiple non-additive effectors in order to maintain a narrow phenotypic or developmental range [38].
Although many inter-chromosomal non-additive interactions were identified in mice, it remains unclear whether these interactions are attributable to bigenic gene-gene interactions or to higher-order epistasis involving multiple loci located on a substituted chromosome. Studies in yeast that dissected the genetic architecture of epistasis demonstrated that gene-gene interactions played a minor role among the heritable effects attributable to epistasis, thus primarily implicating higher order interactions [2]. Yet, other studies in yeast that methodically tested pairs of gene knockouts for interactions identified a number of gene-gene interactions [39]. Additional evidence for both highorder epistasis with three, four, and even more mutations [40] as well as bigenic genegene interactions [41] have been identified and it seems likely that both will underlie interactions detected in the CSS studies. However, to formally test this and determine the relative contribution of each, higher resolution genetic mapping of the epistatic interactions will be necessary to better understand their molecular nature .
Perhaps the most significant outcome of the epistasis detected was the high degree of constancy in the light of context dependence, such that the interactions usually returned trait values to the levels detected in control mice. Remarkably, this is just as Waddington predicted 75 years ago, a phenomenon he referred to as canalization . Canalization refers to the likelihood of an organism to proceed towards one developmental outcome, despite variation in the process along the way. This variation can be influenced by among other things the numerous functional genetic variants present in a typical human genome, which may contain thousands of variants that alter gene function . We find that most genetic interactions return trait values to levels seen in control strains, which would act to reduce phenotypic variation among developmental outcomes. This robustness in the face of considerable genetic variation is central to the underlying properties of canalization.
These genetic interactions therefore represent a mechanism for storing genetic variation within a population, without reducing individual fitness. This stored genetic variation could then enable populations to more quickly adapt to environmental changes Finally, the consistently greater effect sizes of main effects relative to average effects suggests that GWAS-type studies consistently underestimate true effect sizes in at least a subset of individuals. Therefore, the key to enabling precision medicine is to identify in which subset of individuals a particular variant has a significant effect. The consideration of epistasis in treatment, although in its infancy, remains a promising avenue for improving clinical treatment regimens, including predicting drug response in tumors and guiding antibiotic drug-resistance . However, true precision medicine will necessitate a more comprehensive understanding of how genetic background, across many loci, affects single variant substitutions.

Method:
Mice. Chromosome substitution strains (CSS) and control strains were purchased fromThe Jackson Laboratory. These strains include C57BL/6J-Chr3 A/J /NaJ mice (Stock . To avoid potential mapping biases, we created an "individualized genome" of the A/J mouse strain using the program Seqnature and the GENCODE vM7 gene annotations [46] were used to count the number of reads for each gene feature. Graphical depictions of the distribution CPM (counts per million) were used to remove 3 outlier samples. Genes where less than 75% of the samples had a count greater than or equal to 15 were considered to be expressed at low levels in liver and were removed leaving 13,289 genes that were considered expressed. To enhance reproducibility and reduce the dependence between the genes, svaseq was used to fit a model with main effects and pairwise interactions between each chromosome substitution. EdgeR uses a log link function, and thus departure from additivity in EdgeR is departure from a multiplicative model on the gene expression level.
The chromosome-chromosome interactions with FDR < 0.05 were divided into the categories synergistic and antagonistic based on the gene expression differences between the double CSS strain and the control strain relative to that predicted by an additive model (S9 Fig.).
To estimate the amount of variation attributable to interaction, we fit an additive model in  S  t  o  p  p  a  -L  y  o  n  n  e  t  D  .  T  h  e  b  i  o  l  o  g  i  c  a  l  e  f  f  e  c  t  s  a  n  d  c  l  i  n  i  c  a  l  i  m  p  l  i  c  a  t  i  o  n  s  o  f  B  R  C  A  m  u  t  a  t  i  o  n  s  :   w  h  e  r  e  d  o  w  e  g  o  f  r  o  m  h  e  r  e  ?  E  u  r  J  H  u  m  G  e  n  e  t  E  J  H  G  .  2  0  1  6  ;  2  4  S  u  p  p  l  1  :  S  3  -9 .  u  n  g  e  r  S  C  ,  R  a  g  h  u  p  a  t  h  y  N  ,  C  h  o  i  K  ,  S  i  m  o  n  s  A  K  ,  G  a  t  t  i  D  M  ,  H  i  n  e  r  f  e  l  d  D  A  ,  e  t  a  l  .  R  N  A  -S  e  q   A  l  i  g  n  m  e  n  t  t  o  I  n  d  i  v  i  d  u  a  l  i  z  e  d  G  e  n  o  m  e  s  I  m  p  r  o  v  e  s  T  r  a  n  s  c  r  i  p  t  A  b  u  n  d  a  n  c  e  E  s  t  i  m  a  t  e  s  i  n   M  u  l  t  i  p  a  r  e  n  t  P  o  p  u  l  a  t  i  o  n  s  .  G  e  n  e  t  i  c  s  .  2  0  1  4  ;  1  9  8  :  5  9  -7