Integrative QTL analysis of gene expression and chromatin accessibility identifies multi-tissue patterns of genetic regulation

Gene transcription profiles across tissues are largely defined by the activity of regulatory elements, most of which correspond to regions of accessible chromatin. Regulatory element activity is in turn modulated by genetic variation, resulting in variable transcription rates across individuals. The interplay of these factors, however, is poorly understood. Here we characterize expression and chromatin state dynamics across three tissues—liver, lung, and kidney—in 47 strains of the Collaborative Cross (CC) mouse population, examining the regulation of these dynamics by expression quantitative trait loci (eQTL) and chromatin QTL (cQTL). QTL whose allelic effects were consistent across tissues were detected for 1,101 genes and 133 chromatin regions. Also detected were eQTL and cQTL whose allelic effects differed across tissues, including local-eQTL for Pik3c2g detected in all three tissues but with distinct allelic effects. Leveraging overlapping measurements of gene expression and chromatin accessibility on the same mice from multiple tissues, we used mediation analysis to identify chromatin and gene expression intermediates of eQTL effects. Based on QTL and mediation analyses over multiple tissues, we propose a causal model for the distal genetic regulation of Akr1e1, a gene involved in glycogen metabolism, through the zinc finger transcription factor Zfp985 and chromatin intermediates. This analysis demonstrates the complexity of transcriptional and chromatin dynamics and their regulation over multiple tissues, as well as the value of the CC and related genetic resource populations for identifying specific regulatory mechanisms within cells and tissues. Author summary Genetic variation can drive alterations in gene expression levels and chromatin accessibility, the latter of which defines gene regulatory elements genome-wide. The same genetic variants may associate with both molecular events, and these may be connected within the same causal path: a variant that reduces promoter region chromatin accessibility, potentially by affecting transcription factor binding, may lead to reduced expression of that gene. Moreover, these causal regulatory paths can differ between tissues depending on functions and cellular activity specific to each tissue. We identify cross-tissue and tissue-selective genetic regulators of gene expression and chromatin accessibility in liver, lung, and kidney tissues using a panel of genetically diverse inbred mouse strains. Further, we identify a number of candidate causal mediators of the genetic regulation of gene expression, including a zinc finger protein that helps silence the Akr1e1 gene. Our analyses are consistent with chromatin accessibility playing a role in the regulation of transcription. Our study demonstrates the power of genetically diverse, multi-parental mouse populations, such as the Collaborative Cross, for large-scale studies of genetic drivers of gene regulation that underlie complex phenotypes, as well as identifying causal intermediates that drive variable activity of specific genes and pathways.

Determining the mechanisms by which genetic variants drive molecular, cellular, and 2 physiological phenotypes has proved to be challenging [1]. These mechanisms can be 3 informed by genome-wide experiments that provide data on variations in molecular and 4 cellular states in genotyped individuals. Most examples of such data, though, are largely 5 observational, due in part to constraints of specific populations (e.g., humans), the 6 limitations of existing experimental technologies, and the challenge of coordinating large 7 numbers of experiments with multiple levels of data [2]. One approach to shed light on 8 these dynamics is to pair complementary datasets from the same individuals and 9 perform statistical mediation analysis (e.g., [3,4]), which has increasingly been used in 10 genomics [6]. These analyses can identify putative causal relationships rather than 11 correlational interactions, providing meaningful and actionable targets in terms of 12 downstream applications in areas such as medicine and agriculture. 13 In human data, co-occurence of QTL across various multi-omic data has been used 14 to assess potentially related and connected biological processes; examples include gene 15 expression with chromatin accessibility [7] or regulatory elements [8], and ribosome 16 occupancy with protein abundances [9]. More formal integration through statistical 17 mediation analyses has also been used to investigate relationships between levels of 18 human biological data, such as distal genetic regulation through local gene 19 expression [10,11], and eQTL with regulatory elements [12][13][14] and physiological 20 phenotypes, such as cardiometabolic traits [15]. 21 Though genetic association studies of human populations have been highly 22 successful [16], animal models allow for more deliberate control of confounding sources 23 of variation, including experimental conditions and population structure, and as such 24 provide a potentially powerful basis for detecting associations and even causal linkages. 25 Towards this end, genetically-diverse mouse population resources have been established, 26 Fig 1. Diagram of the experiment and analyses. RNA-seq and ATAC-seq were performed using liver, lung, and kidney tissues of males from 47 CC strains. Each CC strain was derived from an inbreeding funnel, and thus represents a recombinant inbred mosaic of the initial eight founder haplotypes. Differential analyses followed by pathway enrichment analyses were performed to identify biological pathways enriched in differentially expressed genes and accessible chromatin regions. QTL and mediation analyses were performed to identify regions that causally regulate gene expression and chromatin accessibility.
Differentially expressed genes strongly correspond with accessible 72 chromatin regions 73 Differential expression (DE) and differentially accessible region (DAR) analysis were 74 performed between the three tissues (S1 Table) and revealed between 3,564 -5,709 DE 75 genes and 28,048 -40,797 DARs (FWER ≤ 0.1). For both expression and chromatin 76 accessibility, liver and kidney tissues were the most similar, whereas lung and liver were 77 the most distinct, also seen in the PCA plots (S1 Fig). Pathway analyses showed many 78 between-tissue differences related to metabolic and immune-related pathways (FWER 79 ≤ 0.1), reflecting the distinct demands of each tissue. Energy metabolism pathways 80 were more active in liver and kidney and immune-related pathways were more 81 pronounced in lung, as expected. We compared the concordance between DE genes and 82 DARs genome-wide and observed that most DE gene promoters do not show significant 83 differences in chromatin accessibility (S2 Fig). In cases with significant variability in 84 accessibility at the promoter of a DE gene, though, the vast majority agree in direction 85 (i.e., higher expression with greater accessibility). 86 QTL detection 87 Gene expression 88 The impact of genetic variation on gene expression was evaluated by eQTL mapping. 89 This was done at three levels of stringency and emphasis: 1) at the level of the local 90 region of a gene, defined as within 10Mb of the gene transcription start site (TSS), and 91 hereafter termed Analysis L; 2) at the level of the chromosome on which the gene is 92 located (Analysis C); and 3) at level of the genome (Analysis G) (details in Methods). 93 After filtering out lowly expressed genes, the number of genes examined in eQTL 94 mapping was 8401 for liver, 11357 for lung, and 10092 for kidney (UpSet plot [61] in S3 95 FigA). 96 Analysis L detected local-eQTL for 19.8% of genes tested in liver, 16.6% in lung, and 97 20.8% in kidney (S2 Table). Local-eQTL for most genes were observed in only one 98 tissue (S4 FigA). Analysis C, which was more stringent, additionally detected 99 intra-chromosomal distal-eQTL, while Analysis G, the most stringent, additionally 100 detected inter-chromosomal distal-eQTL (S2 Table). Genomic locations of eQTL 101 detected for each tissue, excluding the intra-chromosomal distal-eQTL detected by 102 Analysis C, are shown in Fig 2A[ To determine genetic effects on chromatin structure, genomic regions were divided into 106 ∼300 base pair windows and analyses similar to eQTL were used to detect chromatin 107 accessibility QTL (cQTL). After filtering regions with low signal across most samples, 108 the number of chromatin regions tested were 11448 in liver, 24426 in lung, and 17918 in 109  Table, S5 Table). As with eQTL, cQTL were more likely to be local 114 than distal (66 -94.1% local-cQTL for Analysis G; 75 -90% local-cQTL for Analysis C). 115

116
For QTL detected on the same chromosome as the gene or chromatin region 117 (intra-chromosomal), the strongest associations were observed within 10Mb of the gene 118 TSS or chromatin window midpoint ( Fig 2B and S6 Fig). Intra-chromosomal 119 distal-QTL had reduced statistical significance, more consistent with inter-chromosomal 120 distal-QTL. This dynamic between local-and distal-QTL is also observed when using 121 the QTL effect size as a measure of strength, which is likely biased upward due to the 122 Beavis effect in 47 strains [46], shown in Fig 2C and  For all QTL we estimated the effects of the underlying (founder) haplotypes. Consistent 125 with previous studies (e.g., [42]), the CAST and PWK haplotypes had higher 126 magnitude effects compared with the classical inbred strains. This pattern was observed 127 for both local-and distal-eQTL (S8 FigA) and local-cQTL; the numbers of detected chromatin region midpoint; thus, the maximum possible distance between them was 134 20Mb. In the case of distal-eQTL, they had to be within 10Mb of each other.

135
Consistency between these cross-tissue QTL pairs was assessed by calculating the 136 correlation between their haplotype effects (FDR ≤ 0.1; Fig 2D and  expression levels within each tissue were consistent, we noted that the expression level 158 was significantly higher in liver compared with both lung (q = 5.41 × 10 −18 ) and kidney 159 (q = 6.25 × 10 −19 ). Ubiquitin C (Ubc) is another example of a gene with consistent  Cox7c possesses local-eQTL with highly correlated haplotype effects across all three tissues, supportive of shared causal origin. (B) Slc44a3 has a more complicated pattern of local-eQTL haplotype effects across the tissues, with correlated effects shared between lung and kidney, and transgressive effects in the liver eQTL by comparison, consistent with distinct casual variants comparing liver to lung and kidney. For both genes, the expression data are plotted with bars representing the interquartile ranges of likely founder haplotype pair (diplotype). Differential expression between tissues is highlighted. The haplotype association for each tissue is also included near the gene TSS with variant association overlaid. The most statistically rigorous method that detected the QTL (Analysis L, C, or G) is also included. The red tick represents the gene TSS, the black tick represents the variant association peak, and the colored tick represents the haplotype association peak. Haplotype effects, estimated as constrained best linear unbiased predictions (BLUPs).
Slc44a3 and Pik3c2g : Tissue-specific haplotype effects for local-eQTL 162 We found instances where haplotype effects across the three tissues were inconsistent, 163 referring to these as tissue-specific effects (within this subset of tissues). The strongest 164 support of tissue-specificity would be given by QTL pairs whose effects are uncorrelated. 165 For example, the solute carrier family 44, member 3 (Slc44a3 ) gene has local-eQTL effects that are correlated in lung and kidney but anticorrelated liver (Fig 3B), 167 suggesting the liver eQTL could be transgressive [62] relative to the eQTL in lung and 168 kidney, whereby the effects of the haplotypes are reversed. For Slc44a3, CAST, PWK, 169 and WSB haplotypes result in higher expression in liver but lower expression in lung 170 and kidney. The local-eQTL for Slc44a3 were more similar in location in lung and 171 kidney whereas the liver eQTL was more distal to the gene TSS. Overall, the expression 172 data, estimated effects, and patterns of association are consistent with lung and kidney 173 sharing a causal local-eQTL that is distinct from the one in liver.

174
The CC founder strains all possess contributions from three mouse subspecies of M. 175 musculus: domesticus (dom), castaneus, (cast), and musculus (mus) [63]. Allele-specific 176 gene expression in mice descended from the CC founders often follow patterns that 177 matched the subspecies inheritance at the gene regions [64,65]. We found that 178 phosphatidylinositol-4-phosphate 3-kinase catalytic subunit type 2 gamma (Pik3c2g), a 179 gene of interest for diabetes-related traits [66], had tissue-specific local-eQTL in all 180 three tissues that closely matched the subspecies inheritance at their specific genomic 181 coordinates. The local-eQTL in all three tissues were all pair-wise uncorrelated (Fig 4). 182 Further, expression of Pik3c2g varied at statistically significant levels for liver versus 183 lung (q = 3.12 × 10 −13 ) and lung versus kidney (q = 3.55 × 10 −6 ). In lung, the CAST 184 haplotype (cast subspecies lineage) resulted in higher expression, consistent with a cast 185 versus dom/mus allelic series. In kidney, the B6, 129, NZO, and WSB haplotypes (dom 186 subspecies) resulted in higher expression, whereas A/J and NOD (mus), CAST (cast), 187 and PWK (dom) haplotypes showed almost no expression, mostly consistent with a dom 188 versus cast/mus allelic series. The PWK founder appears inconsistent, but we note that 189 the QTL peak was in a small dom haplotype block interspersed within a broader mus 190 region. The expression level in kidney of Pik3c2g in PWK was low, similar to A/J and 191 NOD, suggesting that the causal variant may be located in the nearby region, where all 192 three have mus inheritance and, notably, where the peak variant association occurred. 193 In liver, the B6 haplotype resulted in lower expression compared with the other 194 haplotypes. Interestingly, the liver eQTL was in a region that contained a 195 recombination event between dom and mus, only present in B6, which may explain the 196 unique expression pattern. were detected in all three tissues in the 3Mb region surrounding its TSS. The genomes of the CC founders can be simplified in terms of contributions from three subspecies lineages of M. musculus: dom (blue), cast (yellow), and mus (red). The effects of each local-eQTL matched the subspecies contributions near the eQTL coordinates, with low expressing subspecies alleles colored teal and high expressing alleles colored salmon, consistent with local-eQTL for Pik3c2g being distinct and tissue-specific. The gene expression data are represented as interquartile range bars, categorized based on most likely diplotype at the eQTL for each CC strain. Haplotype and variants associations are included for each tissue, with the red tick representing the Pik3c2g TSS, black ticks representing variant association peak, and colored ticks representing the haplotype association peak. The most rigorous procedure to detect each QTL (Analysis L, C, or G) is reported. Haplotype effects, estimated as constrained BLUPs, were consistent with the expression data, and uncorrelated across the tissues. haplotype effects of the distal-eQTL were all highly correlated, with A/J, B6, 129, and 205 CAST haplotypes corresponding to higher Akr1e1 expression. Across tissues, expression 206 was significantly higher in liver and kidney compared with lung (q = 1.63 × 10 −19 and 207 q = 4.40 × 10 −34 , respectively). Another example is the Period circadian clock 2 (Per2 ; 208 Fig 5B) gene, which possessed intra-chromosomal distal-eQTL approximately 100Mb 209 away from the TSS that were detected in all three tissues at a lenient chromosome-wide 210 significance (Analysis C; FDR ≤ 0.2). The haplotype effects were significantly 211 correlated among the tissues, characterized by high expression with the NOD and PWK 212 haplotypes present. Together, these findings provide strong validation for the 213 distal-eQTL, which would commonly not be detected based on tissue-stratified analyses. 214 Other unique patterns of distal genetic regulation were detected, such as for ring finger 215 protein 13 gene (Rnf13 ) with distal-eQTL that varied across tissues, described in S13 Per2 has intra-chromosomal distal-eQTL leniently detected 100Mb away from the TSS, also with highly correlated haplotype effects across the tissues, providing further support that the distal-eQTL are not false positives. Expression data are represented as interquartile ranges for most likely diplotype, with differential expression noted when significant. Haplotype associations for each tissue distal-eQTL combination are shown, with the most rigorous statistical procedure for detection reported (Analysis L, C, or G). Red ticks signify the gene TSS and black ticks represent that eQTL peak. Fit haplotype effects, estimated as BLUPs, are included, along with the pairwise correlations of the eQTL.

218
An advantage of measuring gene expression and chromatin accessibility in the same mice 219 and tissues is the subsequent ability to examine the relationships between genotype, 220 chromatin accessibility, and gene expression using integrative methods such as mediation 221 analysis. We considered two possible models for the mediation of eQTL effects and 222 assessed the evidence for each in our data. In the first model, proximal chromatin state 223 acts as a mediator of local-eQTL ( Fig 6A). That is, the local-eQTL for a gene affects 224 that gene's expression by, at least in part, altering local chromatin accessibility. To test 225 for this, we used an approach adapted from studies in the DO [22] and applied it to 226 local-eQTL detected through Analysis L (see S3 Appendix for greater detail). Across the three tissues, between 13-42 local-eQTL showed evidence of mediation 229 through proximal accessible chromatin regions at genome-wide significance, and 35-106 230 at chromosome-wide significance (S6 Table). The coiled-coil domain containing 137 gene 231 (Ccdc137 ) is a strong example of this type of mediation. Fig 7 shows the genome scans 232 that identify both the local-eQTL for Ccdc137 and the cQTL near it. The significance 233 of the eQTL was sharply reduced when we conditioned on chromatin accessibility, and 234 the significance of the cQTL was stronger than the eQTL, as expected by the proposed 235 mediator model. Establishing mediation requires more than mere co-localization of eQTL and cQTL. 237 For example, local-eQTL and co-localizing cQTL were identified for both haloacid 238 dehalogenase-like hydrolase domain containing 3 (Hdhd3 ) in liver (S14 FigA) and 239 acyl-Coenzyme A binding domain containing 4 (Acbd4 ) in kidney (S14 FigB).

240
Comparing the statistical associations at the loci for eQTL and cQTL, and the 241 corresponding haplotype effects, however, showed better correspondence in the case of 242 Hdhd3 than for Acbd4 : a strong mediation signal was detected for Hdhd3, indicated by 243 the decreased eQTL association when conditioning on the chromatin state, whereas  Mediation was also tested for distal-eQTL, detected through Analysis G, by evaluating 248 the expression of nearby genes as candidate mediators [59], as shown in Fig 6B. 249 Zinc finger protein intermediates detected for Ccnyl1 250 Eight genes were identified with mediated distal-eQTL (S7 Table). For example, the  The QTL and mediation results for all three tissues are depicted as circos plots [67] in 258 corresponds to the immune-related major histocompatibility (MHC) region in mouse.

262
Most of the chromatin-mediated genes in this region, however, are not histocompatibility 263 genes (Files S28 -S30). Patterns of distal-QTL and gene mediation vary across the 264 three tissues, though consistent regions are also observed, such as the Idd9 region on 265 chromosome 4 described in [68], which contains multiple zinc finger proteins (ZFPs) and 266 regulates genes such as Ccnyl1 and Akr1e1, described in greater detail below.
267 Fig 8. Summaries of QTL and mediation analyses. Circos plots of eQTL (yellow), cQTL (blue), and mediation (red) in lung, liver, and kidney. The two outer rings of dots represent local-eQTL and local-cQTL detected by Analysis L at chromosome-wide significance, with red lines between connecting genes and chromatin sites for which chromatin mediation was detected. The inner circle contains connections representing distal-eQTL, distal-cQTL, and gene-gene mediation from Analysis G. Thick lines represent QTL and mediators with permutation-based p-value (permP) < 1 × 10 −5 .
The detected signals were primarily local, which also tended to be stronger than the observed distal signals. Fewer QTL and mediators were detected in liver tissue. Genes highlighted within the text, such as Akr1e1, are indicated at their genomic coordinates.
Genetic regulation of Akr1e1 expression by a zinc finger protein and  with these studies, CC strains with NOD inheritance at the distal-eQTL had low 289 expression of Akr1e1, observed in all tissues. Genetic variation from B10 is not present 290 in the CC, but the closely related B6 founder had high expression of Akr1e1 like B10. 291 NZO, PWK, and WSB haplotypes resulted in low Akr1e1 expression, similar to NOD, 292 while A/J, 129, and CAST haplotypes joined B6 as driving high expression (Fig 9).

293
Variable expression of Zfp985 is a strong candidate for driving these effects on Akr1e1. 294 Additional sources of genetic regulation of Akr1e1 were observed for kidney, where a 295 strong distal-cQTL for an accessible chromatin site corresponding to the promoter of  Mediation model for Akr1e1 distal-eQTL. The genetic regulation of Akr1e1 expression is reconstructed based on relationships observed across the three tissues. Distal-eQTL were detected in all tissues at similar levels of significance. A local-eQTL for Zfp985 that is proximal to the Akr1e1 distal-eQTL was observed in lung, and Zfp985 expression was detected as an anti-correlated mediator of the distal-eQTL, consistent with ZFP985 suppressing Akr1e1 expression. The chromatin site proximal to the Akr1e1 TSS has a distal-cQTL detected in kidney. Chromatin accessibility at the site was found to be a significant mediator of Akr1e1 expression. Combining associations across tissues supports a biological model whereby ZFP985, whose gene is expressed in mice with NOD, NZO, PWK, and WSB haplotypes, silences Akr1e1 through KRAB domain-induced chromatin remodeling. QTL and mediation genome scans are included, along with sequence phenotypes as interquartile ranges categorized according to most likely diplotype, and modeled haplotype effects fit as BLUPs. The relative magnitudes of the QTL effect sizes and mediation scores are consistent with the proposed model, with Zfp985 local-eQTL > distal-cQTL > Akr1e1 distal-eQTL, and chromatin mediation > mediation through Zfp985 expression. In this study we performed QTL and mediation analyses of gene expression and 305 chromatin accessibility data in liver, lung, and kidney tissue samples from 47 strains of 306 the CC. We examined correlations between haplotype effects of co-localizing QTL to 307 identify QTL that are likely functionally active in multiple tissues as well as QTL with 308 distinct activity across the three tissues, as is the case with the Pik3c2g gene, 309 potentially representing differing active genetic variants in the local region. We detected 310 extensive evidence of chromatin mediation of local-eQTL as well as gene expression 311 mediators underlying distal-eQTL. One unique example is the elucidation of the genetic 312 regulation of Akr1e1 expression, a gene that plays a role in glycogen metabolism, 313 involving inhibition by expression of a distal zinc finger protein mediator that 314 contributes to reduced chromatin accessibility at the promoter of Akr1e1. These 315 findings highlight the ability of integrative QTL approaches such as mediation analysis 316 to identify interesting biological findings, including the ability to identify functional 317 candidates for further downstream analysis.

318
Fewer detected cQTL than eQTL 319 We found that the effect sizes of cQTL are on average lower than eQTL (see S19 Fig), 320 as has been previously reported [22]. The reduced number of cQTL compared with 321 eQTL is likely due to both technical and biological reasons. Technically, the RNA-seq 322 assay measures a distinct class of molecules (mRNAs) that can be accurately extracted 323 from cells with the resulting sequence reads mapped to the transcriptome, which 324 encompasses < 5% of the genome. In contrast, the transposon incorporation event 325 central to ATAC-seq is enriched in, but not solely limited to, accessible chromatin.

326
Unlike the transcriptome, accessible chromatin can occur anywhere in the genome and 327 regions are defined empirically by the data. Thus, the signal from the assay is more 328 variable and noisy across samples, which impedes our ability to detect cQTL. 329 Biologically, if a variant affects the activity of a regulatory element that alters 330 expression levels, it is expected that the mRNA levels in the sample will reflect this. By 331 contrast, even if a variant affects chromatin accessibility, it may do so in a manner that 332 is difficult to detect above the background noise of the assay. In recent studies of gene 333 expression and chromatin accessibility in the adult brain from same individuals, 334 2,154,331 cis-eQTL were found for 467 individuals [71], whereas only 6,200 cQTL were 335 detected, albeit for only 272 individuals [72]. While the reduced number of individuals 336 undoubtedly contributed to this lower number of detected cQTL, it is likely that 337 significantly fewer cQTL would be found in the same number of individuals.

338
Reduced cQTL and mediation signal in liver compared with 339 lung and kidney 340 Detecting QTL and mediation depends not only on sample size but also on biological 341 and technical factors that are difficult to quantify across the tissues. Although it is 342 possible that liver has fewer actual cQTL (and thus fewer chromatin mediators) than 343 lung and kidney, it is also possible that the signal quality of the ATAC-seq is lower for 344 this tissue, resulting in fewer detections due to increased technical noise. True biological 345 differences in the number of cQTL and mediator usage among the tissues would likely 346 reflect multi-level regulatory programs specific to each tissue, a complex subject 347 requiring more targeted experiments than used here.

348
Joint QTL analysis in multiple tissues 349 Multi-tissue QTL analyses are increasingly used in both humans, such as within the 350 GTEx project (e.g., [73,74]), and in mice (e.g., [75,76]). We believe our use of formal 351 statistical tests of the correlation coefficient between the haplotype effects of overlapping 352 QTL is novel in defining co-localizing QTL across tissues. This method allows us to 353 identify loci likely representing tissue-specific QTL with unique haplotype effects 354 patterns, as demonstrated for the gene Pik3c2g (Fig 4). This approach can also detect 355 consistent haplotype effects, as with the gene Per2 (Fig 5B), which possessed only 356 marginally significant distal-eQTL in the three tissues, suggesting jointly mapping QTL 357 across all tissues increases power. Formal joint analysis approaches have been proposed, 358 largely implemented for detecting SNP associations, including meta-analysis on 359 summary statistics (e.g., [76,77]) and fully joint analysis, including Bayesian hierarchical 360 models [78] and mixed models [79]. Extending such methods to haplotype-based 361 analysis in MPPs poses some challenges, including how to best generalize methods to 362 more complex genetic models and for the CC with a limited number of unique genomes. 363 Nonetheless, when multiple levels of molecular traits are measured, joint analyses could 364 conceivably be incorporated into the mediation framework to improve detection power. 365 Correlated haplotype effects suggest subtle multi-allelic QTL haplotype effects for QTL pairs of certain genes, such as Cox7c (Fig 3A) suggests that 373 these QTL are at least subtly multi-allelic. Correlated haplotype effects are consistent 374 with the genetic regulation, even local to the gene TSS, being potentially complex, likely 375 due to founder-specific modifiers.
Correlated haplotype effects with differential expression across 377 tissues 378 We detected numerous eQTL pairs with correlated haplotype effects across tissues but 379 with significantly different magnitudes of overall expression-for example, Cox7c, Ubc, 380 Slc44a3, and Akr1e1. We propose two potential explanations for this unique 381 co-occurrence. First, whole tissues differ in their cellular composition. It may be that 382 these genes are primarily expressed in a common cell type whose proportion varies 383 between the different tissue types, and that the observed differential expression reflects 384 this compositional variation. This hypothesis could be tested by follow-up single cell gene, which these procedures will not detect. Additionally, the causal mediator may not 408 be observed in the data, allowing for other candidates, correlated with the missing 409 causal element, to be incorrectly identified as mediators.

410
Despite these limitations, the mediation analysis used here provides specific causal 411 candidates for local-eQTL (mediated through chromatin) and distal-eQTL (mediated 412 through nearby genes). For example, we show strong evidence for ZFP985 mediating 413 the genetic regulation of Akr1e1 (Fig 9). Additional evidence suggests this is done by 414 ZFP985 contributing to reduced Akr1e1 promoter activity. It is possible that Zfp985 415 expression is simply strongly correlated with the true mediator, and others have 416 alternatively proposed Rex2 expression as a candidate [68], which we did not consider 417 due to low expression in all three tissues. Regardless of the identity of the true 418 mediator, our analysis shows strong evidence that it acts causally by reducing 419 chromatin accessibility near Akr1e1. evaluation of QTL mapping power in the CC using simulation [46], this study had 426 approximately 80% power to detect genome-wide QTL with a 55% effect size or greater. 427 Effect size estimates for genome-wide QTL (green dots; S19 Fig) are consistent with 428 this expectation, albeit potentially inflated due to the Beavis effect [82], and provide 429 interpretable point summaries for haplotype-based QTL mapping, analogous to minor 430 allele effect estimates in SNP-based studies. We calculated estimates of effect size in 431 two ways, one based on a fixed effect fitting of the QTL term and the other as a 432 random term [51]. Notably, a small number of the distal-eQTL had low random effect 433 size estimates (S20 Fig) compared with their fixed effects-based estimates, likely the 434 result of outliers with lowly-observed founder inheritance (e.g. rare allele) at the 435 putative QTL. Alternatively, a residual variation estimate, i.e. RSS, could be calculated 436 from the shrunken haplotype effects to identify likely false positives, but not be as 437 aggressively reduced as the variance component-based estimates, representing a middle 438 ground approach that was found to be effective [54]. We primarily reported fixed effects 439 estimates due to their consistency with reported expectations [46] for a study of this  Adult male mice (8-12 weeks old) from 47 CC strains were acquired from the University 453 of North Carolina Systems Genetics Core (listed in S1 Appendix) and maintained on an 454 NTP 2000 wafer diet (Zeigler Brothers, Inc., Gardners, PA) and water ad libitum. The 455 housing room was maintained on a 12-h light-dark cycle. Our experimental design 456 sought to maximize the number of strains relative to within-strain replications based on 457 the power analysis for QTL mapping in mouse populations [25]; therefore, one mouse 458 was used per strain. Prior to sacrifice, mice were anesthetized with 100 mg/kg nembutal 459 though intraperitoneal injection. Lungs, liver and kidney tissues were collected, flash

499
Sequence trait filtering for QTL analysis 500 Trimmed mean of M-values (TMM) normalization (edgeR; [30]) was applied to TPM 501 values from read counts of genes and chromatin windows, respectively. Genes with 502 TMM-normalized TPM values ≤ 1 and chromatin windows with normalized counts ≤ 5 503 for ≥ 50% of samples were excluded (as in [22]) in order to avoid the detection of QTL 504 that result from highly influential non-zero observations when most of the sample have 505 low to no expression. For each gene and chromatin window, we applied K-means 506 clustering with K = 2 to identify outcomes containing outlier observations that could where these specific animals likely differ [33]. To reduce the number of statistical tests, 517 adjacent genomic regions were merged through averaging if the founder mosaics for all 518 mice were similar, defined as L2 distance ≤ 10% of the maximum L2 distance ( √ 2 for a 519 probability vector). This reduced the number of tested loci from 77,592 to 14,191. 520 Differential expression and accessibility analyses 521 Read counts for each sample were converted to counts per million (CPM), followed by 522 TMM normalization (edgeR). For chromatin accessibility, windows in which > 70 % of 523 samples had a CPM ≤ 1 were removed, requiring that samples from at least two of the 524 three tissues to have non-zero measurements in order to be considered for differential 525 analysis between tissues. Genes and chromatin windows with no or low counts across 526 sample libraries provide little evidence for detection of differential signal, thus removing 527 them reduces the multiple testing burden. Differentially expressed genes and accessible 528 chromatin windows were determined using limma [34], which fit a linear model of the 529 TMM-normalized CPM value as the response and fixed effect covariates of strain, batch, 530 and tissue (lung, liver, or kidney). To account for mean-variance relationships in gene 531 expression and chromatin accessibility data, precision weights were calculated using the 532 limma function voom and incorporated into the linear modeling procedure. The p-values 533 were adjusted using a false discovery rate (FDR) procedure [35], and differentially 534 expressed genes and accessible chromatin windows were called based on the q-value ≤ 535 0.01 and log 2 fold-change ≥ 1. Adjacent significantly differential chromatin windows in 536 the same direction were merged with a p-value computed using Simes' method [36], and 537 chromatin regions were re-evaluated for significance using the Simes p-values.

538
Gene set association analysis 539 Biological pathways enriched with differentially expressed genes or accessible chromatin 540 were identified with GSAASeqSP [37] with Reactome Pathway Database annotations 541 (July 24, 2015 release). A list of assayed genes were input to GSAASeqSP along with a 542 weight for each gene g, calculated as: where sign(∆ g ) is the sign of the fold change in gene g expression, and q g is the 544 FDR-adjusted differential expression p-value. Pathways with gene sets of cardinality < 545 15 or > 500 were excluded.
where y i is the trait level for individual i, µ is the intercept, batch b is a categorical fixed 561 effect covariate with five levels b = 1, . . . , 5 representing five sequencing batches for both 562 gene expression and chromatin accessibility and where b[i] denotes the batch relevant to 563 i, ε i ∼ N(0, σ 2 ) is the residual noise, and QTL i models the genetic effect at the locus, 564 namely that of the eQTL for expression or the cQTL for chromatin accessibility.

565
Specifically, the QTL term models the (additive) effects of alternate haplotype states 566 and is defined as haplotype dosages (i.e., the posterior expected count, from 0 to 2, inferred by the 568 haplotype reconstruction) for the eight founder haplotypes, and β = (β AJ , . . . , β WSB ) T 569 is a corresponding vector of fixed effects. Note that in fitting this term as a fixed effects 570 vector, the linear dependency among the dosages in x i results in at least one haplotype 571 effect being omitted to achieve identifiability; estimation of effects for all eight founders 572 is performed using a modification described below. Prior to model fitting, to avoid 573 sensitivity to non-normality and strong outliers, the response {y i } n i=1 was subject to a 574 rank inverse normal transformation (RINT). The fit of Eq 2 was compared with the fit 575 of the same model omitting the QTL term (the null model) by an F-test, leading to a 576 nominal p-value, reported as the logP = − log 10 (p-value).

577
QTL detection: local (L), chromosome-wide (C), and 578 genome-wide (G) 579 QTL detections were declared according to three distinct protocols of varying stringency 580 and emphasis. The first protocol, termed Analysis L, was concerned only with detection 581 of "local QTL", that is, QTL located at or close to the relevant expressed gene or 582 accessible chromatin region. Here we define "local" as ± 10Mb, as been done previously 583 in studies using DO mice [22]. Our intent is to capture QTL that likely act in cis on 584 gene expression and chromatin accessibility, which are expected to have strong effects, 585 while also recognizing the limitations of using haplotype blocks with median size of 16.3 586 Mb [19] and a small sample size. Fig S6 Fig suggest that 10Mb generally captures the 587 strongest QTL signals near the gene TSS and chromatin window midpoint. The second 588 protocol, Analysis C, broadened the search for QTL to anywhere on the chromosome on 589 which the trait is located; the greater number of loci considered meant that the criterion 590 used to call detected QTL for this protocol was more stringent than Analysis L. The 591 third protocol, Analysis G, further broadened the search to the entire genome with the 592 most stringent detection criterion. These protocols used two types of multiple test 593 correction: a permutation-based control of the family-wise error rate (FWER) and the 594 Benjamini-Hochberg False Discovery Rate (FDR; [35]). Below we describe in detail the 595 permutation procedure and then the three protocols.

596
Permutation-based error rate control: chromosome-and genome-wide 597 The FWER was controlled based on permutations specific to each trait. The sample 598 index was permuted 1,000 times and recorded, and then genome scans performed for 599 each trait, using the same permutation orderings. Given a trait, the maximum logP 600 from either the entire genome or the chromosome for each permutation was collected 601 and used to fit a null generalized extreme value distribution (GEV) in order to control 602 the genome-and chromosome-wide error rates, respectively [48]. Error rates were 603 controlled by calculating a p-value for each QTL based on the respective cumulative 604 density functions from the GEVs: permP = 1 − F GEV (logP), where F GEV is the 605 cumulative density function of the GEV. We denote genome-and chromosome-wide 606 error rate controlling p-values as permP G and permP C respectively.

607
The appropriateness of permutation-derived thresholds [49] relies on the CC strains 608 being equally related, thus possessing little population structure and being exchangeable. 609 This assumption was supported by simulations of the funnel breeding design [41]. More 610 recent simulations based on the observed CC strain genomes [46] found 611 non-exchangeable population structure for highly polygenic genetic architectures, albeit 612 at low levels. Nevertheless, we use permutations given that molecular traits often have 613 strong effect QTL that are detectable in the presence of subtle population structure.  Chromosome-wide analysis involved examining QTL associations at all loci on the 623 chromosome harboring the gene or chromatin region in question. The peak logP is input 624 into a chromosome-wide GEV, producing a permP C , which are further subjected to 625 adjustment by FDR to account for multiple testing across the traits [50]. Compared 626 with Analysis L, this procedure is more stringent with respect to local-QTL because of 627 the additional FDR adjustment. In addition, it can detect distal-QTL outside the local 628 region of the trait. Note, since only the most significant QTL is recorded, this procedure 629 will disregard local-QTL if a stronger distal-QTL on the chromosome is observed. The genome-wide analysis is largely equivalent to Analysis C but examines all genomic 632 loci, while an FDR adjustment is made to the genome-wide permP G . Unlike Analysis C, 633 it incorporates additional scans conditioned on detected QTL to potentially identify 634 multiple QTL per trait (i.e. both local-and distal-QTL). Briefly, after a QTL is 635 detected, a subsequent scan (and permutations) is performed in which the previously 636 detected QTL is included in both the null and alternative models, allowing for 637 additional independent QTL to be detected with FDR control. See S2 Appendix for 638 greater detail on this conditional scan procedure and a clear example in Fig S21 Fig.   639 Notably, the local/distal status of the QTL does not factor into Analysis G.

640
Analyses L, C, and G should detect many of the same QTL, specifically strong 641 local-QTL. Collectively, they allow for efficient detection of QTL with varying degrees 642 of statistical support while strongly leveraging local status.

644
The effect size of a detected QTL was defined as the R 2 attributable to the QTL term; 645 specifically, as 1 − RSS QTL /RSS 0 , where RSS = n i=1 (y i − y i ) 2 is the residual sum of 646 squares, i.e. the sum of squares around predicted value y i , and RSS QTL and RSS 0 647 denote the RSS calculated for the QTL and null models respectively. We also calculated 648 a more conservative QTL effect size estimate with a QTL random effects model to  The fixed effects QTL model used for mapping, though powerful for detecting 652 associations, is sub-optimal for providing stable estimates of the haplotype effects 653 vector, β (calculated as (X T X) −1 X T y, where X is the full design matrix). This is 654 because, among other things, 1) the matrix of haplotype dosages that forms the design 655 matrix of {QTL i } n i=1 in Eq 2, is multi-collinear, which leads to instability, and 2) 656 because the number of observations for some haplotypes will often be few, leading to 657 high estimator variance [24]. More stable estimates were therefore obtained using 658 shrinkage. At detected QTL, the model was refit with β modeled as a random effect, Variant association has been used previously in MPP (e.g., [53,54]), and uses the same 682 underlying model as the haplotype-based mapping (Eq 2), with the QTL term now 683 representing imputed dosages of the minor allele. Variant association can more 684 powerfully detect QTL than haplotype mapping if the simpler variant model is closer to 685 the underlying biological mechanism [55], though it will struggle to detect multi-allelic 686 QTL.

687
Variant genotypes from mm10 (NCBI38) were obtained using the ISVdb [56] for the 688 CC strains, which were converted to mm9 coordinates with the liftOver tool [57]. Mediation analysis has recently been used with genomic data, including in humans 698 (e.g., [11,15]) and rodents (e.g., [54,58]), to identify and refine potential intermediates of 699 causal paths underlying phenotypes. We use a similar genome-wide mediation analysis 700 as used with the DO [22,59] to detect mediators of eQTL effects on gene expression.

701
In our study, mediation describes when an eQTL (X) appears to act on its target 702 (Y ) in whole or in part through a third variable (M ), the mediator, which in this case is 703 a molecular trait. The molecular traits considered here are: a) chromatin accessibility, 704 in which case X serves also as a cQTL ( Fig 6A); and, in a separate analysis, b) the 705 expression of a second, distal, gene, in which case X serves also as a distal-eQTL (Fig 706  6B).

707
Traditional mediation analysis [3] tests whether the data, for predefined X, Y and 708 M , are consistent with mediation, doing so in four steps. Steps (1) and (2)  Step (4) distinguishes "full mediation", where mediation 714 explains the association between X and Y entirely, i.e., the QTL acts entirely through 715 the mediator, from "partial mediation", where the QTL acts partly through the 716 mediator and partly directly (or through other unmodeled routes).

717
In our study, we use an empirical approximation of the above adapted to 718 genome-wide data, building on mediation analysis used in studies of DO mice [22,23,59]. 719 In outline: For a given eQTL, i.e., a QTL for which step (1) (X → Y ) has been 720 established, we performed a genome-wide mediation scan. This mediation scan 721 consisted of testing step (4) (X → Y |M ), that is, testing whether the eQTL association 722 was significantly reduced when adding the mediator as a covariate, for a large number of 723 "potential" mediators, namely all chromatin regions (or transcripts) genome-wide. Note 724 that most of these potential mediators would be formally ineligible under a traditional 725 analysis (i.e., X → M would not hold) but here helped to define a background (null) 726 level of association. The results of the mediation scan were then filtered to include only 727 results satisfying both of the following criteria: first, the mediator must posess a 728 co-localizing QTL, i.e., a QTL for which step (2) (X → M ) does hold; second, the 729 association between QTL and mediator must be stronger than that between QTL and 730 transcript (X → M > X → M ), this approximating step (3) (M → Y |X). We did not 731 attempt to distinguish between partial and full mediation since this distinction was too 732 easily obscured by noise.

733
The genome-wide mediation scan procedure in more detail was as follows. Consider 734 an eQTL that our previous genome scan has already shown to affect the expression of 735 gene j. Denote the expression level for j in individual i as y (these respectively correspond to Y and X in the mediation description above). 737 Further, consider a proposed mediator k ∈ K, where K is the set of all eligible chromatin 738 accessible sites or expressed genes, and let m ik be the value of that mediator for 739 individual i. The mediation scan for a given gene/eQTL pair j proceeds by performing, 740 for each proposed mediator k ∈ K, a model comparison between the alternative model, 741 and the null model, where other terms are as described for Eq 2. (Not, shown are additional covariates, such 743 as conditioned loci, which would be included in both the null and alternative models.) 744 The above model comparison can be seen as re-evaluating the significance of a given 745 eQTL association by conditioning on each proposed mediator in turn (i.e., testing 746 X → Y |M for each M ). The resulting mediation scan, in contrast to the 747 earlier-described genome scans, thus fixes the QTL location while testing across the 748 genome for candidate mediators.

749
In general, assuming most proposed mediators are null, the mediated logP should 750 fluctate around the original eQTL logP, since the model comparison will resemble the 751 original test of that locus in the genome scan. For mediators that possess some or all of 752 the information present in the eQTL, however, the mediated logP will drop relative to 753 the original logP, reflecting X → Y |M being less significant than X → Y . Empirical 754 significance of genome-wide mediation has previously been determined by comparing 755 the nominal mediator scores to the distribution of mediation scores genome-wide, the 756 latter effectively acting as a null distribution [22,23,59]. We instead determine a null 757 distribution explicitly by permutation, characterizing the distribution of the minimum 758 logP (as opposed to maximum logP for QTL scans) to estimate significance and set 759 false positive control (FWER). As with the QTL mapping procedures, we performed 760 mediation scans on 1,000 permutations, permuting the mediators rather than the 761 outcome, to characterize GEVs from the minimum logP, that is, 762 permP m = F GEV (logP), at both chromosome-and genome-wide levels.

763
Detection of mediation is dependent on a number of assumptions about the 764 underlying variables, their relationships, and those relationships' directionality [4].

765
Many of these cannot be controlled in a system as complex as chromatin accessibility All statistical analyses were conducted with the R statistical programming language [60]. 785 The R package miQTL was used for all the mapping and mediation analyses, and is 786 available on GitHub at https://github.com/gkeele/miqtl.