Cell Specific eQTL Analysis without Sorting Cells

The functional consequences of trait associated SNPs are often investigated using expression quantitative trait locus (eQTL) mapping. While trait-associated variants may operate in a cell-type specific manner, eQTL datasets for such cell-types may not always be available. We performed a genome-environment interaction (GxE) meta-analysis on data from 5,683 samples to infer the cell type specificity of whole blood cis-eQTLs. We demonstrate that this method is able to predict neutrophil and lymphocyte specific cis-eQTLs and replicate these predictions in independent cell-type specific datasets. Finally, we show that SNPs associated with Crohn’s disease preferentially affect gene expression within neutrophils, including the archetypal NOD2 locus.

The SHIP authors thank Mario Stanke for the opportunity to use his Server Cluster for the SNP imputation. EGCUT: EGCUT received financing by FP7 grants (201413, 245536), also received targeted financing from the Estonian Government (SF0180142s08) and direct funding from the Ministries of Research and Science and Social Affairs. EGCUT studies are funded by the University of Tartu in the framework of the Center of Translational Genomics and by the European Union through the European Regional Development Fund,

Introduction
In the past seven years, genome-wide association studies (GWAS) have identified thousands of genetic variants that are associated with human disease [1]. The realization that many of the disease-predisposing variants are non-coding and that single nucleotide polymorphisms (SNPs) often affect the expression of nearby genes (i.e. cis-expression quantitative trait loci; cis-eQTLs) [2] suggests these variants have a predominantly regulatory function. Recent studies have shown that disease-predisposing variants in humans often exert their regulatory effect on gene expression in a cell-type dependent manner [3][4][5]. However, most human eQTL studies have used sample data obtained from mixtures of cell types (e.g. whole blood) or a few specific cell types (e.g. lymphoblastoid cell lines) due to the prohibitive costs and labor required to purify subsets of cells from large samples (by cell sorting or laser capture micro-dissection). In addition, the method of cell isolation can trigger uncontrolled processes in the cell, which can cause biases. In consequence, it has been difficult to identify in which cell types most diseaseassociated variants exert their effect.
Here we describe a generic approach that uses eQTL data in mixtures of cell types to infer cell-type specific eQTLs (Fig 1). Our strategy includes: (i) collecting gene expression data from an entire tissue; (ii) predicting the abundance of its constituent cell types (i.e. the cell counts), by using expression levels of genes that serve as proxies for these different cell types (since not all datasets might have actual constituent cell count measurements). We used an approach similar to existing expression and methylation deconvolution methods [6][7][8][9][10][11]; (iii) run an association analysis with a term for interaction between the SNP and the proxy for cell count to detect cell-type-mediated or-specific associations, and (iv) test whether known disease associations are enriched for SNPs that show the cell-type-mediated or-specific effects on gene expression (i.e. eQTLs).
For the purpose of illustrating our cell-type specific analysis strategy in the seven whole blood cohorts, we focused on neutrophils. Direct neutrophil cell counts and percentages were only available in the EGCUT and SHIP-TREND cohorts, requiring us to infer neutrophil percentages for the other five cohorts. We used the EGCUT cohort as a training dataset to identify a list of 58 Illumina HT12v3 probes that correlated positively with neutrophil percentage (Spearman's correlation coefficient R > 0.57; S1 Fig, S1 Table). We observed that 95% of these genes show much higher expression in purified neutrophils, as compared to 13 other purified blood cell-types, based on RNA-seq data from the BLUEPRINT epigenome project [21] (S2 Fig). We then summarized the gene expression levels of these 58 individual probes into a single neutrophil percentage estimate, by applying principal component analysis (PCA) and determining the first principal component (PC). We then used this first PC as a proxy for neutrophil percentage, an approach that is similar to existing expression and methylation deconvolution methods [6][7][8][9][10][11]. In the EGCUT dataset, the actual neutrophil percentage showed some correlation with age (Pearson R = 0.08, P = 0.02), but no association with gender (Student's T-test P = 0. 31 behavior: some correlation with age (Pearson R = 0.14, P = 6 x 10 -5 ) and no association with gender (P = 0.11). This predicted neutrophil percentage strongly correlated with the actual neutrophil percentage (Spearman R = 0.75, Pearson R = 0.76; Fig 2). Including more or fewer top probes than the top 58 probes resulted in similar correlations (S4 Fig). We then used this set of 58 probes in each of the other cohorts as well, and used PCA per cohort on the probe correlation matrix of these 58 probes. In the SHIP-TREND cohort, for which the actual neutrophil percentage was available as well, we observed that the inferred neutrophil proxy strongly correlated with the actual neutrophil percentage (Spearman R = 0.81, Pearson R = 0.82; Fig 2).
Here we limited our analysis to 13,124 cis-eQTLs that were previously discovered in a whole blood eQTL meta-analysis of a comparable sample size [22] (we note that these 13,124 cis-eQTLs were detected while assuming a generic effect across cell-types, and as such, genomewide application of cell-type specificity strategy might result in the detection of additional celltype-specific cis-eQTLs). To infer the cell-type specificity of each of these eQTLs, we performed the eQTL association analysis with a term for interaction between the SNP marker and the proxy for cell count within each cohort, followed by a meta-analysis of the interaction term (weighted for sample size) across all the cohorts. We identified 1,117 cis-eQTLs with a significant interaction effect (8.5% of all cis-eQTLs tested; false discovery rate (FDR) < 0.05; 1,037 unique SNPs and 836 unique probes; S2 Table and S3 Table). The power to detect these effects depends on the original cis-eQTL effect size, as we observed that the main effect of cis-eQTLs that have a significant interaction term generally explained more variance in gene expression in the previously published cis-eQTL meta-analysis [22], as compared to the cis-eQTLs that did not show a significant interaction: 71% of the cell-type specific eQTLs had a main effect that explained at least 3% of the total expression variation, whereas 25% of the cis-eQTLs that did not show a show significant interaction explained at least 3% of the total expression variation (S5 Fig). Out of the total number of cis-eQTLs tested, 909 (6.9%) had a positive direction of effect, which indicates that these cis-eQTLs show stronger effect sizes when more neutrophils are present (i.e. 'neutrophil-mediated cis-eQTLs'; 843 unique SNPs and 692 unique probes). Another 208 (1.6%) had a negative direction of effect (196 unique SNPs and 145 unique probes), indicating a stronger cis-eQTL effect size when more lymphoid cells are present (i.e. 'lymphocyte-mediated cis-eQTLs'; since lymphocyte percentages are strongly negatively correlated with neutrophil percentages; Fig 1). Overall, the directions of the significant interaction effects were consistent across the different cohorts, indicating that our findings are robust (S6 Fig).
We validated the neutrophil-and lymphoid-mediated cis-eQTLs in six small, but purified cell-type gene expression datasets that had not been used in our meta-analysis. We generated new eQTL data from two lymphoid cell types (CD4+ and CD8+ T-cells) and one myeloid cell type (neutrophils, see online methods) and used previously generated eQTL data on two lymphoid cell types (lymphoblastoid cell lines and B-cells) and another myeloid cell type (monocytes, S4 Table). As expected, compared to cis-eQTLs without a significant interaction term ('generic cis-eQTLs', n = 12,007) the 909 neutrophil-mediated cis-eQTLs did indeed show very strong cis-eQTL effects in the neutrophil dataset (Wilcoxon P = 2.5 x 10 -81 when compared to generic cis-eQTLs; Fig 3A). We observed that these cis-eQTLs also showed an increased effectsize in the monocyte dataset (Wilcoxon P-value = 4.9 x 10 -31 when compared to generic cis-eQTLs; Fig 3A). Because both neutrophils and monocytes are myeloid lineage cells, this suggests that some of the neutrophil-mediated cis-eQTL effects also show stronger effects in other cells of the myeloid lineage. Conversely, the 208 lymphoid-mediated cis-eQTLs had a pronounced effect in each of the lymphoid datasets (Wilcoxon P-value 7.8 x 10 -14 compared to generic cis-eQTLs; Fig 3A), while having small effect sizes in the myeloid datasets. These validation results indicate that our method is able to reliably predict whether a cis-eQTL is mediated by a specific cell type. Unfortunately, the cell type that mediates the cis-eQTL is not necessarily the one in which the cis-gene has the highest expression (Fig 3B), making it impossible to identify cell-type-specific eQTLs directly on the basis of expression levels.
Myeloid and lymphoid blood cell types provide crucial immunological functions. Therefore, we assessed five immune-related diseases for which genome-wide association studies previously identified at least 20 loci with a cis-eQTL in our meta-analysis. We observed a significant enrichment only for Crohn's disease (CD), (binomial test, one-tailed P = 0.002, S5 Table): out of 49 unique CD-associated SNPs showing a cis-eQTL effect, 11 (22%) were neutrophil-mediated. These 11 SNPs affect the expression of 14 unique genes (ordered by size of interaction effect: IL18RAP, CPEB4, RP11-514O12.4, RNASET2, NOD2, CISD1, LGALS9, AC034220.3, SLC22A4, HOTAIRM2, ZGPAT, LIME1, SLC2A4RG, and PLCL1). CD is a chronic inflammatory disease of the intestinal tract. While impaired T-cell responses and defects in antigen presenting cells have been implicated in the pathogenesis of CD, so far little attention has been paid to the role Method overview. I) Starting with a dataset that has cell count measurements, determine a set of probes that have a strong positive correlation to the cell count measurements. Calculate the correlation between these specific probes in the other datasets, and apply principal component analysis to combine them into a single proxy for the cell count measurement. II) Apply the prediction to other datasets lacking cell count measurements. III) Use the proxy as a covariate in a linear model with an interaction term in order to distinguish cell-type-mediated from non-cell-type-mediated eQTL effects.
doi:10.1371/journal.pgen.1005223.g001 eQTL Cell-Type Interaction in Whole Blood of neutrophils, because its role in the development and maintenance of intestinal inflammation is controversial: homeostatic regulation of the intestine is complex and both a depletion and an increase in neutrophils in the intestinal submucosal space can lead to inflammation. On the one hand, neutrophils are essential in killing microbes that translocate through the mucosal layer. The mucosal layer is affected in CD, but also in monogenic diseases with neutropenia and defects in phagocyte bacterial killing, such as chronic granulomatous disease, glycogen storage disease type I, and congenital neutropenia, leading to various CD phenotypes [23]. On the other hand, an increase in activated neutrophils that secrete pro-inflammatory chemokines and cytokines (including IL18RAP which has a neutrophil specific eQTL) maintains inflammatory responses. Pharmacological interventions for the treatment of CD have been developed to specifically target neutrophils and IL18RAP, including Sagramostim [24] and Natalizumab [25]. These results show clear neutrophil-mediated eQTL effects for various known CD genes, including the archetypal NOD2 gene. Although CD has previously been shown to have a slightly higher incidence in females [26,27], we did not find any relationship between NOD2 expression and gender or age (Student's T-test P = 0.08 and Pearson's correlation P = 0.39 respectively; S7 Fig). As such, our results provide deeper insight into the role of neutrophils in CD pathogenesis.
Large sample sizes are essential in order to find cell-type-mediated cis-eQTLs (Fig 4): when we repeat our study on fewer samples (ascertained by systematically excluding more cohorts from our study), the number of significant cell-type-mediated eQTLs decreased rapidly. This was particularly important for the lymphoid-mediated cis-eQTLs, because myeloid cells are approximately twice as abundant as lymphoid cells in whole blood. Consequently, detecting lymphoid-mediated cis-eQTLs is more challenging than detecting neutrophil-specific cis-eQTLs.
As whole blood eQTL data is easily collected, we were able to gather a sufficient sample size in order to detect cell-type-mediated or-specific associations without requiring the actual purification of cell types.  and in two datasets from the myeloid lineage (monocytes and neutrophils). Compared to generic cis-eQTLs, large effect sizes were observed for neutrophilmediated cis-eQTLs in myeloid lineage cell types, and small effect sizes in the lymphoid datasets. Conversely, lymphoid-mediated cis-eQTL effects had large effect sizes specifically in the lymphoid lineage datasets, while having smaller effect sizes in myeloid lineage datasets. These results indicate that our method is able to reliably predict whether a specific cis-eQTL is mediated by cell type. B) Comparison between average gene expression levels between different purified cell type eQTL datasets shows that neutrophil mediated cis-eQTLs have, on average a lower expression in cell types derived from the lymphoid lineage, and a high expression in myeloid cell types, while the opposite is true for lymphocyte mediated cis-eQTLs.

Discussion
Here we have shown that it is possible to infer in which blood cell-types cis-eQTLs are operating from a whole blood dataset. Cell-type proportions were predicted and subsequently used in a G x E interaction model. Hundreds of cis-eQTLs showed stronger effects in myeloid than lymphoid cell-types and vice versa.
These results were replicated in 6 individual purified cell-type eQTL datasets (two reflecting the myeloid and four reflecting the lymphoid lineage). This indicates our G x E analysis provides important additional biological insights for many SNPs that have previously been found to be associated with complex (molecular) traits.
Here, we concentrated on identifying cis-eQTLs that are preferentially operating in either myeloid or lymphoid cell-types. We did not attempt to assess this for specialized cell-types within the myeloid or lymphoid lineage. However, this is possible if cell-counts are available for these cell-types, or if these cell-counts can be predicted by using a proxy for those cellcounts. As such, identification of cell-type mediated eQTLs for previously unstudied cell-types is possible, without the need to generate new data. However, it should be noted that these individual cell-types typically have a rather low abundance within whole blood (e.g. natural killer cells only comprise~2% of all circulating white blood cells). As a consequence, in order to have sufficient statistical power to identify eQTLs that are mediated by these cell-types, very large whole blood eQTL sample-sizes are required and the cell type of interest should vary in abundance between individuals (analogous to the difference in the number of identified lymphoid Effect of sample size on power to detect cell type specific cis-eQTLs. We systematically excluded datasets from our meta-analysis in order to determine the effect of sample size on our ability to detect significant interaction effects. The number of significant interaction effects was rapidly reduced when the sample size was decreased (the number of unique significant probes given a Bonferroni corrected Pvalue < 8.1 x 10 -6 is shown). In general, due to their low abundance in whole blood, lymphoid-mediated cis-eQTL effects are harder to detect than neutrophil-mediated cis-eQTL effects. eQTL Cell-Type Interaction in Whole Blood mediated cis-eQTLs, as compared to the number of neutrophil mediated cis-eQTLs, which is likely caused by their difference in abundance in whole blood). For instance we have recently investigated whole peripheral blood RNA-seq data in over 2,000 samples and identified only a handful monocyte specific eQTLs (manuscript in preparation). As such this indicates that in order to identify monocyte specific eQTLs using whole blood, thousands of samples should be studied.
We confined our analyses to a subset of cis-eQTLs for which we had previously identified a main effect in whole peripheral blood [22]: for each cis-eQTL gene, we only studied the most significantly associated SNP. Considering that for many cis-eQTLs multiple, unlinked SNPs exist that independently affect the gene expression levels, it is possible that we have missed myeloid or lymphoid mediation of these secondary cis-EQTLs.
The method we have applied to predict the neutrophil percentage in the seven whole blood datasets involves correlation of gene expression probes to cell count abundances and subsequent combination of gene expression probes into a single predictor using PCA. This approach is comparable to other deconvolution methods for methylation and gene expression data [6][7][8][9][10][11]. However, methods have also been published [28,29] that take a more agnostic approach towards identifying different cell-types and their abundances across different individuals. The Preininger et al method [28] identified different axes of gene expression variation in peripheral blood, of which some reflect proxies of certain cell-types. We quantified these axes for each of the samples of the EGCUT and Fehrmann cohorts by creating proxy phenotypes, and subsequently conducted per axis an interaction meta-analysis and indeed identified eQTLs that were significantly mediated by these axes (S6 Table). We first ascertained the Z-scores of the eQTL interaction effects for axis 5 of Preininger et al, an axis that is known to correlate strongly with neutrophil percentage. As expected, we observed a very strong correlation with the Z-scores of the eQTL interaction effects for the neutrophil proxy (R = 0.72). While some other axes also mediate eQTLs that might be attributable to differences in other cell-type proportions (e.g. axis 2 that is likely due to differences in reticulocyte proportions), while other axes might even reflect differences in the state in which these cells are (e.g. stimulated immune cells or senescent cells). We believe it is possible that some of these axes might reflect differences in gene activity (rather than differences in cell-type proportions), and that it could be that such differences in gene activity might mediate certain eQTLs. This indicates that by applying novel methods that can summarize gene expression [28,29] and using these summaries as interaction terms when conducting eQTL mapping, various 'context specific' eQTLs might become detectable. Although we have shown that the proxy that is created by our method is able to predict neutrophil percentage accurately, this may not be the case for all cell types available in whole blood, which may be greatly dependent upon the ability of individual gene expression probes to differentiate between cell types However, we anticipate that the (pending) availability of large RNA-seq based eQTL datasets, statistical power to identify cell-type mediated eQTLs using our approach will improve: since RNA-seq enables very accurate gene expression level quantification and is not limited to a set of preselected probes that interrogate well known genes (as is the case for microarrays), the detection of genes that can serve as reliable proxies for individual cell-types will improve. Using RNA-seq data, it is also possible to assess whether SNPs that affect the expression of non-coding transcripts, affect splicing [30] or result in alternative polyadenylation [31] are mediated by specific cell-types.
The method we have used did not account for heteroscedasticity while estimating the standard errors of the interaction term, which may lead to inflated statistics. We therefore compared the standard errors that we used, with standard errors that have been estimated while accounting for heteroscedasticity (using the R-package 'sandwich'). The heteroscedasticity-consistent standard errors and p-values were very similar (Pearson correlation > 0.95; S8 Fig) to the standard errors that did not account for heteroscedasticity. We also note that measurement error in the covariates of the model we have used may cause the inferred betas to be biased. Structural equation modeling may be used to determine unbiased estimates by taking measurement errors of the covariates into account (particularly the neutrophil percentage proxy). However, typically these methods require replicate measurements of the covariates, which were not available for the cohorts in our study. As such, the observed interaction effects may be either underestimated or overestimated, depending on the character and the degree of the measurement error in our covariates.
Although we applied our method to whole blood gene expression data, our method can be applied to any tissue, alleviating the need to sort cells or to perform laser capture micro dissection. The only prerequisite for our method is the availability of a relatively small training dataset with cell count measurements in order to develop a reliable proxy for cell count measurements. Since the number of such training datasets is rapidly increasing and meta-analyses have proven successful [2,22], our approach provides a cost-effective way to identify celltype-mediated or-specific associations that can supplement results obtained from purified cell type specific datasets, and it is likely to reveal major biological insights.

Setup of study
This eQTL meta-analysis is based on gene expression intensities measured in whole blood samples. RNA was isolated with either PAXgene Tubes (Becton Dickinson and Co., Franklin Lakes, NJ, USA) or Tempus Tubes (Life Technologies). To measure gene expression levels, Illumina Whole-Genome Expression Beadchips were used (HT12-v3 and HT12-v4 arrays, Illumina Inc., San Diego, USA). Although different identifiers are used across these different platforms, many probe sequences are identical. Meta-analysis could thus be performed if probe-sequences were equal across platforms. Integration of these probe sequences was performed as described before [22]. Genotypes were harmonized using HapMap2-based imputation using the Central European population [32]. In total, the eQTL genotype x environment interaction meta-analysis was performed on seven independent cohorts, comprising a total of 5,863 unrelated individuals (full descriptions of these cohorts can be found in the Supplementary Note). Mix-ups between gene expression samples and genotype samples were corrected using MixupMapper [33].

Gene expression normalization for interaction analysis
Each cohort performed gene expression normalization individually: gene expression data was quantile normalized to the median distribution then log 2 transformed. The probe and sample means were centered to zero. Gene expression data was then corrected for possible population structure by removing four multi-dimensional scaling components (MDS components obtained from the genotype data using PLINK) using linear regression. Additionally, we corrected for possible confounding factors due to arrays of poor RNA quality. We reasoned that arrays of poor RNA quality generally show expression for genes that are normally lowly expressed within the tissue (e.g. expression for brain genes in whole blood data). As such, the expression profiles for such arrays will deviate overall from arrays with proper RNA quality. To capture such variable arrays, we calculated the first PC from the sample correlation matrix and correlated the first PC with the sample gene expression measurements. Samples with a correlation < 0.9 were removed from further analysis (S9 Fig). In order to improve statistical power to detect cell-type mediated eQTLs, we corrected the gene expression for technical and batch effects (here we applied principal component analysis and removed per cohort the 40 strongest principal components that affect gene expression). Such procedures are commonly used when conducting cis-eQTL mapping [2,5,22,30,31,34]. To minimize the amount of genetic variation removed by this procedure, we performed QTL mapping for each principal component, to ascertain whether genetic variants could be detected that affected the PC. If such an effect was detected, we did not correct the gene expression data for that particular PC [22]. As a result, this procedure also removed the majority of the variation that explained the correlation between neutrophil percentage and gene expression (S10 Fig), minimizing issues with possible collinearity when testing the interaction effects. We chose to remove 40 PCs based on our previous study results, which suggested that this was the optimum for detecting eQTLs [22]. We would like to stress that while PC-corrected gene expression data was then used as the outcome variable in our gene x environment interaction model, we used gene expression data that was not corrected for PCs to initially create the neutrophil cell percentage proxy.
Creating a proxy for neutrophil cell percentage from quantile normalized and log 2 transformed gene expression data To be able to determine whether a cis-eQTL is mediated by neutrophils, we reasoned that such a cis-eQTL would show a larger effect size in individuals with a higher percentage of neutrophils than in individuals with a lower percentage. However, this required the percentage of neutrophils in whole blood to be known, and cell-type percentage measurements were not available for all of the cohorts. We therefore created a proxy phenotype that reflected the actual neutrophil percentage that would also be applicable to datasets without neutrophil percentage measurements. In the EGCUT dataset, we first quantile normalized and log 2 transformed the raw expression data. We then correlated the gene expression levels of individual probes with the neutrophil percentage, and selected 58 gene expression probes showing a high positive correlation (Spearman R > 0.57). Here, we chose to use the quantile normalized, log 2 transformed gene expression data that was not corrected for principal components, since correction for principal components would remove the correlation structure between gene expression and neutrophil percentage (S10 Fig).
In each independent cohort, we corrected for possible confounding factors due to arrays with poor RNA quality, by correlating the quantile normalized and log 2 transformed gene expression measurements against the first PC determined from the sample correlation matrix. Only samples with a high correlation (r ! 0.9) were included in further analyses. Then, for each cohort, we calculated a correlation matrix for the neutrophil proxy probes (the probes selected from the EGCUT cohort). The gene expression data used was quantile normalized, log 2 transformed and corrected for MDS components. Applying PCA to the correlation matrix, we then obtained PCs that described the variation among the probes selected from the EGCUT cohort. As the first PC (PC1) contributes the largest amount of variation, we considered PC1 as a proxy-phenotype for the cell type percentages.

Determining cell-type mediation using an interaction model
Considering the overlap between the cohorts in this study and our previous study, we limited our analysis to the 13,124 cis-eQTLs having a significant effect (false discovery rate, FDR < 0.05) in our previous study [22]. This included 8,228 unique Illumina HT12v3 probes and 10,260 unique SNPs (7,674 SNPs that showed the strongest effect per probe, and 2,586 SNPs previously associated with complex traits and diseases, as reported in the Catalog of Published Genome-Wide Association Studies [1], on 23 rd September, 2013). We defined the model for single marker cis-eQTL mapping as follows: where Y is the gene expression of the gene, β 1 is the slope of the linear model, G is the genotype, I is the intercept with the y-axis, and e is the general error term for any residual variation not explained by the rest of the model. We then extended the typical linear model for single marker cis-eQTL mapping to include a covariate as an independent variable, and captured the interaction between the genotype and the covariate using an interaction term: where P (cell-type proxy) is the covariate, and P:G is the interaction term between the covariate and the genotype. We used gene expression data corrected for 40 PCs as the predicted variable (Y). The interaction terms were then meta-analyzed over all cohorts using a Z-score method, weighted for the sample size [35].

Multiple testing correction
Since the gene-expression data has a correlated structure (i.e. co-expressed genes) and the genotype data also has a correlated structure (i.e. linkage disequilibrium between SNPs), a Bonferroni correction would be overly stringent. We therefore first estimated the effective number of uncorrelated tests by using permuted eQTL results from our previous cis-eQTL meta-analysis [22]. The most significant P-value in these permutations was 8.15 x 10 -5 , when averaged over all permutations. As such, the number of effective tests = 0.5 / 8.15 x 10 -5 % 6134, which is approximately half the number of correlated cis-eQTL tests that we conducted (= 13,124). Next, we controlled the FDR at 0.05 for the interaction analysis: for a given P-value threshold in our interaction analysis, we calculated the number of expected results (given the number of effective tests and a uniform distribution) and determined the observed number of eQTLs that were below the given P-value threshold (FDR = number of expected p-values below threshold / number of observed p-values below threshold). At an FDR of 0.05, our nominal p-value threshold was 0.009 (corresponding to an absolute interaction effect Z-score of 2.61).

Cell-type specific cis-eQTLs and disease
For each trait in the GWAS catalog, we pruned all SNPs with a GWAS association P-value below 5 x 10 -8 , using an r 2 threshold of 0.2. We only considered traits that had more than 20 significant eQTL SNPs after pruning (irrespective of cell-type mediation). Then, we determined the proportion of pruned neutrophil-mediated cis-eQTLs for the trait relative to all the neutrophil-mediated cis-eQTLs. The difference between both proportions was then tested using a binomial test.

Data access
The source code and documentation for this type of analysis are available as part of the eQTL meta-analysis pipeline at https://github.com/molgenis/systemsgenetics Summary results are available from http://www.genenetwork.nl/celltype  We correlated the actual NOD2 gene expression levels with age in the EGCUT dataset (n = 825, normalized using log 2 transformed and quantile normalization, and gene expression levels corrected for 40 principal components) and observed that there is a low, but significant correlation between age and NOD2 gene expression in the log 2 transformed and quantile normalized data (top), which becomes insignificant when correcting the gene expression data for 40 principal components (which was used to determine the neutrophil interaction effect; bottom). However, NOD2 gene expression levels are not significantly associated with gender. The gene expression data that was used for the interaction meta-analysis was corrected for up to 40 principal components. In order to retain genetic variation in the gene expression data, components that showed a significant correlation with genotypes were not removed. In the EGCUT dataset (n = 825), many of these components also strongly correlate with neutrophil percentage (top) and inferred neutrophil percentage (bottom). The majority of the variation in gene expression explained by these components (right) was however removed from this dataset. (TIF) S1  Table. We created proxy phenotypes for the 9 axes of variation described by Preininger et al [28] within the EGCUT (n = 891) and Fehrmann (n = 1,220) cohorts. We then meta-analyzed the interaction terms for these two cohorts and observed that several axes mediate eQTL effects. The Z-scores for the interaction effects for axis 5 correlate strongly with the Z-scores for interaction effects for the neutrophil proxy (R = 0.74).