An autoimmune disease risk variant: A trans master regulatory effect mediated by IRF1 under immune stimulation?

Functional mechanisms remain unknown for most genetic loci associated to complex human traits and diseases. In this study, we first mapped trans-eQTLs in a data set of primary monocytes stimulated with LPS, and discovered that a risk variant for autoimmune disease, rs17622517 in an intron of C5ORF56, affects the expression of the transcription factor IRF1 20 kb away. The cis-regulatory effect specific to IRF1 is active under early immune stimulus, with a large number of trans-eQTL effects across the genome under late LPS response. Using CRISPRi silencing, we showed that perturbation of the SNP locus downregulates IRF1 and causes widespread transcriptional effects. Genome editing by CRISPR had suggestive recapitulation of the LPS-specific trans-eQTL signal and lent support for the rs17622517 site being functional. Our results suggest that this common genetic variant affects inter-individual response to immune stimuli via regulation of IRF1. For this autoimmune GWAS locus, our work provides evidence of the functional variant, demonstrates a condition-specific enhancer effect, identifies IRF1 as the likely causal gene in cis, and indicates that overactivation of the downstream immune-related pathway may be the cellular mechanism increasing disease risk. This work not only provides rare experimental validation of a master-regulatory trans-eQTL, but also demonstrates the power of eQTL mapping to build mechanistic hypotheses amenable for experimental follow-up using the CRISPR toolkit.

The authors map trans eQTLs in monocytes from 134 donors stimulated with LPS for 6h, restricting the analysis to SNPs previously identified as response cis-eQTLs at an earlier timepoint (90 mins LPS treatment). They identify a locus that is a cis-eQTL for IRF1 and is associated with a large number of IRF1 target genes in trans, as well as being a GWAS locus. They aim to validate this putative enhancer using CRISPRi and CRISPR in a modified HEK293 cell line, and present a model of how the risk allele impacts gene expression.
It is important to conduct these functional analyses to follow up eQTL studies and I appreciate the challenges with being able to design the perfect experiment to directly address the difficult problem of identifying causal SNPs and genes. However, my major concern is that the results do not support the strong conclusions made in the manuscript. The results presented are only suggestive of a role of IRF1 and rs17622517 in the nearby enhancer locus and additional work will be required to confirm they are the causal gene and SNP.

RE:
We appreciate the reviewer's concerns. As detailed below, we have addressed them in revisions by performing additional experiments and analysis that support the mediating role of IRF1 and this SNP as a functional variant. Since the term "causal variant" can be problematic despite its widespread use, we now predominantly refer to this SNP locus being functional, which we have direct evidence of.
Major comments: 1. The results of the CRISPRi experiment do not support the strong conclusion that IRF1 is downregulated by the enhancer silencing a. As stated on page 10 "The effect of the enhancer silencing is not significantly different from the controls for IRF1". Given that downregulation of IRF1 was not observed I would suggest that the later sentence "The differential expression of promoter and enhancer have a strong positive correlation...which indicates that enhancer silencing downregulates IRF1" is too strong a conclusion and should be rephrased.
b. On page 10: "the response being similar to the IRF1 promoter silencing further provides strong evidence that it is indeed an enhancer of IRF1". This evidence is suggestive rather than strong. RE: Given the additional experimental support for this locus being an enhancer for IRF1, and a fair point about the needing to soften the statement, we adjusted accordingly this sentence in the abstract: "Using CRISPRi silencing, we showed that perturbation of the SNP locus downregulates IRF1 and causes widespread transcriptional effects. Genome editing by CRISPR had suggestive recapitulation of the LPS-specific trans-eQTL signal and lent support for the rs17622517 locus being causal. " 2. On page 10, could the authors expand upon why they hypothesise that they are "underpowered to detect an effect on IRF1 in cis, but might capture its trans effects" given there is usually better power for the detection of cis effects? RE: While we agree that cis effects tend to be larger and easier to detect and that the individual trans effects are weak and noisy, our rationale in this point is that looking at a lot of genes compensates for the noise in any individual gene. To clarify in the text, we rephrased the statement to:

"...we hypothesized that we may be underpowered to detect an effect on IRF1 in cis, but might capture its trans effects by counteracting the noise present in individuals genes by looking at many genes at once."
3. The results of the CRISPR experiment do not support the conclusion that rs17622517 is the causal SNP driving the cis and trans eQTL effects. The experiment conducted provides additional supportive evidence for the role of that locus in the regulation of IRF1 expression, but it still doesn't address the hypothesis that rs17622517 is the causal SNP.
a. Could the authors elaborate on what the correct interpretation of NHEJ is in the context of an eQTL? How should the reader interpret the deletion of the eQTL allele rather than a substitution? RE: We had originally aimed to edit the exact variant with homologous recombination, but homologous recombination efficiency was so low for this locus that it was unfeasible to obtain a sufficient number of clones. Therefore, we pursued NHEJ as a way to measure the effect of perturbation of the genetic locus with replicates of each genotype. We agree that this does not necessarily recapitulate the effect of the variant, but our results clearly indicate that this specific locus is functional rather than a nonfunctional LD proxy of another causal variant. We have expanded on this in the discussion:

"While editing the exact variant into a cell line may reflect directionally the same genetic effect as seen in the population, a deletion of a locus could feasibly have an effect in the same direction, opposite direction or not at all. This depends upon the specific mechanism of the genetic effect. For example, if a variant increases the affinity of a DNA binding protein by modifying its binding motif, that effect may not be replicated by a simple deletion of the variant locus. However, a functional impact of indels does provide evidence that the specific SNP locus has a regulatory effect rather than being a nonfunctional LD proxy of another causal variant. "
b. On page 12 it is very hard to see the "potential slight genotype effect" particularly in fig  3a. RE: We agree that this effect is too subtle to mention in the text, so we deleted that part of the sentence.
c. Although the correlations between CRISPR and CRISPRi are statistically significant, they are very weak and what looks like nearly half of the genes show opposite effects between the two experiments.

RE:
We selected the gene set for this analysis using an FDR cutoff of 0.25, which means roughly 25% of the genes are expected to be false positives and are expected to contribute to the noise in the correlation. We reran this analysis using a 5% FDR, and the enhancer shows a similar correlation, although it is not significant due to the low number of data points. We have included the plots at FDR 0.05 in Supplementary Figure 12, and below. We have also softened the language concerning the results and interpretation of the CRISPR experiment.
d. As described in the points above, I do not agree that the results presented support the sentence on page 13 "Having provided experimental evidence that rs17622517 is the causal eQTL variant" RE: We agree that this phrasing is too strong and have rephrased the beginning of the sentence to "Having suggestive experimental evidence...". e. Fig 4a shows a weak correlation between the trans-eQTL effect size and the CRISPR experiment which is being driven by a small number of genes. Given that such a relaxed FDR threshold of 0.5 has been used, could the authors comment on whether the genes that do show a good correlation between the experiments are those with the more significant trans eQTL FDR? RE: The correlation is indeed weak, but one must consider that we are comparing a genetic association in primary cells to effects of mixed indels in a cell line. These are not replication experiments but highly orthogonal validation experiments with many biological sources of variance. We also plotted the same correlation with a 0.05 FDR threshold and find that they have a strong correlation, albeit not significant due to low number of genes (new Supplementary figure 13d, right).
f. As described in the points above, the results presented do not support the statement in the abstract "CRISPR further indicated that rs17622517 is indeed a causal variant in this locus, and recapitulated the LPS-specific trans-eQTL signal"

RE: We have softened the phrasing of this sentence in the abstract to better reflect the results of the study: "Genome editing by CRISPR had suggestive recapitulation of the LPS-specific trans-eQTL signal and lent support for the rs17622517 locus being causal."
Minor comments: 1. Could the authors clarify if the "strongest trans-eGenes" referred to on page 7 are those with the most significant eGenes rather than necessarily those with the greatest effect size? RE: We could not find a reference to strongest trans-eGenes on page 7, and perhaps the reviewer was referring to the line on page 3, "The alternative allele of the lead variant rs17622517 is associated with upregulation of IRF1 upon early immune stimulus, and upregulation of 11 out of 12 of its strongest trans-eGenes as well". We meant that these are the 12 eGenes with the lowest FDR and therefore to clarify, we rephrased this part of the sentence to "11 out of 12 of its most significant trans-eGenes".
2. Page 7: how were the tagging variants identified? Could this SNP be tagging a structural variant?
RE: The reviewer raises a good point. We have included more details about the LD tagging analysis in the text. Structural variants were not included in the genotyping or imputation of the genotyped data. To explore this possibility, we accessed all structural variants within 50,000 bp of the top SNP rs17622517 in the gnomAD v3.1 SV dataset (https://gnomad.broadinstitute.org/). Below please find a table with all the structural variants in this region. The variants are all very rare and seem unlikely to be tagged by rs17622517, which has a MAF of 0.057 in gnomAD. The most common DEL_5_62909 has an AF of 0.005, however is almost exclusively found in African Americans in the dataset and therefore is unlikely to be driving the eQTL association in this European population. We now mention this in the text. 3. Beyond highlighting that the rs17622517 has a substantial number of trans-eGenes, Figure 1a is initially hard to interpret further. It may be more helpful to plot a histogram of how many cis SNPs have 1 trans-eGene, 2 trans-eGenes etc to highlight the rs17622517 locus but also give a better idea of how many eGenes are detected for the other SNPs.

RE: We agree that the information in fig 1a
could be better represented. Therefore, we have taken the reviewer's suggestion to replace it with a histogram of the number of eGenes per SNP (right).
4. It would be helpful to include the p value and effect size of the cis eQTLs in figure 1b.

RE: Thank you, these have been added.
5. What does the shading represent on figure 1e? RE: Thank you for pointing out this missing information. The shading represents the intensity of the transcription factor peak from ENCODE. We have updated the legend accordingly:

. Enhancer and promoter silencing effects for different LPS time points. Number of significantly differentially expressed genes in promoter (a) and enhancer (b) versus controls at the respective LPS condition."
7. Are the authors concerned that the differential expression lists aren't enriched for IRF1 targets? Is it something else driving the difference in expression e.g. off target CRISPR effects, or is this related to the time points chosen? RE: While the trans-eGenes are enriched for IRF1 targets, we do not see this enrichment in the CRISPRi promoter or enhancer differential expression. Naturally, this is a signal that we would have preferred to see, but we are not overly concerned. This is because we do see some enrichment for immune-related pathways, such as NFKB signaling (regulated by IRF1), and the hundreds of differentially expressed genes are likely to include a large number of affected genes further downstream in the pathway that are not directly regulated by IRF1 binding. It is conceivable that the pathway effect of the trans-eQTL SNP in primary monocytes could be more subtle and allow us to see mostly direct IRF1 targets, compared to the stronger CRISPRi perturbation in a cell line system that may allow detection of a more widespread pathway disruption. Fig 6), which are also enriched for IRF1 targets. The experimental work presented here suggest a trans-effect of the rs17622517 locus regulating immune response genes following LPS stimulation, but it is not sufficient to conclude that the effect is modulated by IRF1. Indeed, the authors did not show any direct evidence that the rs17622517 locus is an enhancer of IRF1 expression since the inhibition of this locus by CRISPRi has no effect on IRF1 expression (Wilcoxon p=0.8), as opposed to the inhibition of IRF1 promoter which exhibited a significant down-regulation of IRF1 (Wilcoxon p=7.5x10-7). The same remark can be made for the CRISPR/Cas9 genome editing experiments where the different deletions introduced in the rs17622517 locus have again no effect on IRF1 expression. In addition, the lack of power that they highlighted several times in the article, as well as the relaxed FDR thresholds that they use for the analysis indicate some weakness of the study and highlight the need to be replicated more deeply in order to have confident results. For all these reasons, the authors should be more cautious when claiming that the trans-effect of the rs17622517 locus is modulated by IRF1.

Regarding the time points, we did test different time points in a pilot study and selected 6 hours because it had the highest expression of the most IRF1 trans-eGenes (Supplementary
The conclusions, as well as the title of this article, should be modified accordingly.

RE:
We have performed additional experiments to address this concern, also raised by Reviewer 1. RE: We have performed additional experiments to address this concern and also revised the text. We performed CRISPRi silencing of the promoter and enhancer in the THP1 monocyte-like cell line, and showed that the enhancer silencing indeed affects IRF1 expression with upregulation without LPS stimulus and suggestive downregulation at 90 mins. These results are included in Fig 2b and shown below. This result provides bona fide evidence that the locus is an enhancer of IRF1.

Furthermore, we now show that this variant is not a cis-eQTL for any other gene in our monocyte data (Supplementary Figure 2, copied below). We highlight that our claim of causality of IRF1 is not reliant on the CRISPR experiments alone; the cis/trans-eQTL discovery is also an important line of evidence.
Altogether, we feel that our combined eQTL and experimental results strongly support the mediating role of IRF1. There are multiple lines of evidence: 1) the two independent cis-eQTL effects on IRF1 (and no other genes in cis) with concordant trans-eQTL signals, 2) downregulation of IRF1 after silencing the enhancer locus in THP1 cells, 3) similar genome-wide expression effect of enhancer and promoter silencing as well as genomic perturbation of the locus.
We have, however, softened the claims across the manuscript to be transparent about the degree of evidence provided by the data presented in this work.
Moreover, the cellular model of HEK293-TLR4 cell line used here does not seem to be well suited to study IRF1 pathway. Although this cellular system is responsive to LPS stimulus, there is no enrichment of IRF1 target genes among differentially expressed genes upon stimulation, as opposed to the results obtained in the primary monocytes. A comparison of the transcriptional response upon stimulation between monocytes and HEK293-TLR4 cell line would be informative (Venn diagram to see the proportion of genes expressed in both cell-types, correlation of the transcriptional response for genes commonly expressed, GO enrichment for these genes…). More particularly, are the 12 trans-eGenes the most strongly associated at FDR=0.05 in monocytes expressed in HEK293 -TLR4 cells? If so, is there a correlation in their expression? RE: The reviewer raises a good point, and we have added data to the manuscript to further demonstrate that HEK-hTLR4 indeed recapitulates immune response in monocytes. While the entire IRF1 pathway is unlikely to be identical across cell types, we would like to note that the very strong gene expression effect of the CRISPi silencing of IRF1 promoter in HEK-hTLR4 shows that this gene has a functional impact in this cell line. Furthermore, the new THP1 experiments show that the enhancer locus affects IRF1 expression. Supplementary Figure 5d-f, copied below. First, we compared the total expressed genes in monocytes and HEK-hTLR4 cells and found that the majority are expressed in both (Supplementary Figure 5d). Next, to analyze the similarity of immune response between HEK-hTLR4 cells and primary monocytes, we took the log2 fold change (log2FC) for the top 100 differentially expressed mutually expressed genes (by absolute value of fold change) in the HEK-hTLR4 cells and correlated with the log2FC in monocytes and found that they are well correlated (Supplementary Figure  5e). We also took the top 100 DE monocyte genes and performed the same correlation ( Supplementary Figure 5f), which also shows a correlation, albeit more modest.

The analyses suggested by the reviewer are outlined below and now included in
We were not sure what the reviewer meant by analysis of correlation in the expression of the 12 trans-eGenes. Seven of the 12 trans-egenes are expressed in the HEK-TLR4 cell line, defined as an average read count of greater than 5 across conditions.
The CRISPR/Cas9 gene editing design used by the authors did not allow the discrimination of the trans-effect at the nucleotide resolution since they introduced deletions surrounding the rs17622517 locus and did not perform SNP editing. They delimited a DNA sequence where a regulatory element may be present and any nucleotide modification within this sequence, including rs17622517, may have a trans-effect. A comment on this issue in the article would be appreciable.

RE:
We agree with the reviewer. In response to this point and a comment from reviewer 1, we have expanded on this in the discussion, adding the following text:

"While editing the exact variant into a cell line may reflect directionally the same genetic effect as seen in the population, a deletion of a locus could feasibly have an effect in the same direction, opposite direction or not at all. This depends upon the specific mechanism of the genetic effect. For example, if a variant increases the affinity of a DNA binding protein by modifying its binding motif, that effect may not be replicated by a simple deletion of the variant locus. However, a functional impact of indels does provide evidence that the specific SNP locus has a regulatory effect rather than being a nonfunctional LD proxy of another causal variant."
In addition, the results regarding the analysis of Het and Hom clones are not convincing because there is no down-regulation of IRF1 expression depending on the genotype classes, as expected. The trans-effect with differential expression in Het and Hom clones showed also an opposing direction of the correlations, which is counter-intuitive, and the variability within genotype classes seems very high. The authors used ranges of NHEJ rate to determine the different genotype classes (NHEJ<10% for WT, NHEJ=20-70% for Het and NHEJ>90% for Hom). Did they choose these ranges arbitrary or this decision was supported by biological/statistical arguments? Moreover, how did the authors estimate the number of clones to test per genotype class? What was the transcriptional variation among clones within each class? Given the moderate effect size of rs17622517 locus and knowing the transcriptional variability among clones within each class, they should calculate the statistical power to detect a biological effect using 13 clones.

RE:
We chose the thresholds of 10%, 20-70% and 90% because there was noise in the estimation of NHEJ due to sequencing errors creating misalignment in the genotyping analysis. After using these thresholds to narrow down clones, we further inspected the BAM files manually in IGV to ensure that the ratios made sense in light of there being three alleles in each clone.
We don't have a great estimate of the size of the trans effects we would expect to see in RNA-seq, because the monocyte cells were characterized with microarray which is known to have a smaller dynamic range than RNA-seq and the values are not directly interpretable as counts. However, to answer the question on power, we did a power analysis using the R package "RnaSeqSampleSize" where for each of the 0.5 FDR trans-eGenes, we calculated the effect size that would be needed to detect with the number of clones we obtained and a power of at least 0.8. The results are in the histogram below, demonstrating that although some genes would require a FC that is unlikely for this trans-effect, a large portion (59%) of genes would need a fold change between 1 and 1.5.
My general comment on this article is that the interpretation of the data should be more rigorous: Depending on the biological hypothesis that the authors want to highlight, they considered different FDR thresholds, from 5% FDR up to 50% FDR. They should explain on which criteria they selected these thresholds and what are the consequences in terms of data significance. It reveals the clear lack of statistical power in their analyses, which results in a doubt about the robustness of the genetic associations. RE: In our experience, is not uncommon in genomics to vary FDR thresholds to capture genome-wide signals at different levels of sensitivity and specificity. However, we use 5% FDR consistently with only two exceptions: 1) for trans-eGene discovery we cast a wider net by using 50% FDR, also because we make no specific statements about any individual genes and treat them as a group, 2) in the analysis of correlation of differential expression between CRISPR genotypes and CRISPRi, we use 25% FDR. We have now added this result at 5% FDR (Supplementary Figure  12e&f).
In the figure Fig. 1a, the authors reported that the rs17622517 locus presented "substantially" a larger number of trans-eGenes with respect to other loci at a FDR=0.5. First they should compare groups using a FDR of 0.05 and second test if the difference is statistically significant. In general, the authors should avoid the terms "substantially" or "suggestive" when no statistical support is provided.

RE:
In the text, we had also reported the number of trans-eGenes at 5% FDR, and this locus stands out (12 eGenes compared to 4 for the next highest at 5% FDR, 232 compared to 62 next highest at 50% FDR). We now show this as a new supplementary figure 1 (figure 1a was similarly revised based on feedback from reviewer 1). We have revised the sentence to: "One locus stood out by being associated with a larger number of trans-eGenes than any other: 12 at 5% FDR and 232 at 50% FDR".