Genomic Regions Associated with Multiple Sclerosis Are Active in B Cells

More than 50 genomic regions have now been shown to influence the risk of multiple sclerosis (MS). However, the mechanisms of action and the cell types in which these associated variants act at the molecular level remain largely unknown. This is especially true for associated regions containing no known genes. Given the evidence for a role for B cells in MS, we hypothesized that MS associated genomic regions co-localize with regions that are functionally active in B cells. We used publicly available data on 1) MS associated regions and single nucleotide polymorphisms (SNPs) and 2) chromatin profiling in B cells as well as three additional cell types thought to be unrelated to MS (hepatocytes, fibroblasts and keratinocytes). Genomic intervals and SNPs were tested for overlap using the Genomic Hyperbrowser. We found that MS associated regions are significantly enriched in strong enhancer, active promoter and strongly transcribed regions (p = 0.00005) and that this overlap is significantly higher in B cells than in control cells. In addition, MS associated SNPs land in active promoter (p = 0.00005) and enhancer regions more than expected by chance (strong enhancer p = 0.0006; weak enhancer p = 0.00005). These results confirm the important role of the immune system and specifically B cells in MS and suggest that MS risk variants exert a gene regulatory role. Previous studies assessing MS risk variants in T cells may be missing important effects in B cells. Similar analyses in other immunological cell types relevant to MS and functional studies are necessary to fully elucidate how genes contribute to MS pathogenesis.


Introduction
Multiple sclerosis (MS) is a complex immune mediated disorder of the central nervous system which arises from a combination of genetic and environmental factors and their interactions [1]. A recent genome wide association study (GWAS) involving more than nine thousand MS patients found evidence for association of MS with 57 genomic regions [2]. However, there remains limited understanding as to how these variants are involved in MS development.
Although T cells have traditionally been thought to mediate MS pathophysiology, attention to the role of B cells is increasing [3][4]. The success of Rituximab (an anti-CD20 monoclonal antibody) [5] heightened this interest, and a number of other anti-CD20 monoclonal antibodies are undergoing clinical trials [6].
The regulation of genes can be just as important as the proteins they encode. Regulatory elements in the genome are much harder to identify than protein-coding genes because they lack distinguishing sequence signatures. Moreover, many regulatory elements function only in certain cell types and conditions [7].
Chromatin profiling is a powerful means of genome annotation and detection of regulatory activity. The chromatin landscape of a cell is distinctive for a specific cell type and, among other roles, determines which regions of the genome are accessible to the binding of transcription factors and whether transcription occurs or is repressed. A recent study mapped a number of chromatin marks across nine cell types to systematically characterize regulatory elements and their cell-type specificities. These included enhancer elements (DNA sequences able to modulate gene expression through the binding of transcription factors to them), promoter regions (DNA regions located near the transcription start site of a gene which facilitate the binding of RNA polymerase and the initiation of transcription), polycomb repressed regions (DNA regions in which gene expression is actively repressed by the binding of polycomb group proteins), heterochromatin (large portions of DNA which are densely packed and therefore less accessible to transcription factors), insulator sites (DNA elements bound by the zinc finger protein CTCF, which functions as an enhancer-blocking element), transcribed regions and finally repetitive/copy number variations (CNV, DNA regions characterized by a variable number of copies between individuals). Among the cell types profiled were B cells, hepatocytes, fibroblasts and keratinocytes [8].
A large proportion of SNPs associated with MS do not lie in the coding regions of genes and therefore are likely to influence disease risk through a gene regulatory role. It is plausible that genetic variants associated with a certain disease act through influencing the particular cell type(s) that trigger disease onset. Therefore, one would expect to observe an overlap between genomic regions associated with disease risk and those which are active in the causative cell type(s). The aim of this study was to assess whether genomic regions that have been associated with MS significantly overlap with active regulatory regions in B cells and whether this overlap is higher than that observed in non-immunological cell types. This potentially provides us with relevant information regarding the importance of the immune system and B cells as mediators of disease in MS.

Data acquisition
Genetic variants associated with MS risk were obtained from the recent GWAS performed by the International Multiple Sclerosis Genetics Consortium (IMSGC) and the Wellcome Trust Case Control Consortium 2 (WTCCC2) [2]. MS regions were defined as genomic intervals of 0.25 cM centred on the lead associated SNP. The chromatin profiles of B-lymphoblastoid cells (GM12878), hepatocellular carcinoma cells (HepG2), normal lung fibroblasts (NHLF) and normal epidermal keratinocytes (NHEK) were obtained from the ENCODE project [8]. Briefly, chromatin immunoprecipitation followed by massively parallel DNA sequencing (ChIP-seq) and expression data were used to identify different classes of chromatin states: active promoter (AP), weak promoter (WP), poised promoter (PP), strong enhancer (SE), weak enhancer (WE), polycomb repressed (PR), heterochromatic (H), insulator (I), strongly transcribed (ST), weakly transcribed (WT) and repetitive/CNV (Rep/CNV) [8].

Overlap analysis
All analyses were performed using the Genomic Hyperbrowser (http://hyperbrowser.uio.no/hb/) [9]. The enrichment of MS regions in a certain chromatin state (e.g. AP) was calculated as the ratio of the proportion of AP intervals covered by MS regions to the proportion of non-AP intervals covered by MS regions. In order to assess whether the overlap between MS regions and a certain chromatin state was higher than expected by chance, a permutation-based analysis was performed. We defined a null model in which the location of individual chromatin intervals varied randomly, while preserving the empirical segment and inter-segment length distribution of chromatin intervals. MS regions were fixed. The number of overlapping base pairs between the two tracks was calculated for the real data, as well as for 20,000 Monte Carlo samples from the null model. The p-value was calculated in the usual way, i.e. as the proportion of Monte Carlo samples being equal to or more extreme than the observed overlap. These analyses were performed on both a global (whole genome) and local (chromosome arms) scale and for each cell type. For local analyses p-values were adjusted to an FDR of 10%.
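The enrichment ratio and the Monte Carlo test described above can be illustrated with a minimal Python sketch. It is not the Genomic Hyperbrowser implementation: the function names, the toy coordinates and the way the null track is drawn (independently shuffling the observed segment lengths and gap lengths) are assumptions made for illustration. The add-one correction in the p-value and the use of the Benjamini-Hochberg procedure for the 10% FDR are likewise assumptions, since the text does not specify either.

```python
import random

def total_bp(intervals):
    """Base pairs covered by sorted, non-overlapping half-open (start, end) intervals."""
    return sum(end - start for start, end in intervals)

def overlap_bp(a, b):
    """Base pairs shared by two sorted, internally non-overlapping interval lists."""
    i = j = shared = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            shared += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared

def enrichment(state, ms, genome_len):
    """Proportion of state bp covered by MS regions / proportion of non-state bp covered."""
    shared = overlap_bp(state, ms)
    state_bp = total_bp(state)
    inside = shared / state_bp
    outside = (total_bp(ms) - shared) / (genome_len - state_bp)
    return inside / outside

def shuffled_track(state, genome_len, rng):
    """Null sample (assumed scheme): reorder segment lengths and gap lengths independently."""
    seg_lens = [end - start for start, end in state]
    bounds = [0] + [x for iv in state for x in iv] + [genome_len]
    gap_lens = [bounds[k + 1] - bounds[k] for k in range(0, len(bounds), 2)]
    rng.shuffle(seg_lens)
    rng.shuffle(gap_lens)
    track, pos = [], 0
    for gap, seg in zip(gap_lens, seg_lens):
        pos += gap
        track.append((pos, pos + seg))
        pos += seg
    return track

def mc_pvalue(state, ms, genome_len, n_samples=20_000, seed=0):
    """Share of null samples with overlap >= observed (add-one correction is an assumption)."""
    rng = random.Random(seed)
    observed = overlap_bp(state, ms)
    extreme = sum(
        overlap_bp(shuffled_track(state, genome_len, rng), ms) >= observed
        for _ in range(n_samples)
    )
    return (extreme + 1) / (n_samples + 1)

def bh_reject(pvalues, fdr=0.10):
    """Benjamini-Hochberg step-up decisions (assumed FDR procedure for the local bins)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * fdr / m:
            cutoff = rank
    rejected = set(order[:cutoff])
    return [i in rejected for i in range(m)]

# Toy data on a hypothetical 1,000 bp genome; MS regions stay fixed.
genome_len = 1000
ms = [(100, 200), (600, 700)]
state = [(120, 180), (620, 660), (900, 920)]
print(enrichment(state, ms, genome_len))  # (100/120) / (100/880), about 7.33
print(mc_pvalue(state, ms, genome_len, n_samples=2000))
```

In this sketch the MS track is never moved, mirroring the statement that MS regions were fixed, and only the chromatin track is randomized. For a local analysis, the same functions would be applied per chromosome arm and the resulting p-values passed to `bh_reject`.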
When comparing B cells to non-immunological cell types, case-control tracks were created for each chromatin state by removing all parts of chromatin intervals that overlapped between B and control cells and marking the remaining intervals as case (B cell specific intervals) and control (other cell type specific intervals). P-values were computed by a Monte Carlo procedure, in which the case-control labels of chromatin intervals were randomly permuted. The observed base pair overlap between case intervals and MS regions was compared against the corresponding distribution for 20,000 Monte Carlo samples in the usual way. The fold enrichment difference in overlap between B and control cells was calculated as the ratio between the proportions of case and control intervals that overlapped with MS regions.
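The case-control comparison can be sketched in the same way. The Python example below is an illustrative reading of the procedure, not the tool's own code: the helper names and toy intervals are hypothetical, the "proportion of intervals that overlapped" is interpreted in base pairs, and the add-one correction in the p-value is an assumption.

```python
import random

def subtract(a, b):
    """Parts of intervals in `a` not covered by `b` (both sorted, half-open)."""
    out, j = [], 0
    for start, end in a:
        pos = start
        # Skip intervals in b that end before this interval begins.
        while j < len(b) and b[j][1] <= pos:
            j += 1
        k = j
        while k < len(b) and b[k][0] < end:
            if b[k][0] > pos:
                out.append((pos, b[k][0]))
            pos = max(pos, b[k][1])
            k += 1
        if pos < end:
            out.append((pos, end))
    return out

def bp_in(interval, regions):
    """Base pairs of one interval that fall inside any of the regions."""
    s, e = interval
    return sum(max(0, min(e, r_end) - max(s, r_start)) for r_start, r_end in regions)

def case_control_test(b_cell, control, ms, n_samples=20_000, seed=0):
    """Return (p-value, fold difference) for B cell specific vs control specific intervals."""
    case = subtract(b_cell, control)   # B cell specific parts
    ctrl = subtract(control, b_cell)   # control specific parts
    intervals = case + ctrl
    labels = [True] * len(case) + [False] * len(ctrl)
    hits = [bp_in(iv, ms) for iv in intervals]
    lengths = [e - s for s, e in intervals]

    def case_overlap(lab):
        return sum(h for h, is_case in zip(hits, lab) if is_case)

    observed = case_overlap(labels)
    rng = random.Random(seed)
    perm = labels[:]
    extreme = 0
    for _ in range(n_samples):
        rng.shuffle(perm)              # permute case/control labels only
        extreme += case_overlap(perm) >= observed
    p_value = (extreme + 1) / (n_samples + 1)   # add-one correction is an assumption

    case_len = sum(n for n, c in zip(lengths, labels) if c)
    ctrl_len = sum(n for n, c in zip(lengths, labels) if not c)
    case_prop = observed / case_len
    ctrl_prop = (sum(hits) - observed) / ctrl_len
    fold = case_prop / ctrl_prop if ctrl_prop else float("inf")
    return p_value, fold

# Toy tracks (hypothetical coordinates) for one chromatin state.
b_cell = [(100, 200), (300, 360), (500, 560)]
control = [(150, 250), (400, 440), (700, 760)]
ms = [(90, 210), (290, 370), (690, 720)]
p_value, fold = case_control_test(b_cell, control, ms, n_samples=2000)
print(p_value, fold)
```

Because only the labels are permuted, the positions of all intervals and of the MS regions stay fixed, so the test asks whether B cell specific intervals are unusually concentrated in MS regions relative to control specific ones.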
Finally, we tested whether MS associated SNPs (primary SNPs) and SNPs in perfect linkage disequilibrium (LD) with primary SNPs (r² = 1) were located within certain chromatin states more than expected by chance, as described above for MS regions.
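For SNPs, the test statistic becomes a count of points falling inside chromatin intervals rather than a number of overlapping base pairs. A minimal Python sketch of that count is given below; the function name and coordinates are hypothetical, and intervals are assumed to be sorted, non-overlapping and half-open.

```python
from bisect import bisect_right

def snps_in_state(snp_positions, state):
    """Count SNP positions that fall inside sorted, non-overlapping (start, end) intervals."""
    starts = [start for start, _ in state]
    count = 0
    for pos in snp_positions:
        k = bisect_right(starts, pos) - 1   # last interval starting at or before pos
        if k >= 0 and pos < state[k][1]:
            count += 1
    return count

# Toy example with hypothetical coordinates.
state = [(100, 200), (300, 400)]
snps = [50, 100, 199, 200, 350]
print(snps_in_state(snps, state))  # 3: positions 100, 199 and 350 fall inside intervals
```

The observed count would then be compared with counts obtained from randomized chromatin tracks, in the same Monte Carlo fashion as for MS regions.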

Active chromatin states in B cells overlap with MS regions
Our first aim was to assess whether and where in the genome a particular chromatin state in B cells significantly overlapped with MS regions. The enrichment of MS regions in different chromatin states and the significance of the overlap are presented in Table 1. On a global scale (whole genome) enrichment values varied considerably between different chromatin states ranging from 0.34 in H to 3.07 in SE regions. When testing statistical significance, MS regions overlapped with promoter (AP and WP p = 0.00005; PP p = 0.0005), enhancer (p = 0.00005) and transcribed (p = 0.00005) regions more than expected by chance.
In order to assess whether the significant global overlap was homogeneously distributed across the genome or resulted from particularly highly enriched regions, the same analysis was performed on a local scale by dividing the whole genome into chromosome arms. This resulted in 43 different 'bins' of which 17 had to be excluded due to the absence of MS associated regions in those chromosome arms leaving a total of 26 bins. Out of these 26 bins, statistically significant overlap was found in 18 for AP, 15 for WP, 7 for PP, 23 for SE, 24 for WE, 3 for PR, 0 for H, 2 for I, 9 for ST, 9 for WT and 1 for Rep/CNV chromatin states (Table 1). As expected, the chromatin states with significant overlap on a global scale were those with the highest number of significant bins. SE and WE regions showed the most homogeneously distributed overlap, being significant in all but 3 and 2 bins respectively. The overlap of promoter and transcribed regions appeared more dependent on particular bins.

Active chromatin states in B cells overlap with MS regions more than in non-immunological cell types
On its own, the presence of an overlap between MS regions and active chromatin states in B cells is not sufficient to indicate that B cells are relevant to MS pathogenesis. Both MS regions and active chromatin states could just be more likely to be near commonly transcribed genes, giving rise to co-localization in the absence of any direct relationship between the MS-associated regions and chromatin states. To rule out this hypothesis we tested 3 additional cell types (hepatocytes, fibroblasts and keratinocytes) that, based on current knowledge, are not implicated in MS pathogenesis. Enrichment values, global significance and number of significant bins are presented in Table 2.
Similar to the findings in B cells, promoter, enhancer and transcribed regions overlapped with MS regions more than expected by chance in all control cell types. However, the number of significant bins as well as the enrichment values tended to be higher in B cells than in other cell types. We explored this further by directly comparing the overlap in B cells with that of the control cells (Table 3). Strikingly, the overlap between MS regions and AP, SE, WE and ST regions was significantly higher in B cells than in any of the other cell types. The highest fold enrichment differences were observed for higher activity states (AP, SE and ST).

MS associated SNPs preferentially land in active promoter and enhancer regions
The presence of significant overlap between MS regions and certain active chromatin states in B cells supports an important role for the immune system in MS but does not provide any insight into how MS risk variants may be acting. We attempted to answer this question by looking at where in the genome MS associated SNPs, and SNPs in perfect LD (r² = 1), preferentially land. A total of 452 SNPs were tested and enrichment of chromatin states for MS SNPs and significance of overlap are presented in Table 4. MS SNPs were located within AP, SE and WE intervals more than expected by chance. Weak evidence for overlap was also observed for WP, ST and WT regions. When examined in the light of the chromatin data, several regions seemed particularly interesting. For example, MS associated SNPs in the regions of CLECL1, CD86, TYK2 and CD58 land in promoter and enhancer regions which are present in B cells but not in other cell types (Figure 1). We also looked at the position of MS SNPs in regions in which no candidate genes have been identified. Interestingly, rs12466022 on chromosome 2 and respective SNPs in LD landed in WE and SE intervals, while rs13192841 on chromosome 6 and respective SNPs in LD were located in WE and WT regions. The complete list of SNPs and overlapping chromatin states is available in supplementary material online (Table S1).

Discussion
MS is a complex disorder of unknown aetiology. Here we show that genomic regions associated with MS overlap with AP, SE, WE and ST regions in B cells and that this occurs more than would be expected by chance, and more than was observed in 3 other cell types unrelated to MS pathogenesis. Notably, the overlap was particularly striking in SE and WE regions for which significance was reached in 23 and 24 out of the 26 analyzed bins respectively. This is in accordance with the previous observations that tissue-specific genes appear more dependent on enhancer than promoter elements [8]. Furthermore, we provide evidence that the associated SNPs preferentially land in promoter, enhancer and to a lesser extent transcribed regions. These findings have several important implications.
Firstly, this work further supports the immunological aetiology of MS [10]. Our findings are in agreement with those of a gene ontology analysis of the genes located within MS associated regions, which showed a substantial overrepresentation of immune-related processes [2]. Compared with this type of analysis, our approach has the relative advantage of being independent of what is currently known about genes and cell types.
Secondly, our observations provide further support for an important role for B cells in the pathogenesis of MS. The presence of oligoclonal bands in the cerebrospinal fluid is the most consistent immunological finding in MS, and this indicates abnormal B cell activation within the CNS of MS patients [11]. Furthermore, B cell abnormalities influence conversion to clinically definite MS, MRI activity, onset of relapses and disease progression [12][13][14][15][16][17]. Possibly the strongest evidence for B cells in MS comes from clinical trials showing that MRI activity and onset of relapses are significantly decreased after depletion of CD20+ B cells [5]. However, we must consider that certain chromatin features may be shared between B and other immune cell types, in particular T cells. Unfortunately, a similarly detailed chromatin profile of T cells is not yet available and therefore a direct comparison between B and T cells could not be performed. Even if similar chromatin profiles exist between T and B cells, the attempts to understand the effects of genetic risk variants on T cell function [18] may be missing important effects in B cells. Given the increasing evidence for B-T cell interactions in MS [19][20][21], this analysis has the potential to greatly help the dissection of the roles played by these two cell types.
When MS SNPs rather than MS regions were analyzed, we found that MS SNPs were significantly more likely to land in AP, SE and WE regions than expected by chance, perhaps suggesting that many of the associated SNPs may influence the risk of MS by modifying the binding of transcription factors and transcription in general. This is in agreement with previous observations [8]. For MS associated SNPs landing in non-genic regions, for the first time we are able to show a likely functional role in gene regulation. However, these findings should be interpreted with caution for two reasons. First, the observed overlap between MS SNPs and active chromatin states may be a consequence of the fact that MS SNPs land in MS regions, which are themselves enriched for active chromatin states. Secondly, a conclusive answer to this question can only come from functional studies which should investigate if and how MS variants affect the chromatin landscape and gene expression.
To conclude, genomic regions associated with MS susceptibility are active in B cells and causative SNPs may act by changing the chromatin landscape. Further similar analyses in other immunological cell types relevant to MS and functional studies are required to fully understand in which cells, at which stage and how MS genetic variants are acting.

Supporting Information
Table S1. List of MS associated SNPs and SNPs in perfect LD (r² = 1) with overlapping chromatin states. (XLSX)