To explore the potential influence of the polymorphic 8p23.1 inversion on known autoimmune susceptibility risk at or near the BLK locus, we validated a new bioinformatics method that utilizes SNP data to enable accurate, high-throughput genotyping of the 8p23.1 inversion in a Caucasian population. Methods: Principal components analysis (PCA) was performed using markers inside the inversion territory, followed by k-means cluster analyses, on 7416 European-derived and 267 HapMap CEU and TSI samples. A logistic regression conditional analysis was performed. Results: Three subgroups were identified: inversion homozygous, heterozygous and non-inversion homozygous. The status of inversion was further validated using HapMap samples that had previously undergone fluorescence in situ hybridization (FISH) assays, with a concordance rate above 98%. Conditional analyses based on the status of inversion were performed. We found that overall association signals in the BLK region remain significant after controlling for inversion status. The ratio of lupus cases to controls in each subgroup was 0.97 for the inverted homozygous group (1067 cases and 1095 controls), 1.12 for the heterozygous group (1935 cases and 1717 controls) and 1.36 for the non-inverted homozygous group (924 cases and 678 controls). After calculating the linkage disequilibrium between inversion status and the lupus risk haplotype, we found that the lupus risk haplotype tends to reside on the non-inversion background. As a result, a new association effect between non-inversion status and lupus phenotype has been identified (p = 8.18×10−7, OR = 1.18, 95% CI = 1.10–1.26). Conclusion: Our results demonstrate that both the known lupus risk haplotype and inversion status act additively in the pathogenesis of lupus. Since inversion regulates expression of many genes in its territory, altered expression of other genes might also be involved in the development of lupus.
Citation: Namjou B, Ni Y, Harley ITW, Chepelev I, Cobb B, Kottyan LC, et al. (2014) The Effect of Inversion at 8p23 on BLK Association with Lupus in Caucasian Population. PLoS ONE 9(12): e115614. https://doi.org/10.1371/journal.pone.0115614
Editor: Courtney G. Montgomery, Oklahoma Medical Research Foundation, United States of America
Received: August 28, 2014; Accepted: October 6, 2014; Published: December 29, 2014
This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
Data Availability: Data are from the LLAS2 study and are available upon request to the following: Bahram Namjou, MD, Assistant Professor, Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, OH 45229-3039, CCHMC Location S-9 098, Office: 513-803-5076; John B. Harley, MD, PhD, David Glass Endowed Chair, Professor of Pediatrics and Medicine, University of Cincinnati, Director, The Center for Autoimmune Genomics and Etiology (CAGE), Department of Pediatrics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 15012, Cincinnati, Ohio, 45229, Phone: 513-803-3665, Fax: 513-803-5246.
Funding: This study was supported by the National Human Genome Research Institute: NIH (U01 HG006828, R37 AI024717, P01 AI083194)-BN-JBH-KK-BC-LC-IC and the US Department of Veterans Affairs (IMMA9) and Department of Defense (PR094002)-JBH. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
The inversion polymorphism at 8p23.1 is one of the largest variants found in man, encompassing 4.5 Mb, with estimated frequencies that depend on population: 20%–50% in Europeans, 59% in the Yoruba and 12%–27% in Asians. The fluorescence in situ hybridization (FISH)-based assay is often considered a gold standard to detect chromosomal inversion; however, it is not an effective method when a large number of samples are required to characterize inversions at the population level. Recently, a new statistical method based on principal components analysis (PCA) using unphased high-density SNP genotyping data was successfully introduced. The rationale for this approach is built upon the suppression of recombination events between two segments of different orientations in the inverted region. Because of the lack of recombination events, the two segments represent two distinct lineages that have been diverging for many generations and accumulating mutations independently. SNPs within the inverted region should therefore have different statistical properties, as if they were from different populations. This population substructure can be readily detected using PCA, resulting in a special pattern in the distribution of samples in the space spanned by the first few eigenvectors. This special pattern, consisting of three equidistant stripes, is indicative of inversions and can be used to infer the inversion status of the samples. Previous simulation studies suggest that as few as 150 markers in inverted regions are sufficient to generate a meaningful PCA analysis; however, that depends on the type of markers used for the study and the level of linkage disequilibrium (LD) with the inversion. Although some candidate tag markers have been reported to determine inversion, the results were mostly inconsistent, mainly because there are no known SNPs in complete LD with the inversion that can act as a perfect proxy for it.
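As a minimal illustration of why suppressed recombination produces three stripes along the first principal component, the following Python sketch simulates two diverged haplotype lineages and projects unphased genotypes onto PC1. Everything in it is hypothetical and chosen only for clarity: the function names, the sample and marker counts, the 0.9 versus 0.1 allele frequencies, and the use of power iteration in place of a dedicated PCA package. It is a toy model, not the published method.

```python
import random

def simulate_genotypes(n_samples=90, n_markers=60, seed=1):
    """Simulate unphased genotypes (0/1/2) inside a hypothetical inversion."""
    rng = random.Random(seed)
    # Assumption: the two orientations behave like strongly diverged lineages,
    # with allele frequency 0.9 on one background and 0.1 on the other.
    freq_inv = [rng.choice([0.1, 0.9]) for _ in range(n_markers)]
    freq_non = [1.0 - f for f in freq_inv]
    genotypes, n_copies = [], []
    for _ in range(n_samples):
        haps = [rng.choice("IN"), rng.choice("IN")]  # I = inverted, N = non-inverted
        row = []
        for j in range(n_markers):
            g = 0
            for h in haps:
                f = freq_inv[j] if h == "I" else freq_non[j]
                g += 1 if rng.random() < f else 0
            row.append(g)
        genotypes.append(row)
        n_copies.append(haps.count("N"))  # 0 = II, 1 = IN, 2 = NN
    return genotypes, n_copies

def first_pc_scores(genotypes, n_iter=100, seed=2):
    """Project samples onto the first principal component (power iteration)."""
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(m)]
    for _ in range(n_iter):
        scores = [sum(r[j] * v[j] for j in range(m)) for r in x]
        v = [sum(x[i][j] * scores[i] for i in range(n)) for j in range(m)]
        norm = sum(c * c for c in v) ** 0.5
        v = [c / norm for c in v]
    return [sum(r[j] * v[j] for j in range(m)) for r in x]

genotypes, n_copies = simulate_genotypes()
pc1 = first_pc_scores(genotypes)
for k in (0, 1, 2):
    group = [s for s, c in zip(pc1, n_copies) if c == k]
    print(k, "N copies: mean PC1 =", round(sum(group) / len(group), 2))
```

In this toy setting the heterozygous group falls midway between the two homozygous groups on PC1. The sign of PC1 is arbitrary, which is why an external reference such as FISH-typed samples is needed to decide which end corresponds to the inverted arrangement.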
We are especially interested in the 8p23 inversion because the FAM167A/BLK locus is located inside this inverted segment. BLK (B Lymphoid Tyrosine Kinase) encodes a non-receptor tyrosine kinase of the Src family and is involved in B-lymphocyte development, differentiation and signaling. Many studies have confirmed genetic variants of FAM167A/BLK to be associated with systemic lupus erythematosus (SLE) as well as multiple other autoimmune diseases, including systemic sclerosis, rheumatoid arthritis and Sjögren's syndrome.
Here, we implement the PCA approach in our large lupus registry cohort (LFRR) at the 8p23 region in order to determine the influence of inversion on previously known association signals for lupus.
Methods and Subjects
The details of recruitment and biological sample collection of lupus cases and controls have been described previously. Samples were supplied by multiple investigators from different institutions with approval from their respective institutional review boards (IRBs). All study participants provided written consent prior to study enrolment; consent forms were obtained at each location under IRB guidelines. Samples were then assembled at the Oklahoma Medical Research Foundation (OMRF), and the study protocols (including the enrollment process, consent forms, collection of DNA and subject information) for this study were approved by the OMRF Institutional Review Board.
Recruitment and Biological Sample Collection
Briefly, samples were obtained from the “Large Lupus Association Study #2” (LLAS2), in which 16,500 individuals, including 8068 Caucasians, were genotyped as previously described. LLAS2 is a project investigating genetic associations in SLE using a candidate gene approach. All SLE cases met the 1997 ACR classification criteria for SLE. Individual ethnicities were self-reported and genetic outliers were removed by principal component analysis. All genotype data were generated using the Illumina iSelect platform at the Oklahoma Medical Research Foundation (OMRF) genotyping facility as previously described.
Principal component analysis (PCA) was performed using EIGENSTRAT, and eigenvectors were generated using post-quality control data from 7416 Caucasian samples and 459 polymorphic typed markers that passed standard quality control measures (MAF>0.01, genotyping rate>95% and HWE exclusion threshold p<0.0001). These markers reside in the inversion region of 8p23.1 (from 7.2 to 12.4 Mb of chromosome 8). In addition, 267 HapMap3 Caucasian samples, including 165 CEU (Utah residents with Northern and Western European ancestry from the CEPH collection) and 102 TSI (Toscani in Italia), were merged into the study as a validation step. Each sample's value on the first principal component axis (PC1) was used for K-means clustering assuming three clusters (K = 3). Clustering was performed using the Matlab k-means algorithm, with squared Euclidean distance as the distance measurement (http://www.mathworks.com/help/stats/kmeans.html). SNP allelic association was evaluated between cases and controls using the χ2 test with 1 d.f. The allelic odds ratio (OR) and 95% confidence intervals (95% CIs) were calculated using PLINK. Logistic regression and genotypic conditional analyses were also performed using PLINK, controlling for the effect of a specific SNP or inversion status. Haploview version 4.2 was used to estimate the linkage disequilibrium (LD) between markers and to estimate haplotype frequencies. Golden-Helix version 8.5.1 was used to graphically display the association results (Golden Helix, Inc., Bozeman, MT, www.goldenhelix.com).
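The clustering step can be sketched in a few lines of Python. This is a plain Lloyd's algorithm on scalar PC1 values with squared Euclidean distance and K = 3. It is not the Matlab routine used in the study: the function name `kmeans_1d`, the quantile-based initialisation and the example PC1 values are all assumptions made for illustration.

```python
def kmeans_1d(values, k=3, n_iter=100):
    """Lloyd's algorithm on scalar values using squared Euclidean distance."""
    ordered = sorted(values)
    # Assumption: initial centres are taken at evenly spaced quantiles, so the
    # returned cluster indices are ordered from lowest to highest PC1.
    centres = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assignment step: nearest centre for every sample.
        labels = [min(range(k), key=lambda c: (v - centres[c]) ** 2) for v in values]
        # Update step: each centre becomes the mean of its members.
        new_centres = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centres.append(sum(members) / len(members) if members else centres[c])
        if new_centres == centres:
            break
        centres = new_centres
    return labels, centres

# Hypothetical PC1 values for ten samples (not data from the study).
pc1_values = [-0.031, -0.029, -0.030, 0.001, -0.001, 0.000, 0.002, 0.030, 0.029, 0.031]
labels, centres = kmeans_1d(pc1_values)
print(labels)
```

The cluster indices produced here carry no biological meaning by themselves; mapping them to inverted homozygous, heterozygous and non-inverted homozygous requires reference samples of known inversion status.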
A total of 7416 homogeneous post-quality control European samples from LFRR were used for this study. The demographic distribution of these samples is shown in Table 1. PCA analysis was performed using markers inside the predicted inverted territory (from 7.2 to 12.4 Mb of chromosome 8). Since some of the HapMap samples have been previously typed by a FISH assay in other studies, we first merged publicly available genotyping data of these HapMap samples into our cohorts (HapMap3_r3_b36_fwd.consensus, http://hapmap.ncbi.nlm.nih.gov/). In this region, 459 markers were available that passed standard quality control criteria (MAF>0.01, genotyping rate>95% and HWE exclusion threshold p<0.0001); 287 of these markers overlapped with HapMap3 genotyping data. PCA analyses were performed on 7683 samples (7416 Europeans and 267 HapMap CEU/TSI).
As predicted, the PCA results identified 3 confined clusters, consistent with previous analyses (Fig. 1). In order to better assign individuals to each cluster, as recommended in a previous publication, we performed cluster analyses implementing the K-means clustering approach with 3 clusters, and then assigned individuals to 3 subgroups as shown in Fig. 2. Although this method effectively clusters individuals into 3 groups, PCA per se was not sufficient to determine which cluster is homozygous inverted and which one is homozygous non-inverted. In order to address this, we merged the HapMap data set as described and then evaluated the concordance rate and the status of inversion according to FISH results available in the public domain. Indeed, our PCA method correctly called 59 out of 60 available HapMap samples with previously typed FISH results and assigned them to 3 subgroups, with a concordance rate of >98% (S1 Table). These comparisons also determined which group on either side of the middle heterozygous cluster was inverted homozygous and which was non-inverted homozygous. As shown in Fig. 1, we found that group 1 must be inverted homozygous and group 3 non-inverted homozygous. The overall frequency of each subgroup in our Caucasian population was 29% inverted homozygous, 49% heterozygous and 22% non-inverted homozygous. The inversion status was in HWE in our collection (p = 0.45). The ratio of lupus cases to controls was determined in each subgroup: 0.97 for the inverted homozygous group (1067 cases and 1095 controls), 1.12 for the heterozygous group (1935 cases and 1717 controls) and 1.36, the highest ratio, for the non-inverted homozygous group (924 cases and 678 controls) (Table 2).
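The subgroup frequencies and case-to-control ratios follow directly from the counts quoted above, and a rough Hardy-Weinberg check can be made from the same numbers. The Python sketch below is illustrative only. The function names are hypothetical, and the HWE check is a 1 d.f. chi-square test on pooled cases and controls; the text does not state which test or which samples were used for the reported p-value, so the result need not match it exactly.

```python
import math

# (cases, controls) per inversion genotype, as quoted in the text.
counts = {"II": (1067, 1095), "IN": (1935, 1717), "NN": (924, 678)}

def summarize(counts):
    """Subgroup frequency and case/control ratio for each inversion genotype."""
    total = sum(ca + co for ca, co in counts.values())
    return {g: {"freq": (ca + co) / total, "ratio": ca / co}
            for g, (ca, co) in counts.items()}

def hwe_chi_square(n_ii, n_in, n_nn):
    """Chi-square goodness-of-fit test for Hardy-Weinberg proportions (1 d.f.)."""
    n = n_ii + n_in + n_nn
    p = (2 * n_ii + n_in) / (2 * n)  # frequency of the inverted arrangement
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_ii, n_in, n_nn), expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 d.f.
    return chi2, p_value

summary = summarize(counts)
pooled = [sum(counts[g]) for g in ("II", "IN", "NN")]
chi2, p_value = hwe_chi_square(*pooled)
for g, s in summary.items():
    print(g, round(s["freq"], 2), round(s["ratio"], 2))
print("HWE chi-square:", round(chi2, 2), "p:", round(p_value, 2))
```

Under these assumptions the pooled genotype counts show no significant departure from Hardy-Weinberg proportions, in line with the conclusion stated in the text.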
Group 1 = Inverted-homozygous, Group 2 = Heterozygous, Group 3 = Non-inverted homozygous.
PC value is a numerical value obtained from STRUCTURE PC1 vector for each individual. In this approach the algorithm attempts to find the centers of the clusters in order to minimize the sum of the distances within each cluster.
In order to further confirm the results of the PCA analyses and to make sure that the subgroup assignment is correct, we compared the minor allele frequency of one of the published and common SNPs associated with lupus, rs13277113, across these three subgroups. Indeed, we observed significant differences in the frequency of this allele among the 3 groups, ranging from 1% in one group to 56% (where it becomes the major allele) in another (Fig. 3). These significant variations in allele frequency further support the PCA method for defining subgroups. Interestingly, despite significant variation in allele frequency between groups, the effects of the association of SNP rs13277113 with lupus were consistent across the three subgroups (Table 2).
We then performed association analyses conditioning on the inversion status of individuals. Table 3 summarizes the statistical results for the top associated variants, which previously showed p<10−8. We found that the association effects, although weaker, remain significant for most of the top lupus-related variants, including rs13277113 and rs998683 (Table 3). The overall effect of all association results before and after controlling for inversion is shown in Table 2.
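One way to write such a conditional model, assuming additive coding of both terms (the exact coding used in PLINK for these analyses is not stated here), is a logistic regression in which the SNP effect is estimated with inversion status as a covariate. The symbols are introduced only for this illustration: g_i is the number of risk alleles carried by individual i at the tested SNP, and s_i is the number of non-inverted (N) copies, coded 0, 1 or 2.

```latex
\mathrm{logit}\, P(\mathrm{SLE}_i = 1) = \beta_0 + \beta_{\mathrm{SNP}}\, g_i + \beta_{\mathrm{INV}}\, s_i
```

Under this formulation, a SNP association that "remains significant after controlling for inversion status" corresponds to rejecting the null hypothesis that the SNP coefficient is zero while the inversion term is kept in the model.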
Considering the inversion status as a genotype call (II, IN, NN), we then calculated the degree of LD between inversion status and the top lupus-associated markers. Consistent with previous reports, no marker was found to be a perfect proxy for the inversion (r2 = 1). As shown in Table 3, markers with higher LD with inversion status, such as rs2409718 (r2 = 0.75) or rs2249040 (r2 = 0.58), produced less significant association results with lupus after conditioning on inversion status. Furthermore, we detected an association between inversion status and SLE in our cohorts (Table 3), in which the non-inverted arrangement was more frequent in Caucasian lupus cases than in controls (p = 8.18×10−7, OR = 1.18, 95% CI = 1.10–1.26). Further sub-phenotype analyses suggest a somewhat larger effect size for the association of the N allele with lupus nephritis (OR = 1.27, 95% CI = 1.15–1.39, p = 1.47×10−6) or with the presence of thrombocytopenia (OR = 1.43, 95% CI = 1.23–1.65, p = 2.00×10−6).
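As a worked check of the inversion association, the allelic odds ratio for the N arrangement can be approximated from the genotype counts given earlier (1067/1935/924 cases and 1095/1717/678 controls for II/IN/NN). The Python sketch below uses a simple 2×2 allelic table with a Wald confidence interval. The function name is hypothetical, and the published values were obtained with PLINK, so small differences from the reported interval and p-value are to be expected.

```python
import math

def allelic_odds_ratio(case_counts, control_counts, z=1.96):
    """Allelic OR for N versus I with a Wald confidence interval.

    case_counts and control_counts are (II, IN, NN) genotype counts.
    """
    def alleles(genotype_counts):
        n_ii, n_in, n_nn = genotype_counts
        return 2 * n_nn + n_in, 2 * n_ii + n_in  # (N alleles, I alleles)

    a, b = alleles(case_counts)      # N and I alleles in cases
    c, d = alleles(control_counts)   # N and I alleles in controls
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

odds_ratio, lower, upper = allelic_odds_ratio((1067, 1935, 924), (1095, 1717, 678))
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2))
```

With these counts the odds ratio comes out at about 1.18 and the interval at roughly 1.11 to 1.26, which is close to the reported OR = 1.18 and 95% CI = 1.10–1.26.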
In addition, haplotype analyses using three previously published lupus-associated SNPs (rs2736340, rs13277113, rs2618476) revealed that the non-inversion status is indeed in LD with the known AAG lupus risk haplotype. This risk haplotype (N-AAG) had a frequency of 29% in cases and 24% in controls (p = 4.57×10−11) (Table 4 and Fig. 4).
Finally, a region with the highest LD with inversion (0.75<r2<0.85) was identified at or near a long non-coding RNA (LINC00208) distal to the BLK locus (Fig. 5). Available markers in this region (Chr8:11.43–11.45 Mb) with r2>0.80 include rs10108511, rs2898290, rs2409798 and rs10097870. This effect was confirmed in other European-derived databases (data not shown).
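When inversion status is coded as a genotype (0, 1 or 2 copies of N), its LD with a SNP can be approximated by the squared Pearson correlation between the two dosage vectors. The Python sketch below shows this calculation on made-up data. It is a genotype-level approximation rather than the haplotype-based estimate produced by Haploview, and the function name and both example vectors are hypothetical.

```python
def genotype_r2(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# Hypothetical dosages for ten individuals (not data from the study).
inversion = [0, 0, 1, 1, 1, 2, 2, 1, 0, 2]  # copies of the N arrangement
snp = [0, 0, 1, 1, 2, 2, 2, 1, 0, 1]        # copies of a SNP allele

print(round(genotype_r2(inversion, snp), 3))
```

A marker that tracked the inversion perfectly would give r2 = 1 under this measure; the markers listed above fall short of that, which is why none of them can serve as a perfect proxy.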
Red dots = original association, Blue dots = Association effects after conditioning on inversion. R2 = correlation coefficient as a measure of linkage disequilibrium between inversion status and all markers in Caucasian population. The highest region in LD with inversion is shown with vertical line at LINC00208.
In this study we validate the novel PCA methodology to identify inversion status in our large Caucasian cohort and evaluate this potential confounding factor with regard to the known BLK association with lupus. Indeed, in almost all HapMap samples (59 out of 60), our approach for determining inversion was in agreement with the experimental FISH results reported in another study. In addition, several of these HapMap samples were also previously typed in other independent studies, and all were concordant with ours (see S1 Table). However, as reported previously, this methodology is not a reliable strategy for identification of inversion status in other ancestries when the suppression of recombination between inverted and non-inverted chromosomal segments is only moderate and the initial inversion happened too long ago. The 8p23 inversion in the African and Asian populations is such an example.
One of the purposes of this study was to determine the potential confounding effect of inversion on a previously reported association with lupus at the BLK locus. Our data indicate that whether we perform the association study in the three identified subgroups independently (Table 2) or perform conditional analyses controlling for inversion background (Table 3), the association effect at the BLK region with lupus remains significant (Table 2 & Fig. 4). However, in this study, we also found an association of non-inverted status with lupus as an additional risk factor for SLE in Caucasians (p = 8.18×10−7, OR = 1.18, 95% CI = 1.10–1.26). This novel result is consistent with a previous report in which the lupus risk haplotype was found to reside on the non-inverted background in HapMap CEU samples (Table 4). Part of this association is evidently due to stratification bias of lupus cases because of LD and correlation between non-inverted status and the lupus risk haplotype (Tables 3 and 4). In any case, the 8p23 inversion, similar to many inversions in humans, regulates expression of many genes in its territory as well as exerting indirect effects by maintaining allelic configurations. Previous reports indicate that the number of N alleles (non-inversion status) is additively associated with decreased expression of BLK. In addition, in the same report, the 8p23 inversion is also robustly associated with expression of other genes in this territory, including XKR6, PPP1R3B, FAM167A and CTSB, sometimes in opposite directions. In a recent lupus functional study of BLK, the risk allele (T) at rs922483 was shown to reduce proximal BLK promoter activity and modulate alternative promoter usage. Allele T of SNP rs922483 is a known risk allele in LD with the three above-mentioned published variants and was correlated with non-inverted status in our analyses (r2 = 0.41) (Table 3).
Therefore, both of these risk conditions, i.e., non-inverted status (which we found to be more common in lupus cases) and the risk allele T, synergistically tend to decrease expression of BLK in lupus patients. However, because non-inversion status alters regulation of multiple genes, its association with lupus indicates that other genes in this region might also be important in the pathogenesis of lupus, perhaps in an orchestrated manner. In addition, the role of long non-coding RNAs in the regulation of gene transcription should not be underestimated; we found that LINC00208 (C8orf14) had the highest LD with inversion status in different European-derived populations.
In summary, our results add another dimension to the complexity of regulation of BLK in lupus and demand further studies to fully elucidate the interaction of inversion status and candidate functional variants.
Predicted inversion calls for all 267 Caucasian-derived samples from HapMap CEU/TSI. *In HapMap samples, accession numbers beginning with NA refer to genomic DNA samples, while GM accessions refer to cell lines. ** 1 = Inverted homozygous, 2 = Heterozygous, 3 = Non-inverted homozygous. Red color: overlapping samples that were in concordance with the previously published FISH assay. Additional confirmation: (a = ; b = ; c = ). Yellow color: discordant call (called as Inverted in ).
Conceived and designed the experiments: BN IH. Performed the experiments: BN YN. Analyzed the data: BN YN. Contributed reagents/materials/analysis tools: JBH KK BN JG PG IC BC LK. Wrote the paper: BN BC IH JBH.
- 1. Giglio S, Broman KW, Matsumoto N, Calvari V, Gimelli G, et al. (2001) Olfactory receptor-gene clusters, genomic-inversion polymorphisms, and common chromosome rearrangements. Am J Hum Genet 68:874–883.
- 2. Sayers EW, Barrett T, Benson DA, Bolton E, Bryant SH, et al. (2011) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 39:D38–D51.
- 3. Broman K, Matsumoto N, Giglio S, Martin C, Roseberry J, et al. (2003) Common long human inversion polymorphism on chromosome 8p. In Science and statistics: A festschrift for Terry Speed. Institute of Mathematical Statistics Lecture Notes Monograph Series (ed. D.R. Goldstein), pp. 237–245. Institute of Mathematical Statistics, Bethesda.
- 4. Sugawara H, Harada N, Ida T, Ishida T, Ledbetter DH, et al. (2003) Complex low-copy repeats associated with a common polymorphic inversion at human chromosome 8p23. Genomics 82:238–244.
- 5. Antonacci F, Kidd JM, Marques-Bonet T, Ventura M, Siswara P, et al. (2009) Characterization of six human disease-associated inversion polymorphisms. Hum Mol Genet 18:2555–2566.
- 6. Ma J, Amos CI (2012) Investigation of inversion polymorphisms in the human genome using principal components analysis. PLoS One. 7:e40224.
- 7. Jaarola M, Martin R, Ashley T (1998) Direct evidence for suppression of recombination within two pericentric inversions in humans: a new sperm-fish technique. Am J Hum Genet 63:218–224.
- 8. Navarro A, Gazave E (2005) Inversions with classical style and trendy lines. Nat Genet 37:115–116.
- 9. Hoffmann A, Rieseberg L (2008) Revisiting the impact of inversions in evolution: From population genetic markers to drivers of adaptive shifts and speciation? Annu Rev Ecol Evol Syst 39:21–42.
- 10. Kirkpatrick M (2010) How and why chromosome inversions evolve. PLoS Biology 8:e1000501.
- 11. Salm MP, Horswell SD, Hutchison CE, Speedy HE, Yang X, et al. (2012) The origin, global distribution, and functional impact of the human 8p23 inversion polymorphism. Genome Res. 22:1144–53.
- 12. Lessard CJ, Li H, Adrianto I, Ice JA, Rasmussen A, et al. (2013) Variants at multiple loci implicated in both innate and adaptive immune responses are associated with Sjögren's syndrome. Nat. Genet. 45:1284–1292.
- 13. Tsuchiya N, Ito I, Kawasaki A (2010) Association of IRF5, STAT4 and BLK with systemic lupus erythematosus and other rheumatic diseases. Nihon Rinsho Meneki Gakkai Kaishi 33:57–65.
- 14. Zhang Z, Zhu KJ, Xu Q, Zhang XJ, Sun LD, et al. (2010) The association of the BLK gene with SLE was replicated in Chinese Han. Arch. Dermatol. Res. 302:619–624.
- 15. Gourh P, Agarwal SK, Martin E, Divecha D, Rueda B, et al. (2010) Association of the C8orf13-BLK region with systemic sclerosis in North-American and European populations. J. Autoimmun. 34:155–162.
- 16. Hom G, Graham RR, Modrek B, Taylor KE, Ortmann W, et al. (2008) Association of systemic lupus erythematosus with C8orf13-BLK and ITGAM-ITGAX. N. Engl. J. Med. 358:900–909.
- 17. Han JW, Zheng HF, Cui Y, Sun LD, Ye DQ, et al. (2009) Genomewide association study in a Chinese Han population identifies nine new susceptibility loci for systemic lupus erythematosus. Nat. Genet. 41:1234–1237.
- 18. Yin H, Borghi MO, Delgado-Vega AM, Tincani A, Meroni PL, et al. (2009) Association of STAT4 and BLK, but not BANK1 or IRF5, with primary antiphospholipid syndrome. Arthritis Rheum. 60:2468–2471.
- 19. Delgado-Vega AM, Dozmorov MG, Quiros MB, Wu YY, Martínez-García, et al. (2012) Fine mapping and conditional analysis identify a new mutation in the autoimmunity susceptibility gene BLK that leads to reduced half-life of the BLK protein. Ann. Rheum. Dis. 71:1219–1226.
- 20. Namjou B, Kim-Howard X, Sun C, Adler A, Chung SA, et al. (2013) PTPN22 association in systemic lupus erythematosus (SLE) with respect to individual ancestry and clinical sub-phenotypes. PLoS One. 8:e69404.
- 21. Namjou B, Choi CB, Harley IT, Alarcón-Riquelme ME; BIOLUPUS Network, et al. (2012) Evaluation of TRAF6 in a large multiancestral lupus cohort. Arthritis Rheum. 64:1960–9.
- 22. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, et al. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 38:904–9.
- 23. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 81:559–75.
- 24. Barrett JC, Fry B, Maller J, Daly MJ (2005) Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21:263–5.
- 25. Graham RR, Cotsapas C, Davies L, Hackett R, Lessard CJ, et al. (2008) Genetic variants near TNFAIP3 on 6q23 are associated with systemic lupus erythematosus. Nat Genet 40:1059–1061.
- 26. Gregersen PK, Amos CI, Lee AT, Lu Y, Remmers EF, et al. (2009) REL, encoding a member of the NF-kB family of transcription factors, is a newly defined risk locus for rheumatoid arthritis. Nat Genet 41:820–823.
- 27. Hom G, Graham RR, Modrek B, Taylor KE, Ortmann W, et al. (2008) Association of systemic lupus erythematosus with C8orf13-BLK and ITGAM-ITGAX. N Engl J Med 358:900–909.
- 28. Deng L, Zhang Y, Kang J, Liu T, Zhao H, et al. (2008) An unusual haplotype structure on human chromosome 8p23 derived from the inversion polymorphism. Hum Mutat 29:1209–1216.
- 29. Myers AJ, Pittman AM, Zhao AS, Rohrer K, Kaleem M, et al. (2007) The MAPT H1c risk haplotype is associated with increased expression of tau and especially of 4 repeat containing transcripts. Neurobiol Dis 25:561–570.
- 30. Guthridge JM, Lu R, Sun H, Sun C, Wiley GB, et al. (2014) Two functional lupus-associated BLK promoter variants control cell-type- and developmental-stage-specific transcription. Am J Hum Genet. 94:586–98.