Common and rare exonic MUC5B variants associated with type 2 diabetes in Han Chinese

Genome-wide association studies have identified over one hundred common genetic risk variants associated with type 2 diabetes (T2D). However, most of the heritability of T2D has not been accounted for. In this study, we investigated the contribution of rare and common variants to T2D susceptibility by analyzing exome array data in 1,908 Han Chinese genotyped with Affymetrix Axiom® Exome Genotyping Arrays. Based on the joint common and rare variants analysis of 57,704 autosomal SNPs within 12,244 genes using Sequence Kernel Association Tests (SKAT), we identified significant associations between T2D and 25 variants (9 rare and 16 common) in MUC5B, p-value 1.01×10−14. This finding was replicated (p = 0.0463) in an independent sample that included 10,401 unrelated individuals. Sixty-six of 1,553 possible haplotypes based on 25 SNPs within MUC5B showed significant association with T2D (Bonferroni corrected p values < 3.2×10−5). The expression level of MUC5B is significantly higher in pancreatic tissues of persons with T2D compared to those without T2D (p-value = 5×10−5). Our findings suggest that dysregulated MUC5B expression may be involved in the pathogenesis of T2D. As a strong candidate gene for T2D, MUC5B may play an important role in the mechanisms underlying T2D etiology and its complications.


Introduction
Type 2 Diabetes (T2D) is a growing global health problem. Currently, about 415 million people worldwide have diabetes. By 2040, the number of people living with diabetes is expected to increase to 642 million, with two-thirds of all cases occurring in low to middle-income countries [1]. In China, the prevalence of T2D increased exponentially over the past three decades. In 1980, the prevalence of T2D in China was less than 1%; this estimate increased to about 12% in 2010 [2]. By 2013, there were about 114 million people with diabetes and about 500 million people with prediabetes in China. This rapid increase, which is unlike the transition that PLOS ONE | https://doi.org/10.1371/journal.pone.0173784 March 27, 2017 1 / 17 a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 occurred in Western countries, coincided with economic growth, urbanization, changes in lifestyle and demographic characteristics in China.
In addition to the well-recognized influence of lifestyle factors on the risk of T2D, genetic factors play a major role in susceptibility to T2D. The successful application of genome wide association studies (GWAS) has provided some insight into the genetic basis of T2D. Until recently, it was generally assumed that common diseases such as T2D were caused by common variants [3]. Given that GWAS provided genotypic information on common variants, it appeared to be the ideal technique to identify variants. To date, over 100 common genetic risk variants with small effect sizes have been identified from GWAS and shown to be associated with T2D. However, the joint effects of these variants accounts for less than 10% of the heritability for T2D [4]. In this study, we examined the association of rare variants with T2D among a population of unrelated Chinese adults. Given that susceptibility to T2D likely involves the contribution of both common and rare variants, we conducted joint analysis of common and rare variants of about 58,000 autosomal SNPs.

Study population
The China America Diabetes Mellitus (CADM) study is a large-scale genetic epidemiology study designed to investigate the genetic and environmental determinants of metabolic disorders including T2D, dyslipidemia, kidney disease, and hypertension. In CADM,~2000 unrelated participants with written informed consent were enrolled from Suizhou, China, of whom 1908 were genotyped and included in these analyses. Suizhou, a historic city, is located in the Hubei province, central China and has a population of over 2 million, most of whom are Han Chinese (99.2%). Ethical approval for the study was obtained from the Institutional Review Boards of Howard University, the National Institutes of Health, and IRB of Suizhou Central Hospital, Suizhou, China. All enrolled participants provided written informed consent during the clinical visit before commencement of data collection by interview and collection of biospeciments. Details of the study protocol were clearly explained to each participants and potential participants had the opportunity to ask questions before signing the consent documents.

Phenotype definitions
During a clinic examination, interviewers collected demographic information from the participants. All enrolled individuals self-identified as Han Chinese. Weight was measured in light clothes on an electronic scale to the nearest 0.1 kg, and height was measured with a stadiometer to the nearest 0.1 cm. Body mass index (BMI) was computed as weight (kg) divided by the square of height (m 2 ). Blood samples were obtained from all participants after an overnight fast. T2D diagnosis was based on any of the following criteria established by the American Diabetes Association Expert Committee: fasting plasma glucose concentration ! 126 mg/dl (7.0 mmol/l), 2-hour post load value in the oral glucose tolerance test ! 200 mg/dl (11.1 mmol/l) on more than one occasion, history of T2D or on prescribed medication for diabetes. Cases were defined as individuals diagnosed with T2D, while controls were individuals without T2D. Hypertension was defined as systolic blood pressure (SBP) ! 140 mmHg and/or diastolic blood pressure (DBP) ! 90 mmHg, or use of blood pressure medication.
DNA sample preparation, genotyping, and quality control DNA was extracted from buffy coat samples using a chemagenic DNA Isolation Kit (PerkinElmer Chemagen Technologie Gmb, Baesweiler, Germany) following the manufacturer's instructions. Samples were genotyped using Affymetrix Axiom1 Exome Genotyping Arrays. This array is primarily designed to detect coding variation and contains over 300,000 markers, including non-synonymous and synonymous SNPs as well as variants in splice and stop codons, and 30,000 single-base and complex indels. Genotypes were called using Axiom GT1 algorithm as implemented in Affymetrix genotyping console 4.1.3, which is a new genotyping procedure developed specifically for use with Affymetrix Axiom 1 Genome-wide human arrays. All arrays passed plate quality control following the manufacturer's recommendations. The genotyping concordance rate (evaluated using 16 SNPs that were blind-genotyped twice) was 99.64%. The concordance rate for 10 individuals that were typed twice on the entire array was 98.64%. Of the 290,890 markers on the array, 178,943 were monomorphic, 23,756 had genotyping call rate less than 0.95 and 1,458 markers failed HWE (p value < 10 −6 ). Of the remaining 86,733 markers, 85,009 were autosomal; 27,305 of the autosomal markers were removed for having minor allele counts less than 5. In all, 57,704 autosomal markers were carried forward for analysis in this study. Of these markers, 12,329 (21.37%) had a minor allele frequency (MAF) < 0.01, and 45,375 (78.63%) had MAF ! 0.01 (with 32,638 with MAF ! 0.05). A variant was classified as "common" if MAF > 1 ffiffiffi 2n p and "rare" if MAF 1 ffiffiffi 2n p (n is number of individuals) = 0.0162 [5]. Based on hg19 genome build 37 (GRCh37), the 57,704 markers were located within 12,244 gene regions.

Statistical analysis
To minimize the potential effect of population structure, we adjusted all analyses by the first two principal components (PC1 and PC2) obtained from R package, SNPRelate [6], which generates genetic covariance matrix followed by the extraction of eigenvalues and eigenvectors for the calculation of PCs. Single marker analysis for Common SNPs was implemented in PLINK [7] under a genetic additive model, adjusting for sex, age, BMI, Hypertension, and first two PCs. A permutation procedure was used to generate significance levels empirically to deal with rare alleles and small sample size [7]. Simple label swapping of phenotype (T2D) was used for 100,000,000 permutation tests. The empirical permutation p value (Emp) was pointwise and was calculated by Emp = Eþ1 Nþ1 , where E is number of statistic values ! observed statistic value, and N is the total number of permutation.
Gene-base analyses of rare variants only and of joint common and rare variants were conducted using Sequence Kernel Association Test (SKAT) [5], with models adjusted as in the common single marker analysis. The overall joint effect of rare and common variants by gene regions was tested by combining the test statistics directly using weighted-sum statistics, Q ;,p1,p2 = (1 − ;)Q rare + ; Q common with ; ¼ SD½Q rare SD½Q rare þSD½Q common , given (;,p 1 ,p 2 ). As rare variants are assumed to have larger effect sizes., different weight functions were used for rare and common variants as follows: βeta(MAF,α = 1,β = 25) for rare, and βeta(MAF,α = 0.5,β = 0.5) for common variants. Under null, the distribution of Q ;,p1,p2 is a mixture of x 2 1 distributions. These x 2 1 distributions are independent and identically distributed chi-square random variables with 1-degree freedom. An asymptotic p value was then computed with Davies' method or moment matching [5]. The genome-wide and suggestive significant threshold were established as α of 2.5 × 10 −6 and α of 2.5 × 10 −5 respectively [8].

Haplotype phasing and analysis
Haplotype phasing of SNPs was performed with the BEAGLE program [9], which uses the hidden Markov model (HMM) to find the most likely haplotype pair for each individual, conditional on that individual's genotypes. Haplotype phasing was conducted on the set of 25 SNPs T2D-associated MUC5B SNPs. Haplotypes were tested in a logistic regression model that included age, sex, BMI, hypertension status, and first two PCs as covariates. A total of 1,553 possible haplotypes across MUC5B were tested. Bonferroni correction was used to adjust for multiple tests (0.05/number of possible haplotype = 3.2×10 −5 ).

Replication analysis
Replication analysis was performed in 10,401 African ancestry samples obtained from the Atherosclerosis Risk in Communities (ARIC, n = 3,137) [10], the Cleveland Family Study (CFS, n = 653) [11], the Howard University Family Study (HUFS, n = 1,976) [12], Jackson Heart Study (JHS, n = 2,187) [13], Multi-Ethnic Study of Atherosclerosis (MESA, n = 1,611) [14], and Africa America Diabetes Mellitus Study (AADM, n = 1802) [15]. Analysis was conducted using human genomic reference (hg19) coordinates. LiftOver (https://genome.ucsc.edu/cgibin/hgLiftOver) was used to convert genome coordinates and genome annotation between assemblies. Rare and common variants were defined as in the discovery study. The set of 25 rare and common SNPs associated with T2D in the discovery study were extracted from replication datasets. Sixteen (13 common and 3 rare variants) of 25 SNPs were available for joint common and rare variants analysis in SKAT. As in the discovery analysis, sex, age, BMI, first two principal components (PC1, and PC2) were included as covariates.
Replication of published GWAS findings was attempted using two strategies, 1) exact and local (i.e., SNPs in Linkage disequilibrium [LD] with the reported SNP [16]) for those gene regions that contained only common variants; and 2) a gene-level approach for gene regions containing both common and rare variants using SKAT. HapMap CHB reference data for Chinese ancestry populations was used for the identification of markers in LD with published variants. To adequately account for multiple testing, we estimated the effective degrees of freedom (df) for the spectrally-decomposed covariance matrix for the block of markers using this study's (CADM) genotype data as previously described [17].

Microarray analysis of human islets
Data was extracted from publicly-available MIAME compliant gene expression data (GEO, accession number GSE25724; GDS3882; http://www.ncbi.nlm.nih.gov), using the R package, GEOquery. The original data was generated from the analysis of islets of Langerhans isolated from T2D and non-T2D organ donors [18]. RNA was biotinylated, fragmented, and hybridized onto Affymetrix Human Genome U133A Array chips. The expression data was scanned and log 2 normalized, and the differential gene expression between T2D and non-T2D samples was assessed. Two-tailed tests were used, and p values lower than 0.01 were considered as differentially-expressed [18].

Results
Characteristics of study participants are displayed in Table 1. In this case-controls study of 1,908 individuals, about 50% of the cases and controls were female. Cases were older, heavier and, as expected, had significantly higher mean fasting blood glucose levels. Also, the cases had higher mean systolic and diastolic blood pressure and higher prevalence of hypertension compared to the controls (63.7% vs 39.56%, respectively).
In the joint common and rare variant analysis (12,244 genes), we observed a significant association between T2D and variants in the MUC5B gene (mucin 5B, oligomeric mucus/gelforming, GeneID: 727897, 11p15.5) with p-value of 1.01 × 10 −14 ( Table 2; Fig 1 and QQ plot S1  Fig). This analysis included nine rare and sixteen common variants in MUC5B (Table 2). Adjustment for smoking strengthened the association (p-value = 6.29 × 10 −15 ). Replication analysis was conducted in 10,401 African ancestry individuals (S1 Table) using 16 available SNPs (3 rare and 13 common) of the 25 SNPs in the MUC5B gene (S2 Table). The MUC5B finding replicated in this large sample of individuals (p = 0.0463). In CADM, the frequency of the T allele in one of the rare variants (rs12282798, MAF = 0.0047) was 0.011 among cases and < 0.001 among controls with an empirical p-value of 1.85 × 10 −4 (Table 3). Four common SNPs (rs201894106 allele T, rs199967813 allele A, rs192744525 allele A, and rs199285958 allele C) with allele frequencies < 0.01 in cases, and > 0.04 in controls, statistically significant difference (empirical p-value of 10 −8 ). The complete list of allele counts within MUC5B by T2D status and associated p-values obtained from permutation (n = 10 8 ) tests are presented in Table 3.

Discussion
We identified both rare and common variants within the MUC5B gene that were associated with T2D in this study conducted among Han Chinese. These results were replicated in a large sample of over 10,000 African ancestry individuals. We also identified several haplotypes within MUC5B that showed significant associations with T2D. Notably, individuals with T2D had significantly higher expression levels of MUC5B compared to those without T2D.
MUC5B encodes a member of the mucin family of proteins. These proteins are highly glycosylated macromolecular components [43]. As indicated above, the expression of MUC5B is increased among individuals with T2D compared to controls; however, the underlying   mechanistic explanation driving the increased expression among diabetics has not been elucidated. Published studies suggest that the expression of MUC5B may be mediated through insulin-like growth factor-1 (IGF-1) and p38 mitogen-activated protein kinases (MAPK). MUC5B mRNA expression is induced by the action of IGF-1 [44]. It has been reported that individuals with T2D, obesity, or both have increased levels of IGF-1 [45][46][47] and that IGF-1 induced MUC5B expression is regulated by activation of p38 MAPK [44]. High levels of glucose have been shown to activate p38 MAPK signaling pathway in pancreatic β cells [48][49][50]. In animal studies, p38 has been shown to play an important role in diabetes-induced inflammation [51]. The lung is a target organ for T2D. Abnormal pulmonary function has been observed in individuals with T2D, the most consistent abnormalities include poor lung elasticity, reduced diffusion capacity due to impaired capillary blood volume, reduced absolute thoracic gas volumes, reduced lung volume and airflow resistance [52][53][54]. T2D may lead to abnormal pulmonary function through non-enzymatic glycosylation-induced alteration of the chest wall and bronchial tree collagen protein, which induces fibrous tissue formation, thickening of the basal lamina, increased protein catabolism, neuropathy of the phrenic nerve and diaphragmatic paralysis [54][55][56][57]. In healthy lungs, MUC5B is expressed in the goblet cells of bronchi and bronchioles. It has been found to be up-regulated in some human pulmonary diseases [58]. In a study of individuals with lung disease, a genome-wide linkage scan showed that a common promoter of MUC5B was associated with familial interstitial pneumonia and idiopathic pulmonary fibrosis; MUC5B was highly expressed among diseased individuals, compared to controls [59]. A recent meta-analysis that included Asian populations showed a strong association between MUC5B (rs35705950 polymorphism) and risk of idiopathic pulmonary fibrosis [60]. The diabetes status of the individuals in the study was not stated. The MUC5B gene is composed of tandem repeats which are flanked by cysteine-rich subdomains (845 residues upstream and 700 residues downstream). The cysteine-rich subdomains were similar to the D-domains of human pro-Von Willebrand factor [61,62]. Increased levels of von Willebrand factor, an indication of damage to endothelial cells, have been showed association with diabetes [63]. It also reported as a predictive markers for diabetic nephropathy and neuropathy, thus providing a clue that endothelial dysfunction precedes the onset of diabetic microangiopathy [63]. In previous studies of Sjögren's syndrome, a chronic autoimmune disease in which the body's white blood cells destroy the exocrine glands, a relationship between MUC5B, von Willebrand factor and diabetes was suggested [64,65], indicating a potential role of MUC5B in cardiovascular complications of T2D. An NF-kappa-B binding site in the MUC5B promoter showed that activation of the NF-kappa-B signaling pathway upregulated MUC5B mRNA expression 2 fold [66]. NF-kappa-B signaling pathway plays an important role in immune and inflammatory response [67], supporting a potential role of MUC5B in T2D.

Conclusions
We identified rare and common variants in the MUC5B gene that are associated with T2D in Han Chinese. Our findings suggest that dysregulated MUC5B expression may be involved in the pathogenesis of T2D. As a strong candidate gene for T2D, MUC5B may play an important role in the mechanisms underlying T2D etiology and its complications.
Supporting information S1 Fig. QQ plots exome array association results. The y axis represents observed -log e (p values), and the x axis is expected -log e (p values). (TIFF) S1 Table. Basic characteristics of the study participants by type 2 diabetes status in African ancestry replication study.