Skeletal muscle is a major component of the human body. Age-related loss of muscle mass and function contributes to public health problems such as sarcopenia and osteoporosis. Skeletal muscle mass, which consists mainly of appendicular lean mass (ALM), is a heritable trait. Copy number variation (CNV) is a common type of human genome variant that may play an important role in the etiology of many human diseases. In this study, we performed a genome-wide association analysis of CNVs for ALM in 2,286 Caucasian subjects and then sought to replicate the major findings in 1,627 Chinese subjects. Two CNVs, CNV1191 and CNV2580, were associated with ALM in the discovery sample (p = 2.26×10⁻² and 3.34×10⁻³, respectively). In the Chinese replication sample, the two CNVs had p-values of 3.26×10⁻² and 0.107, respectively. CNV1191 covers a gene, GTPase of the immunity-associated protein family (GIMAP1), which is important for skeletal muscle cell survival/death in humans. CNV2580 is located in the serine hydrolase-like protein (SERHL) gene, which plays an important role in normal peroxisome function and in skeletal muscle growth in response to mechanical stimuli. In summary, our study suggests two novel CNVs and their related genes that may contribute to variation in ALM.
Citation: Ran S, Liu Y-J, Zhang L, Pei Y, Yang T-L, Hai R, et al. (2014) Genome-Wide Association Study Identified Copy Number Variants Important for Appendicular Lean Mass. PLoS ONE 9(3): e89776. https://doi.org/10.1371/journal.pone.0089776
Editor: Nicholas M. Pajewski, Wake Forest University Health Sciences, United States of America
Received: June 27, 2013; Accepted: January 25, 2014; Published: March 13, 2014
Copyright: © 2014 Ran et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: The study was partially supported by startup funding from Shanghai University of Science and Technology and Shanghai Leading Academic Discipline Project (S30501). The investigators of this work were partially supported by grants from the National Institutes of Health (R01AG026564, RC2DE020756, R01AR057049, R01AR050496 and R03TW008221), a SCOR (Specialized Center of Research) grant (P50AR055081) supported by the National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS) and the Office of Research on Women’s Health (ORWH), and the Edward G. Schlieder Endowment and the Franklin D. Dickson/Missouri Endowment. Lei Zhang was also supported by the National Natural Science Foundation of China project (31100902). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Loss and functional impairment of skeletal muscle, especially in the elderly, are related to a number of public health problems (such as sarcopenia and osteoporosis) and to increased mortality. Whole lean body mass (LBM) is composed of skeletal muscle (∼60%), viscera, and some other connective tissues. Appendicular lean mass (ALM) is the sum of skeletal muscle mass in the arms and legs, which is the primary portion of skeletal muscle involved in ambulation and physical activities. ALM is considered to be an ideal measure of skeletal muscle mass, and it can be measured accurately by dual-energy X-ray absorptiometry (DXA).
Skeletal muscle is under strong genetic control, with heritability estimates of 30–85% for muscle strength and 50–80% for muscle mass. Genome-wide association studies have identified a number of variants that may account for variation in ALM. Collectively, however, the identified loci, genes, and variants explain only a small fraction of the genetic variation in ALM, and the majority of the genetic determination remains to be revealed. Traditional association studies have focused on single nucleotide polymorphisms (SNPs). Studies of other types of genetic variants, which may account for the “missing” heritability, have been relatively rare.
Recent studies have shown that copy number variation (CNV) plays an important role in human diseases such as schizophrenia, Parkinson’s disease, and autism. CNV is a common type of genomic variability in which DNA fragments ranging in size from one kilobase to several megabases are present at variable copy numbers relative to a reference genome. CNV may influence gene expression, phenotypic variation, and adaptation by disrupting coding sequences or altering gene dosage. Furthermore, it may affect gene expression indirectly through position effects, predispose to deleterious genetic changes, or provide substrates for chromosomal change in evolution. A recent GWAS of CNVs in Chinese subjects identified the gremlin1 gene as associated with LBM variation. However, to date, no study has investigated whether CNVs contribute to ALM in other ethnic groups such as Caucasians.
In this study, we performed a CNV-based GWAS to identify genetic loci influencing variation in ALM in 2,286 Caucasian subjects. Follow-up replication analyses were performed in a Chinese sample of 1,627 subjects.
Materials and Methods
The study was approved by Institutional Review Boards of Creighton University, University of Missouri-Kansas City, Hunan Normal University of China and Xi’an Jiaotong University of China. Signed informed-consent documents were obtained from all study participants before they entered the study.
The discovery sample consisted of 2,286 unrelated Caucasian subjects of European origin recruited in the Midwestern US (Kansas City, Missouri and Omaha, Nebraska). The inclusion and exclusion criteria were described in our previous publications.
The replication sample was an independent Chinese sample of 1,627 unrelated subjects. All subjects were recruited from the cities of Xi’an and Changsha and their neighboring areas in China.
Anthropometric measures and a structured questionnaire covering lifestyle, diet, family information, medical history, etc. were obtained for all study subjects. ALM and fat body mass (FBM) were measured for all study samples using a Hologic QDR 4500W dual-energy X-ray absorptiometry scanner (Hologic Inc., Bedford, MA, USA). ALM (kg) was calculated as the sum of lean soft tissue (nonfat, non-bone) mass in the arms and legs. Weight was measured in light indoor clothing using a calibrated balance beam scale, and height was measured without shoes using a calibrated stadiometer.
Genomic DNA was extracted from peripheral blood leukocytes using standard protocols. The Genome-Wide Human SNP Array 6.0 (Affymetrix, Santa Clara, CA, USA), which includes 906,600 SNPs and 940,000 copy number probes, was used to genotype each subject from the discovery sample according to the Affymetrix protocol. Briefly, approximately 250 ng of genomic DNA was digested with the restriction enzyme NspI or StyI. Digested DNA was adaptor-ligated and PCR-amplified for each sample. Fragmented PCR products were then labeled with biotin, denatured, and hybridized to the arrays. Arrays were then washed and stained with phycoerythrin on an Affymetrix Fluidics Station and scanned using the GeneChip Scanner 3000 7G to quantitate fluorescence intensities. Data management and analyses were conducted using the Genotyping Command Console Software. For sample quality control (QC), the contrast QC threshold was set at the default value of greater than 0.4. The final average contrast QC across the entire sample reached a high level of 2.76 for our Caucasian cohort and 2.62 for our Chinese cohort.
Copy Number Analysis
Common CNVs were identified using the CANARY algorithm implemented in the Birdsuite software, which utilizes a previously defined copy number polymorphism (CNP, namely a CNV with a frequency greater than 1%) map based on HapMap samples. In total, 1,216 CNPs were genotyped in the discovery sample and 1,280 in the replication sample.
We conducted QC filtering at both the sample level and the CNV level, according to previously reported methods.
First, for the sample-level QC, we used three quality metrics reported by the Birdseye method to evaluate the initial 2,286 subjects for quality in copy number genotyping. The following procedures were adopted: 1) we removed any sample whose average copy number estimate, which is approximately two copies at the genome-wide level, was more than three standard deviations (SD) above or below the sample mean; 2) we calculated the variability in copy number and SNP probe intensities, each standardized per chromosome, and removed any sample that deviated by more than three SD from the genome-wide average of these estimates; 3) we removed any sample in which more than two chromosomes failed any of these three metrics, i.e. a deviation of more than three SD in estimated copy number or excessive CNV or SNP variability for a chromosome. According to the above criteria, 71 subjects were discarded. The copy numbers of the remaining 2,215 subjects were successfully genotyped using the CANARY software.
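The three-SD rule in step 1 can be illustrated with a minimal sketch. The sample IDs, the copy number estimates, and the helper name `flag_outlier_samples` are hypothetical and are not taken from the Birdsuite output; the sketch only shows the filtering logic, assuming one genome-wide copy number estimate per sample.

```python
from statistics import mean, stdev

def flag_outlier_samples(estimates, n_sd=3.0):
    """Return IDs of samples lying more than n_sd SDs from the sample mean."""
    values = list(estimates.values())
    centre = mean(values)
    spread = stdev(values)
    return [sid for sid, v in estimates.items() if abs(v - centre) > n_sd * spread]

# Hypothetical genome-wide average copy-number estimates (expected near 2).
copy_number = {f"S{i:02d}": 2.0 + 0.01 * ((i % 5) - 2) for i in range(30)}
copy_number["S30"] = 2.6  # an artificially aberrant sample

outliers = flag_outlier_samples(copy_number)
print(outliers)
```

The same helper could be applied to the per-chromosome variability metrics of step 2. Note that an extreme sample inflates the SD it is judged against, so a simple rule like this is only a rough screen.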
Second, we conducted QC filtering at the CNV level. Out of the initially called CNVs, we excluded those with uncertain or missing copy number calls in >5% of samples and those with a minor variant (allele) frequency of <1%. With the above QC criteria, a total of 410 CNVs remained for the subsequent analyses in the Caucasian sample.
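The CNV-level filter can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline actually used: the function name `passes_cnv_qc` and the toy call vectors are hypothetical, and "variant frequency" is taken here to mean the share of called samples that do not carry the most common copy number state.

```python
from collections import Counter

def passes_cnv_qc(calls, max_missing=0.05, min_variant_freq=0.01):
    """calls: one copy-number call per sample; None marks an uncertain or missing call."""
    called = [c for c in calls if c is not None]
    missing_rate = 1 - len(called) / len(calls)
    if missing_rate > max_missing or not called:
        return False
    # Assumption: variant frequency = share of called samples that do not
    # carry the most common copy-number state.
    most_common_count = Counter(called).most_common(1)[0][1]
    variant_freq = 1 - most_common_count / len(called)
    return variant_freq >= min_variant_freq

common = [2] * 90 + [1] * 10                       # 10% variant carriers
rare = [2] * 995 + [3] * 5                         # 0.5% variant carriers
poorly_called = [2] * 80 + [1] * 10 + [None] * 10  # 10% missing calls
results = [passes_cnv_qc(c) for c in (common, rare, poorly_called)]
print(results)
```

Only the first toy CNV passes: the second is too rare and the third has too many missing calls.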
Association analyses of CNVs were performed using linear regression models fitted with the glm function in R. For both the initial GWAS and the subsequent replication study, stepwise regression was used to screen covariates for effects on ALM variation. Age, sex, height, and FBM were significant covariates (p<0.05), and raw ALM values were adjusted for them. We adjusted for covariates with a 2-stage procedure in which ALM was first regressed on the covariates only, and the resulting residuals were then regressed on the CNVs. To correct for potential population stratification, we conducted a principal component analysis on genome-wide SNP data with EIGENSTRAT and included the top ten principal components as covariates. Fisher’s method was used to combine the p-values from the discovery and replication samples.
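The 2-stage adjustment can be illustrated with a small, self-contained sketch. It is deliberately simplified: it uses a single covariate (height) instead of the full covariate set and principal components, it is written in Python rather than R, and the toy data and helper names (`simple_ols`, `two_stage_beta`) are hypothetical.

```python
from statistics import mean

def simple_ols(x, y):
    """Least-squares slope and intercept for a single predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def two_stage_beta(outcome, covariate, copy_number):
    # Stage 1: regress the outcome on the covariate and keep the residuals.
    b1, a1 = simple_ols(covariate, outcome)
    residuals = [y - (a1 + b1 * c) for y, c in zip(outcome, covariate)]
    # Stage 2: regress the residuals on copy number.
    beta, _ = simple_ols(copy_number, residuals)
    return beta, residuals

# Hypothetical, noise-free toy data: nine subjects in a balanced design,
# so that height and copy number are uncorrelated.
height = [160, 160, 160, 170, 170, 170, 180, 180, 180]
cn = [1, 2, 3, 1, 2, 3, 1, 2, 3]
alm = [-12 + 0.2 * h - 0.5 * c for h, c in zip(height, cn)]

beta, residuals = two_stage_beta(alm, height, cn)
print(round(beta, 6))
```

Because the toy covariate is uncorrelated with copy number, the second stage recovers the per-copy effect built into the data (−0.5 kg per copy). With real data, the estimate would also reflect noise and any correlation between covariates and copy number.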
The basic characteristics of the subjects used in both discovery and replication samples are summarized in Table 1.
In the discovery sample, 20 CNVs showed evidence of association with ALM at p < 0.05 (Table 2). CNV1191 and CNV2580 were followed up in the Chinese sample. The p values of CNV1191 in the discovery and replication samples were 2.26×10⁻² and 3.26×10⁻², respectively, and the p values of CNV2580 in the discovery and replication samples were 3.34×10⁻³ and 0.107, respectively (Table 2). The combined p values of the two CNVs were 6.05×10⁻³ and 3.27×10⁻³, respectively.
We further tested for association between the normal (CN = 2) and deletion (CN = 0, 1) groups, and between the normal and duplication (CN = 3, 4) groups, separately. The results showed that the direction of effect of CNV2580 was consistent between the discovery and replication samples, whereas that of CNV1191 was not (Table 3). However, both CNVs remained significant in the combined analyses.
In addition to the 2-step covariate adjustment procedure described above, we performed association analyses in which CNVs and covariates were included in a single model. The results were quite similar to those of the 2-step procedure (Table S1).
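As a check on how two p-values are combined, Fisher’s method can be sketched in a few lines. The statistic X = −2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom under the null hypothesis, and for even degrees of freedom the tail probability has a closed form. The function name `fisher_combined_p` is an illustrative choice; applied to the two CNV1191 p-values reported above, it gives a value close to the reported 6.05×10⁻³.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # X/2, where X = -2 * sum(ln p)
    # Chi-square survival function with 2k degrees of freedom (closed form):
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total

# Discovery and replication p-values for CNV1191, as reported in the text.
p_cnv1191 = fisher_combined_p([2.26e-2, 3.26e-2])
print(f"{p_cnv1191:.2e}")
```

For two studies the formula reduces to P(1 − ln P), where P is the product of the two p-values.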
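The grouping used for these separate comparisons can be sketched as below. The residual values are hypothetical and the helper names (`cn_group`, `group_means`) are illustrative; the sketch only shows how copy number states are collapsed into deletion, normal, and duplication groups before each group is contrasted with the normal group.

```python
from statistics import mean

def cn_group(cn):
    """Collapse an integer copy number into deletion / normal / duplication."""
    if cn < 2:
        return "deletion"
    if cn > 2:
        return "duplication"
    return "normal"

def group_means(copy_numbers, values):
    groups = {}
    for cn, v in zip(copy_numbers, values):
        groups.setdefault(cn_group(cn), []).append(v)
    return {g: mean(vs) for g, vs in groups.items()}

# Hypothetical covariate-adjusted ALM residuals (kg) for eight subjects.
cn = [0, 1, 2, 2, 2, 3, 4, 2]
resid = [0.4, 0.2, 0.0, 0.1, -0.1, -0.6, -0.4, 0.0]

means = group_means(cn, resid)
deletion_vs_normal = means["deletion"] - means["normal"]
duplication_vs_normal = means["duplication"] - means["normal"]
print(deletion_vs_normal, duplication_vs_normal)
```

Opposite signs of a contrast in two samples would correspond to the kind of directional inconsistency described for CNV1191.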
According to the UCSC Genome Browser on Human February 2009 (GRCh37/hg19) Assembly, CNV1191 is located at the chromosome region 7q36.1 with physical position ranging from 149,916,734 bp to 149,932,502 bp, within the gene GTPase IMAP family member 1 (GIMAP1). The number of carriers with CN = 0, 1, 2, 3 and 4 was 126, 855, 1273, 28 and 4, respectively, in the discovery sample. Due to the limited number of subjects with CN = 4, we merged CN = 3 and CN = 4 into a single group. The number of carriers with CN = 0, 1, 2, 3 was 42, 465, 1093 and 27, respectively, in the replication sample. In the discovery sample, carriers with CN = 1 and CN = 2 had higher ALM (22.7 kg and 22.6 kg) and carriers with CN = 3 had the lowest ALM (21.1 kg) (Figure 1). Consistently, in the replication sample, carriers with CN = 1 and CN = 2 had higher ALM (19.8 kg) and carriers with CN = 3 had the lowest ALM (19.3 kg) (Figure 1).
CNV2580 is located at the chromosome region 22q13.2 with physical position ranging from 41,234,550 bp to 41,276,824 bp, within the gene serine hydrolase-like protein (SERHL). The number of carriers with CN = 1, 2, 3, and 4 was 11, 1763, 455, and 57, respectively, in the discovery sample, and 5, 1257, 314, and 50, respectively, in the replication sample. Due to the limited number of subjects with CN = 1, we merged CN = 1 and CN = 2 into a single group. In the discovery sample, carriers with CN = 2 and CN = 3 had higher ALM (22.6 kg and 22.7 kg, respectively) and carriers with CN = 4 had the lowest ALM (21.7 kg) (Figure 2), with an estimated β of −5.24×10⁻² (ALM in kg) per copy. Consistently, in the replication sample, carriers with CN = 2 and CN = 3 had ALM of 19.8 kg and 19.9 kg, respectively, and carriers with CN = 4 had ALM of 18.5 kg (Figure 2), with an estimated β of −4.34×10⁻² (ALM in kg) per copy.
Table 4 lists the proportion of subjects with each copy number of CNV2580. The table also includes the theoretical proportions calculated from the empirical CN frequencies under the random mating assumption. A goodness-of-fit (GOF) test showed that the empirical distribution did not deviate from the theoretical distribution (p = 0.22 for both populations).
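One way to picture such a comparison is sketched below. It is an assumption-laden illustration, not the calculation behind Table 4: it supposes that each chromosome carries 1 or 2 copies with known frequencies, derives the expected diploid copy number proportions under random mating, and computes a Pearson chi-square statistic. The allele frequencies, the observed counts, and the function names are all hypothetical.

```python
from itertools import product

def expected_diploid_cn(allele_freqs):
    """allele_freqs: {copies per chromosome: frequency}.
    Returns {diploid copy number: expected proportion} under random mating."""
    expected = {}
    for (c1, f1), (c2, f2) in product(allele_freqs.items(), repeat=2):
        expected[c1 + c2] = expected.get(c1 + c2, 0.0) + f1 * f2
    return expected

def pearson_chi_square(observed_counts, expected_props):
    n = sum(observed_counts.values())
    return sum(
        (observed_counts.get(cn, 0) - n * p) ** 2 / (n * p)
        for cn, p in expected_props.items()
    )

# Hypothetical per-chromosome allele frequencies and observed diploid counts.
allele_freqs = {1: 0.8, 2: 0.2}
observed = {2: 650, 3: 300, 4: 50}

expected = expected_diploid_cn(allele_freqs)
stat = pearson_chi_square(observed, expected)
print(expected, stat)
```

The statistic would then be compared with a chi-square distribution whose degrees of freedom depend on how many parameters were estimated from the data.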
Two SNPs are located in the region of CNV1191, and eight SNPs lie outside the CNV1191 boundaries but within the GIMAP1 gene. None of these ten SNPs was significantly associated with ALM in the discovery sample, but rs11769150 was associated with ALM in the replication sample, with a p-value of 0.02 (Table 5).
Four SNPs are located in the region of CNV2580, and fifteen SNPs lie outside the CNV2580 boundaries but within the SERHL gene. None of these nineteen SNPs was significantly associated with ALM in the discovery sample, but two SNPs, rs139116 and rs139120, were associated with ALM in the replication sample, with p-values of 0.02 (Table 5).
This is the first CNV-based GWAS for ALM in Caucasians. Two CNVs, CNV1191 and CNV2580, were found to be associated with ALM.
CNV1191 is located in the gene GIMAP1, which encodes GTPase, IMAP family member 1. GIMAP (GTPase of the immunity-associated protein family) proteins are a family of putative GTPases believed to be regulators of cell death in lymphomyeloid cells. GIMAP1 was the first reported member of this gene family. The gene is involved in the differentiation of T helper (Th) cells of the Th1 lineage, and the related mouse gene has been shown to be critical for the development of mature B and T lymphocytes.
A study of myotubes cultured from skeletal muscle biopsies found coordinately reduced expression of five members of the GIMAP family (GIMAP1, GIMAP4, GIMAP5, GIMAP6 and GIMAP7), which form a cluster on chromosome 7 and participate in skeletal muscle (SM) cell survival/death. A study in pig skeletal muscle indicated that GIMAP1 was correlated with meat quality and with the regulation of biological processes involved in the induction of apoptosis. The gene has also been implicated in the regulation of lipid catabolic processes, defense response, and positive regulation of calcium ion transport. Our findings, combined with the above evidence, support a potential contribution of GIMAP1 to variation in skeletal muscle.
SERHL encodes a new member of the serine hydrolase family that is located within peroxisomes. In vivo studies showed that mRNA expression of SERHL increased in response to passive stretch imposed upon skeletal muscle.
The directions of association of CNV1191 in the discovery and replication samples were different. This inconsistency may have the following explanations. First, genetic variants may have different effects in different populations. A genetic variant may have different allele frequencies among diverse populations because of different evolutionary histories, which can result in different modes of genotype-phenotype association. Second, significant associations are usually found at molecular markers that are in linkage disequilibrium (LD) with the causal variant, rather than at the causal variant itself. Therefore, the inconsistency in direction could result from opposite patterns of LD in the two populations.
Within the two CNV regions, we did not identify any SNPs that were significantly associated with ALM in the discovery sample. A possible explanation is that, unlike a SNP, a CNV is a structural variant that generally covers a larger genomic region and may therefore influence phenotypic variation through mechanisms different from those of SNPs.
In summary, we identified two CNVs, CNV1191 and CNV2580, that were associated with ALM. The relevant genes, GIMAP1 and SERHL, may play roles in skeletal muscle metabolism. Our findings may provide useful information for molecular functional studies of candidate genes for ALM.
Conceived and designed the experiments: HWD YJL. Performed the experiments: SR TLY RH YYH. Analyzed the data: YP LZ YL QT. Wrote the paper: SR YJL.
- 1. Sipila S, Heikkinen E, Cheng S, Suominen H, Saari P, et al. (2006) Endogenous hormones, muscle strength, and risk of fall-related fractures in older women. J. Gerontol. A Biol. Sci. Med. Sci. 61: 92–96.
- 2. Karakelides H, Nair KS (2005) Sarcopenia of aging and its metabolic impact. Curr. Top Dev. Biol. 68: 123–148.
- 3. Baumgartner RN (2000) Body composition in healthy aging. Ann. N. Y. Acad. Sci. 904: 437–448.
- 4. Gallagher D, Visser M, De Meersman RE, Sepulveda D, Baumgartner RN, et al. (1997) Appendicular skeletal muscle mass: effects of age, gender, and ethnicity. J Appl. Physiol. 83: 229–239.
- 5. Newman AB, Kupelian V, Visser M, Simonsick E, Goodpaster B, et al. (2003) Sarcopenia: alternative definitions and associations with lower extremity function. J. Am. Geriatr. Soc. 51: 1602–1609.
- 6. Delmonico MJ, Harris TB, Lee JS, Visser M, Nevitt M, et al. (2007) Alternative definitions of sarcopenia, lower extremity performance, and functional impairment with aging in older men and women. J. Am. Geriatr. Soc. 55: 769–74.
- 7. Thomis MA, Huygens W, Heuninckx S, Chagnon M, Maes HHM, et al. (2004) Exploration of myostatin polymorphisms and the angiotensin-converting enzyme insertion/deletion genotype in responses of human muscle to strength training. Eur J Appl Physiol 92: 267–274.
- 8. Arden NK, Spector TD (1997) Genetic influences on muscle strength, lean body mass, and bone mineral density: a twin study. J Bone Miner Res 12: 2076–2081.
- 9. Sun L, Tan LJ, Lei SF, Chen XD, Li X, et al. (2011) Bivariate genome-wide association analyses of femoral neck bone geometry and appendicular lean mass. PLoS One 6: e27325.
- 10. Han YY, Pei YF, Liu YZ, Zhang L, Wu SY, et al. (2012) Bivariate genome-wide association study suggests fatty acid desaturase genes and cadherin DCHS2 for variation of both compressive strength index and appendicular lean mass in males. Bone 51: 1000–1007.
- 11. Walsh T, McClellan JM, McCarthy SE, Addington AM, Pierce SB, et al. (2008) Rare structural variants disrupt multiple genes in neurodevelopmental pathways in schizophrenia. Science 320: 539–543.
- 12. The International Schizophrenia Consortium (2008) Rare chromosomal deletions and duplications increase risk of schizophrenia. Nature 455: 237–241.
- 13. Ibanez P, Bonnet AM, Debarges B, Lohmann E, Tison F, et al. (2004) Causal relation between alpha-synuclein gene duplication and familial Parkinson’s disease. Lancet 364: 1169–1171.
- 14. Glessner JT, Wang K, Cai G, Korvatska O, Kim CE, et al. (2009) Autism genome-wide copy number variation reveals ubiquitin and neuronal genes. Nature 459: 569–573.
- 15. Feuk L, Carson AR, Scherer SW (2006) Structural variation in the human genome. Nat Rev Genet 7: 85–97.
- 16. Repping S, van Daalen SK, Brown LG, Korver CM, Lange J, et al. (2006) High mutation rates have driven extensive structural polymorphism among human Y chromosomes. Nat Genet 38: 463–467.
- 17. McCarroll SA, Hadnott TN, Perry GH, Sabeti PC, Zody MC, et al. (2006) Common deletion polymorphisms in the human genome. Nat Genet 38: 86–92.
- 18. Buckland PR (2003) Polymorphically duplicated genes: their relevance to phenotypic variation in humans. Ann Med 35: 308–315.
- 19. Nguyen DQ, Webber C, Ponting CP (2006) Bias of selection on human copy number variants. PLoS Genet 2: e20.
- 20. Freeman JL, Perry GH, Feuk L, Redon R, McCarroll SA, et al. (2006) Copy number variation: new insights in genome diversity. Genome Res 16: 949–961.
- 21. Lupski JR, Stankiewicz P (2005) Genomic disorders: molecular mechanisms for rearrangements and conveyed phenotypes. PLoS Genet 1: e49.
- 22. Feuk L, Marshall CR, Wintle RF, Scherer SW (2006) Structural variants: changing the landscape of chromosomes and design of disease studies. Hum Mol Genet 15 Spec No 1: R57–66.
- 23. Hai R, Pei YF, Shen H, Zhang L, Liu XG, et al. (2011) Genome-wide association study of copy number variation identified gremlin1 as a candidate gene for lean body mass. J Hum Genet 57: 33–37.
- 24. Deng HW, Deng HY, Liu YJ, Liu YZ, Xu FH, et al. (2002) A genomewide linkage scan for quantitative-trait loci for obesity phenotypes. Am J Hum Genet. 70(5): 1138–1151.
- 25. Korn JM, Kuruvilla FG, McCarroll SA, Wysoker A, Nemesh J, et al. (2008) Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nat Genet 40: 1253–1260.
- 26. McCarroll SA, Kuruvilla FG, Korn JM, Cawley S, Nemesh J, et al. (2008) Integrated detection and population-genetic analysis of SNPs and copy number variation. Nat Genet 40: 1166–1174.
- 27. Kathiresan S, Voight BF, Purcell S, Musunuru K, Ardissino D, et al. (2009) Genome-wide association of early-onset myocardial infarction with single nucleotide polymorphisms and copy number variants. Nat Genet 41: 334–341.
- 28. R Development Core Team (2010) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. Available: http://www.R-project.org/.
- 29. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, et al. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38: 904–909.
- 30. Fisher KA, Miles R (2008) Modeling the acoustic radiation force in microfluidic chambers. J Acoust Soc Am 123: 1862–1865.
- 31. Saunders A, Lamb T, Pascall J, Hutchings A, Dion C, et al. (2009) Expression of GIMAP1, a GTPase of the immunity-associated protein family, is not upregulated in malaria. Malar J. 8: 53.
- 32. Schwefel D, Daumke O (2011) GTP-dependent scaffold formation in the GTPase of Immunity Associated Protein family. Small GTPases 2(1): 27–30.
- 33. Raymond F, Metairon S, Kussmann M, Colomer J, Nascimento A, et al. (2010) Comparative gene expression profiling between human cultured myotubes and skeletal muscle tissue. BMC Genomics 11: 125.
- 34. Liaubet L, Lobjois V, Faraut T, Tircazes A, Benne F, et al. (2011) Genetic variability of transcript abundance in pig peri-mortem skeletal muscle: eQTL localized genes involved in stress response, cell death, muscle disorders and metabolism. BMC Genomics 12: 548.
- 35. Ravnik-Glavač M, Hrašovec S, Bon J, Dreu J, Glavač D (2012) Genome-wide expression changes in a higher state of consciousness. Consciousness and Cognition 21(3): 1322–1344.
- 36. Sadusky TJ, Kemp TJ, Simon M, Carey N, Coulton GR (2001) Identification of Serhl, a new member of the serine hydrolase family induced by passive stretch of skeletal muscle in vivo. Genomics 73: 38–49.
- 37. Economou M, Trikalinos TA, Loizou KT, Tsianos EV, Ioannidis JP (2004) Differential effects of NOD2 variants on Crohn’s disease risk and phenotype in diverse populations: a metaanalysis. Am J Gastroenterol 99: 2393–2404.