Global Carrier Rates of Rare Inherited Disorders Using Population Exome Sequences

  • Kohei Fujikura


Exome sequencing has revealed the causative mutations behind numerous rare, inherited disorders, but it is challenging to find reliable epidemiological values for rare disorders. Here, I provide a genetic epidemiology method to identify the causative mutations behind rare, inherited disorders using two population exome datasets (1000 Genomes and NHLBI). I created global maps of carrier rate distribution for 18 recessive disorders in 16 diverse ethnic populations. Of the 161 mutations associated with these 18 recessive disorders, I detected 24 in either or both exome studies. The genetic mapping revealed strong international heterogeneity in the carrier patterns of the inherited disorders. I next validated this methodology by statistically evaluating the carrier rate of one well-understood disorder, sickle cell anemia (SCA). The population exome-based epidemiology of SCA [African (allele frequency (AF) = 0.0454, N = 2447), Asian (AF = 0, N = 286), European (AF = 0.000214, N = 4677), and Hispanic (AF = 0.0111, N = 362)] was not significantly different from that obtained from a clinical prevalence survey. A pair-wise proportion test revealed no significant differences in AF between the two exome projects (46/48 cases; P > 0.05). I conclude that population exome-based carrier rates can form the foundation for a prospectively maintained database of use to clinical geneticists. Similar modeling methods can be applied to many inherited disorders.


Recent advances in next-generation sequencing (NGS) technology have revolutionized the field of clinical genetics [1–4]. This technology has facilitated the identification of novel causative genes for >3,000 inherited disorders, which are currently annotated in Online Mendelian Inheritance in Man (OMIM) [2, 3]. Most of these disorders are referred to as rare or orphan diseases because of their low incidence [5]. In clinical practice, molecular genetic testing is already being applied to screen for these inherited disorders [6]. However, epidemiological information on many inherited disorders remains insufficient and inconclusive. For rare diseases in particular, epidemiology is a research field that remains largely unexplored by clinical geneticists and researchers [7]. The total global prevalence of all monogenic disorders at birth has been estimated at several percent [5]. In Canada, single-gene disorders have been estimated to account for approximately 40 percent of cases in pediatric practice [8]. The public health impact of Mendelian diseases is therefore a topic of growing interest worldwide. Reliable estimates of the populations affected by inherited diseases have become increasingly important to guide the efficient allocation of public health resources in each country, region, and city [7, 9, 10].

The lack of epidemiologic studies of inherited disorders is particularly acute in developing countries with limited resources [11–13]. Most epidemiologic research has been conducted with individuals from Europe and North America, who represent only a fraction of the global population [11, 12]. In developing countries, consultation rates, data collection methods, and population-based registries for inherited disorders vary considerably with the degree of urbanization and the local environment [11–13].

To overcome these limitations, I analyzed the global carrier rates of rare inherited disorders using exome data from geographically diverse populations. The global map of carrier rates showed strong population specificity, and the predictions appeared to achieve accuracy comparable to that of clinical practice. This study provides an initial global overview of the carrier rates of genetic disorders based on population exome sequences.


Strategy for epidemiological research on Mendelian disorders using population exome sequences

As an initial step toward determining the genetic epidemiology of inherited disorders, variant calls from the 1000 Genomes (1000G) [14, 15] and National Heart, Lung, and Blood Institute (NHLBI) [16, 17] projects were collected and screened for variants with the potential to affect protein integrity (Fig 1). The dataset included the exome and its flanking intronic sequences for 1,092 individuals (525 males, 567 females) of 14 ethnic origins from 1000G and 6,503 individuals (2,443 males, 4,060 females) of two ethnic origins from NHLBI. Population demographics are summarized in S1 Table. Caucasians comprised 34.7% and 66.1% of subjects in the 1000G and NHLBI groups, respectively. Asian and Hispanic populations, which were represented only in 1000G, constituted 26.2% and 16.6% of that group, respectively. Overall, 60.9% (4,627/7,595) of individuals were female. Many samples were from the United States; a minority were from China, Japan, Colombia, Mexico, Puerto Rico, Finland, England, Spain, Germany, Italy, Nigeria, and Kenya.

The study populations are likely depleted of individuals affected by rare genetic disorders. However, when prevalence is so close to 0 (<0.25%), affected homozygotes are too rare to change carrier estimates appreciably, and under Hardy-Weinberg equilibrium the carrier rate is usually approximated as

$$\text{carrier rate} = 2pq = 2p(1 - p) \approx 2p \qquad (1)$$

where p and q are the frequencies of the causative and reference alleles, respectively, with p + q = 1 (p < 0.05; q > 0.95).
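Under Hardy-Weinberg equilibrium the carrier rate is 2pq, which approaches 2p when q is close to 1. The minimal Python sketch below, assuming a single causative allele at frequency p, compares the exact carrier rate, the 2p approximation, and the affected (homozygous) frequency p². The function name is illustrative, and the example reuses the African HbS allele frequency quoted in the abstract.

```python
def hwe_rates(p: float) -> dict:
    """Genotype-based rates under Hardy-Weinberg equilibrium.

    p: frequency of the causative (minor) allele; q = 1 - p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p
    return {
        "affected": p * p,           # homozygous for the causative allele
        "carrier_exact": 2 * p * q,  # heterozygous carriers
        "carrier_approx": 2 * p,     # 2p approximation, valid when q is close to 1
    }


# Example: African HbS allele frequency from the abstract (AF = 0.0454)
rates = hwe_rates(0.0454)
for name, value in rates.items():
    print(f"{name}: {value:.4%}")
```

Because 2p = 2pq / q, the approximation overstates the exact carrier rate by a factor of 1/q; at p = 0.0454 that is roughly 5%, and the gap shrinks as p falls.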

Fig 1. Strategy for epidemiological research on Mendelian disorders using exome sequences.

Flow chart of the geographic prevalence study, showing the mutation detection process using the 1000G and NHLBI datasets. A total of 15,190 haploid exomes were screened for 161 causative mutations linked to 18 genetic disorders. Several platforms (NCBI dbSNP and UCSC Browser) were used to assess the validity of mutations and examine previous information on gene annotations and alleles.

The disease panel used in this initial study comprised: Sickle cell anemia (SCA; OMIM #603903); Primary immunodeficiency (Mucocutaneous fungal infection) (#613108); Pituitary hormone deficiency, combined 2 (CPHD2; #262600); Canavan disease (#271900); Pustular psoriasis (No description in OMIM); Rod-cone dystrophy (RCD; #615780); Primary autosomal recessive microcephaly 1 (MCPH1; #251200); Seckel syndrome 5 (SCKL5; #613823); Pontocerebellar hypoplasia type 1B (PCH1B; #614678); Miller syndrome (#263750); Facial dysmorphism, lens dislocation, anterior-segment abnormalities, and spontaneous filtering blebs (FDLAB, or Traboulsi syndrome; #601552); Carpenter syndrome 1 (CRPT1; #201000); Glucocorticoid deficiency 4 (GCCD4; #614736); Childhood-onset dilated cardiomyopathy (#615916); Usher syndrome type 1J (USH1J; #614869); Aicardi-Goutières syndrome 6 (AGS6; #615010); 3-methylglutaconic aciduria with deafness, encephalopathy, and Leigh-like syndrome (MEGDEL syndrome; #614739); and Severe dermatitis, multiple allergies and metabolic wasting syndrome (SAM syndrome; #615508) (S2 Table). The list of mutations was manually collated from literature published between 1957 and 2014 (S2 Table). In addition, several gene/disease annotation systems, including NCBI Entrez and OMIM, were used to identify disease-causing mutations (Fig 1). I identified a total of 161 mutations associated with 18 recessive diseases (S2 Table), of which 24 were detected in either or both of the two exome datasets (Table 1). Fifteen genetic diseases were detected among the 7,595 individuals, whereas three disorders (childhood-onset dilated cardiomyopathy, USH1J, and SAM syndrome) were not (Fig 1). Causative alleles were classified by mutation type, carrier rate, racial group, and clinical impact (Fig 1 and Table 1).

Table 1. Estimated carrier rates of 15 Mendelian disorders by race, ethnicity, and country.

Mutation and carrier rate information is shown in this table. Pustular psoriasis is not yet described in OMIM. Abbreviations: AA, African Americans; EA, European Americans; ASW, Americans of African Ancestry in SW USA; CEU, Utah Residents (CEPH) with Northern and Western European ancestry; CHB, Han Chinese in Beijing; CHS, Southern Han Chinese; CLM, Colombians from Medellin; FIN, Finnish in Finland; GBR, British in England; IBS, Iberian population in Spain; JPT, Japanese in Tokyo; LWK, Luhya in Webuye; MXL, Mexican ancestry from Los Angeles; PUR, Puerto Ricans from Puerto Rico; TSI, Toscani in Italia; YRI, Yoruba in Ibadan.

Disease carrier states of Mendelian disorders

As expected, among the 15 genetic diseases detected, the most common was SCA, with a carrier frequency of 1 in 66.6 (1.50%) (Table 1). In contrast, MCPH1 was the rarest disorder, with a frequency of 1 in 14,160 (0.0071%). Unexpectedly, carrier prediction also revealed a high carrier rate (1 in 254.0) for CEP152 mutations causing SCKL5. Carrier statistics are fully reported in Table 1.
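The "1 in N" figures and the percentages are reciprocals of one another. The minimal Python sketch below assumes the carrier rate is simply the number of heterozygous carriers divided by the number of individuals screened; the helper names and the 3-of-200 counts are hypothetical, and the loop only re-expresses the figures quoted in the text.

```python
def carrier_rate(carriers: int, individuals: int) -> float:
    """Fraction of screened individuals who carry the causative allele."""
    if individuals <= 0:
        raise ValueError("individuals must be positive")
    return carriers / individuals


def as_one_in(rate: float) -> float:
    """Return N such that the rate equals '1 in N' (infinite for a zero rate)."""
    return float("inf") if rate == 0 else 1.0 / rate


# Hypothetical counts: 3 carriers among 200 screened individuals
rate = carrier_rate(3, 200)
print(f"3 of 200: {rate:.2%} (1 in {as_one_in(rate):.1f})")

# "1 in N" figures quoted in the text, expressed as percentages
for disorder, n in [("SCA", 66.6), ("SCKL5", 254.0), ("MCPH1", 14160)]:
    print(f"{disorder}: 1 in {n:,} = {1 / n:.4%}")
```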

Carrier rate variability by race and ethnicity

Carrier frequencies for disease-causing mutations varied significantly among racial and ethnic groups, although sample sizes for Hispanics and Asians are small [7, 9, 10]. Fig 2 shows the global map of carrier distribution for eight causative mutations of three Mendelian disorders. For example, an average of 0.11% of individuals were carriers for Miller syndrome, but the frequency ranged from 0.18% in Europeans to 0% in Africans, Asians, and Hispanics (Fig 2C). This higher frequency in Europeans had not been reported before, suggesting that European populations are an appropriate target for Miller syndrome screening. Among the 15,190 haploid exomes, causative alleles for seven disorders (SCA, SCKL5, Primary immunodeficiency, Canavan disease, Pustular psoriasis, CRPT1, and AGS6) were found, at varying frequencies, in both African and European populations (Fig 2 and Table 1). In contrast, mutations for the other eight disorders (CPHD2, RCD, MCPH1, PCH1B, Miller syndrome, FDLAB, GCCD4, and MEGDEL syndrome) were observed only in Europeans. No carriers of any of the 18 inherited disorders were found in the Asian population datasets.
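An overall carrier rate such as the 0.11% for Miller syndrome is a sample-size-weighted average, so it depends on how many individuals each population contributes. The Python sketch below illustrates this with made-up counts (not the study's data): it computes per-population rates and the pooled rate as total carriers over total individuals.

```python
from dataclasses import dataclass


@dataclass
class PopulationCount:
    name: str
    carriers: int     # individuals heterozygous for the causative allele
    individuals: int  # individuals screened


def summarize(counts):
    """Per-population carrier rates plus the sample-size-weighted pooled rate."""
    per_pop = {c.name: c.carriers / c.individuals for c in counts}
    pooled = sum(c.carriers for c in counts) / sum(c.individuals for c in counts)
    return per_pop, pooled


# Hypothetical counts for illustration only
counts = [
    PopulationCount("European", 4, 2000),
    PopulationCount("African", 0, 1000),
    PopulationCount("Asian", 0, 500),
    PopulationCount("Hispanic", 0, 500),
]
per_pop, pooled = summarize(counts)
for name, rate in per_pop.items():
    print(f"{name}: {rate:.2%}")
print(f"pooled: {pooled:.2%} (range {min(per_pop.values()):.2%} to {max(per_pop.values()):.2%})")
```

Pooling counts rather than averaging the per-population percentages keeps small populations from dominating the overall figure.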

Fig 2. Geographical minor allele frequency distribution of the causative mutations for three representative Mendelian disorders.

Pie areas are proportional to the minor allele frequency of the causative mutations for three inherited diseases (A: SCA, B: Pustular psoriasis, C: Miller syndrome). The 1000G and NHLBI (14 + 2) populations are displayed separately. A thick white circle indicates the absence (0%) of mutations in the population. The bar chart on the right shows the minor allele frequency of each mutation in each population. The world map was obtained from Free Editable Worldmap and modified.

Estimated carrier rates correspond to those seen in clinical practice

SCA is an inherited blood disorder that affects hemoglobin and is characterized primarily by chronic anemia and periodic pain episodes [18, 19]. A mutation in the HBB gene, commonly called the hemoglobin S (HbS) mutation, causes SCA [18, 19]. SCA is common among persons whose ancestors came from tropical regions, particularly sub-Saharan Africa, South America, Saudi Arabia, India, and Mediterranean countries (e.g., Italy, Greece, and Turkey) [18, 19]. The CDC has reported that SCA affects approximately 90,000–100,000 persons in the United States, most of whom are of African descent [20]. The disease occurs in about 1 in every 500 African-American births and in 1 in every 36,000 [20] (or 1 in 1,000–1,400 [21]; the incidence is controversial) Hispanic-American births. However, highly accurate epidemiological studies based on clinical practice are still rare.

Table 2 compares my exome-based estimates of SCA carrier frequency with literature estimates for the 14 + 2 ethnic groups in the dataset. Predicted carrier rates were generally not statistically different from clinical geographic prevalence [20–22] or from a Bayesian geostatistical map of the HbS allele [23] (Table 2). There were no notable outliers except two populations, LWK (1000G, 9.79%) and PUR (1000G, 2.73%), in which SCA carrier frequencies were significantly higher than expected (P < 0.01) (Table 2). These sampled populations may be geographically distinct at this locus from those in prior studies. As expected, the HbS allele in African populations (NHLBI, 4.02%; 1000G, 9.15%) was detected at a significantly higher rate than in European populations (NHLBI, 0.0233%; 1000G, 0%), Hispanics (1000G, 1.10%), and Asians (1000G, 0%) (P < 0.01) (Table 2).

Table 2. Comparison of predicted exome-based carrier rates with previous clinical estimates.

P-values were calculated using chi-square tests comparing the two carrier estimates.

The prevalence of SCA in Hispanic Americans is controversial (1 in 36,000 [20] or 1 in 1,000–1,400 [21]), but the carrier rates projected here are consistent with either figure, depending on ancestral origin (Table 2). Taken together, the exome-based estimates corresponded to those of the clinical prevalence survey and appear to achieve accuracy comparable to that of clinical practice.
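Table 2 compares two carrier estimates with a chi-square test. The minimal Python sketch below implements a Pearson chi-square test (1 degree of freedom, no continuity correction) on a 2 × 2 table, assuming each estimate can be expressed as carriers out of a known number of individuals. How the study converted literature estimates into counts is not described here, and the example counts are hypothetical. The p-value uses the identity that the upper tail of a 1-df chi-square at x equals erfc(√(x/2)), so only the standard library is needed.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]].

    Rows: the two estimates; columns: carriers, non-carriers.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 df
    return stat, p_value


# Hypothetical: 40 carriers of 1,000 (exome) vs 30 carriers of 1,000 (survey)
stat, p = chi_square_2x2(40, 960, 30, 970)
print(f"chi-square = {stat:.3f}, P = {p:.3f}")
```

Without continuity correction, this statistic equals the square of the pooled two-proportion z statistic.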

Screening priority for genetic testing

Genetic testing is generally prioritized according to the ranking of carrier rates of the target mutations. Yet, because rare disorders are so numerous, precise carrier data for targeted testing panels are often lacking in clinical practice. This is particularly true for recently identified causative genes. Here, I showed that the exome-based method makes it possible to identify a small number of high-priority nonsense and missense mutations linked to genetic disorders (Table 1). For example, the data suggest that, among six causative mutations for PCH1B, only one (p.Asp132Ala) should be given high priority in EXOSC3 mutation screening in European populations, whereas the other mutations appear to be very rare (Table 1). The carrier rates of these mutations ranked as follows: p.Asp132Ala (NHLBI EA, 0.128%) > p.Val80Phe (0.0116%) = c.475-1269A>G (0.0116%) > other mutations (0%). For Miller syndrome, for which mutations have been reported in several papers, three mutations [p.Arg135Cys (NHLBI EA, 0.0607%), p.Gly152Arg (0.120%), and p.Arg346Trp (0.161%)] should be given first priority in DHODH mutation screening in Europeans but not Africans. A different pattern was found for SCKL5: two mutations [p.Lys667Arg (NHLBI AA, 1.29%) and p.Tyr678* (0.176%)] predominated in African populations (Table 1). These frequent mutations were also detected in the 1000G dataset [p.Lys667Arg (1000G AFR, 1.22%) and p.Tyr678* (0.203%)]. Taken together, these data should allow the formulation of suitable mutation panels for prioritizing genetic testing in clinical practice.
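The PCH1B ranking can be expressed as a small sort-and-group step. The Python sketch below uses the NHLBI EA carrier rates quoted in the text, with "other" standing in for the remaining mutations at 0%. Grouping tied rates into tiers and the optional `min_rate` cut-off are illustrative choices, not a procedure described in the study.

```python
from itertools import groupby

# Carrier rates (%) for EXOSC3 mutations in NHLBI EA, as quoted in the text;
# "other" stands in for the remaining mutations reported at 0%.
pch1b = {
    "p.Asp132Ala": 0.128,
    "p.Val80Phe": 0.0116,
    "c.475-1269A>G": 0.0116,
    "other": 0.0,
}


def screening_tiers(rates: dict, min_rate: float = 0.0):
    """Group mutations into tiers of equal carrier rate, highest first,
    dropping any mutation whose rate is at or below min_rate."""
    ranked = sorted(rates.items(), key=lambda kv: kv[1], reverse=True)
    kept = [kv for kv in ranked if kv[1] > min_rate]
    return [(rate, [name for name, _ in group])
            for rate, group in groupby(kept, key=lambda kv: kv[1])]


for rate, names in screening_tiers(pch1b):
    print(f"{rate:.4f}%: {', '.join(names)}")
```

Python's `sorted` is stable, so mutations with tied rates keep their input order within a tier.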

I further searched for undetected mutations using the Exome Aggregation Consortium (ExAC) dataset, which summarizes exome data from 60,706 unrelated individuals in a variety of large-scale sequencing projects and categorizes them into six ancestry groups (Table 3). The ExAC dataset revealed 29 additional mutations, although it does not provide country-by-country genetic epidemiology of inherited disorders (Table 3). This result suggests that larger sample sizes and/or the combined use of several large exome sequencing projects could allow more accurate prediction of carrier rates.

Table 3. Estimated carrier rates of 17 Mendelian disorders using ExAC data.

The carrier rates of Mendelian disorders were estimated using the ExAC dataset. Childhood-onset dilated cardiomyopathy (#615916) and Usher syndrome type 1J (USH1J) (#614869) were detected in ExAC but not in 1000G and NHLBI. ExAC populations are divided into six ancestry groups (African, Latino, European (non-Finnish), European (Finnish), South Asian, and East Asian) plus Other.

Consistency of data between two different exome sequencing projects

I next examined the agreement between the two exome-based estimates by comparing carrier rates in African and European ancestries between the 1000G and NHLBI datasets. A pair-wise proportions test [24] was used to test the null hypothesis that the proportions in the two estimates are equal. This is a z-test with the statistic

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \qquad (2)$$

where $\hat{p}_1 = x_1/n_1$ and $\hat{p}_2 = x_2/n_2$ are the observed proportions, $x_i$ and $n_i$ are the count and total in each estimate, $\hat{p} = (x_1 + x_2)/(n_1 + n_2)$ is the pooled proportion, and the indices (1, 2) refer to the first and second columns of the table. The test showed no significant differences between the two exome studies (46 of 48 cases; P > 0.05), except in two African cases (P < 0.05) (S3 Table). This finding raises the possibility that exome-based predictions are free from some sources of arbitrary error (e.g., differences in diagnostic capacity) and may serve as an objective indicator.
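The pooled two-proportion z-test can be written in a few lines of standard-library Python. This is a sketch, not the author's code. The Methods mention a continuity correction, so the sketch offers an optional Yates-type correction that subtracts ½(1/n₁ + 1/n₂) from |p̂₁ − p̂₂|, capped so it never exceeds the observed difference. This is intended to mirror R's `prop.test` for two groups, but treat that correspondence as an assumption. The example counts are hypothetical.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2, continuity=True):
    """Two-sided pooled z-test of H0: pi1 == pi2.

    x1, x2: carrier (or allele) counts; n1, n2: totals in each resource.
    Returns (signed z, two-sided P).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0, 1.0  # both proportions 0 (or both 1): no evidence of a difference
    diff = abs(p1 - p2)
    if continuity:
        # Yates-type correction, capped at the observed difference
        diff -= min(0.5 * (1 / n1 + 1 / n2), diff)
    z = diff / se
    p_value = math.erfc(z / math.sqrt(2))  # two-sided standard normal tail
    return math.copysign(z, p1 - p2), p_value


# Hypothetical counts: 20 carriers of 1,000 vs 75 carriers of 4,000
z, p = two_proportion_z_test(20, 1000, 75, 4000)
print(f"z = {z:.3f}, P = {p:.3f}")
```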

Risk simulation and mutation detection rate of autosomal recessive disease

Finally, a simple deterministic formula was introduced to predict the mutation detection rate in exome studies, assuming a single-gene disease with an autosomal recessive inheritance pattern. The mutation detection rate (D) of Mendelian disorders was

$$D = 1 - \{1 - p(1 - \sigma)\}^{N} \qquad (3)$$

where p is the mutation carrier rate in each population, σ is the error rate of exome sequencing, and N is the number of exomes available for epidemiological analysis. Fig 3 shows the simulation curve for the mutation detection rate. This prediction equation is applicable to general cases of predicting the incidence of inherited disorders. It responds to parameters that affect carrier rate and data accuracy, and it is independent of the distribution of fitness effects. The epidemiological study used a total of 7,595 samples from the NHLBI and 1000G datasets; a target mutation with a carrier rate of 0.001 in this group could theoretically be detected with a probability of 99.95% when σ = 0.01. With the ExAC dataset under the same conditions, the probability of failing to detect the mutation was 7.70 × 10⁻²⁵%. Exome sequencing error rates are now generally small (σ < 0.01) and thus have little effect on mutation detection rates (S1 Fig).
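The Python sketch below implements the detection-rate calculation, assuming each exome independently reveals the mutation with probability p(1 − σ), so that D = 1 − {1 − p(1 − σ)}^N. Under that assumption it reproduces the two figures quoted above: 99.95% for N = 7,595, and a miss probability of about 7.70 × 10⁻²⁷ (that is, 7.70 × 10⁻²⁵%) for ExAC's 60,706 individuals. The function names are illustrative.

```python
import math


def miss_probability(p: float, sigma: float, n: int) -> float:
    """Probability that none of n exomes reveals the mutation: {1 - p(1 - sigma)}^n.

    Computed through log1p so the result stays accurate when it is tiny.
    """
    return math.exp(n * math.log1p(-p * (1 - sigma)))


def detection_rate(p: float, sigma: float, n: int) -> float:
    """D = 1 - {1 - p(1 - sigma)}^N."""
    return 1.0 - miss_probability(p, sigma, n)


print(f"1000G + NHLBI (N = 7,595): D = {detection_rate(0.001, 0.01, 7595):.2%}")
print(f"ExAC (N = 60,706): miss probability = {miss_probability(0.001, 0.01, 60706):.2e}")
```

The second value is a fraction; multiplying by 100 gives the percentage form used in the text. Computing `1 - D` directly would round to 0 in double precision, which is why the miss probability is returned separately.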

Fig 3. Risk simulation and mutation detection rate of autosomal recessive disease.

The simulation graph depicts the theoretical mutation detection probability of high-penetrance genetic mutations (under the condition of σ = 0.01) that are associated with inherited disorders. The simulation sample sizes range from 1 to 100,000. The y-axis corresponds to the detection rate of causative mutations.


During the past several decades, biomedical research has identified the causative genes for more than 3,000 Mendelian disorders [1–4]. NGS results have provided empirical evidence that the genetic architecture of Mendelian disease is one of many rare causal mutations, although NGS has not yet identified all genetic mutations [24]. Despite the accumulation of substantial genetic data, the epidemiology of Mendelian disorders remains largely unknown. This initial study suggests that genetic risk prediction using exome sequences can reveal carrier frequencies for rare Mendelian mutations with a small margin of error (Fig 2 and Table 2). The estimation approach was also applied to populations in developing countries and showed strong regional specificity of causative alleles (Fig 2). This study also prioritized causative mutations according to their carrier rates (Table 1). The accumulation of such data will make it possible to perform closely focused diagnostic genetic tests in specific countries and cities, and to plan clinical services, assess priorities, and monitor prevalence trends. I have recently shown that exome-based epidemiology could also provide a clue to the penetrance of each mutation [25].

A recent exome-based study [26], which focused on common diseases of interest, also successfully predicted risk for newborn-screening target disorders, age-related macular degeneration (ARMD), and drug response in two populations (African American and European American). Together, that study and the present one suggest that NGS data can yield useful information for genetic screening in clinical practice.

Populations other than Asians show a wider range of genetic variation, and regional specificity is greatest in African populations [14, 15, 27]. Therefore, the per-region breakdown of African allele frequency estimates in a recent analysis [26] may not reflect the complex genetic structure of African populations. It is reasonable to analyze epidemiology country by country and ethnicity by ethnicity using 1000G (Fig 2).

Data quality and limitations

The simulation studies here suggest that larger sample sizes or combined studies will allow more accurate prediction of genetic risk (Fig 3). The ExAC data highlight the value of large population samples. Note, however, that the current ExAC data also include individuals sequenced as part of various disease-specific studies and do not reflect the complex genetic structure of African populations.

There are also several issues that must be addressed when performing genetic epidemiological studies. The first limiting factor is consanguineous marriage [28–30], which departs from the random mating assumed in population genetics. This practice strongly influences the prevalence of autosomal recessive disorders [28, 29]. Most recent studies have used whole-exome sequencing of individuals from consanguineous families to identify rare causal coding variants [24], and some rare heritable disorders may never occur with outbreeding. Rates of consanguinity (e.g., marriage between cousins) vary greatly between and within countries and regions, but are highest in North Africa, the Middle East, and South Asia and among migrant communities in North America, Europe, and Australia [29, 30]. At present, about 20% of the world's population lives in communities with a preference for consanguineous marriage [29]. Public understanding of the genetic risk of consanguinity is still low in these countries [29, 30]. The currently accepted view is that consanguinity only infrequently causes genetic disease, so it is important to provide evidence-based recommendations for genetic counseling and screening of consanguineous couples without provoking unnecessary alarm. Once sample sizes are extended, the approach used here may help disseminate an overview of the reproductive risks associated with consanguinity. Intriguingly, recent research also indicates that the genomic inbreeding coefficients of individuals are unexpectedly high, to varying degrees, even in the 1000G data [31].
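The effect of consanguinity on recessive prevalence can be illustrated with the standard population-genetics expression for the frequency of affected individuals under inbreeding, q²(1 − F) + qF = q² + Fq(1 − q). Here q is the causative allele frequency and F is the inbreeding coefficient (F = 1/16 for the offspring of first cousins). This is textbook background rather than a calculation from this study, and the allele frequency in the Python sketch below is illustrative.

```python
def recessive_prevalence(q: float, f: float = 0.0) -> float:
    """Expected frequency of affected (homozygous) individuals for a recessive
    allele at frequency q with inbreeding coefficient f: q^2 + f * q * (1 - q)."""
    if not (0.0 <= q <= 1.0 and 0.0 <= f <= 1.0):
        raise ValueError("q and f must lie in [0, 1]")
    return q * q + f * q * (1 - q)


q = 0.001  # illustrative allele frequency
random_mating = recessive_prevalence(q)
first_cousins = recessive_prevalence(q, 1 / 16)  # F = 1/16 for first-cousin offspring
print(f"random mating: {random_mating:.2e}")
print(f"first-cousin offspring: {first_cousins:.2e} "
      f"({first_cousins / random_mating:.1f}x higher)")
```

The relative increase is largest for the rarest alleles, because the Fq term shrinks more slowly than q² as q falls.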

The second limiting factor is prenatal genetic counseling and testing. SCA, for which the U.S. Preventive Services Task Force (USPSTF) recommends screening [30], is a good example. Recent advances in prenatal genetic diagnosis make it easier than ever to gather more information on individuals prior to their birth [32, 33]. It is, therefore, crucial to consider the potential effect of abortion on the prevalence rates.

The third limiting factor is the mode of inheritance. The initial dataset in this study was derived from individuals with no cognitive impairment. Risk prediction has been successful for diseases that follow a simple recessive mode of inheritance, but it is challenging for autosomal dominant traits in this dataset. To analyze autosomal dominant disorders, it is necessary to sample the general population of a specific area regardless of phenotype.

The fourth limiting factor comprises experimental limitations and uncertainties in identifying causative disease mutations. Disease-causing status is often assigned too readily, without analysis of the potential effect of a mutation [34, 35], and the population exomes may not have read coverage at all causative loci. Some causative mutations may not yet have been reported, and, as in the past, new ones will arise de novo [36–39]. In addition, the penetrance of the mutations remains largely unknown, and some reported disease mutations may in fact not be disease-causing [40]. Carrier rates could therefore be underestimated or overestimated. I suggest that discordance between the carrier and prevalence rates of each mutation could provide a clue to its penetrance as well as its screening priority.

Carrier rate in developing countries

One of the greatest merits of exome-based epidemiology is that part of the public health surveillance of genetic disorders can be conducted easily, even in developing countries. According to the World Health Organization (WHO), congenital and inherited disorders increasingly contribute to perinatal morbidity and mortality in developing countries [41]. Despite this, many countries in Africa, South Asia, and South America still lack national policies and recommendations on screening for developmental abnormalities [12]. Genetic epidemiological studies have the potential to provide scientific evidence of genetic risks in most countries and to disseminate public health advice. Given the limited sampling depth in these countries, the ethnic groups that most need this information and counseling appear to be the least sampled. The geographical portfolio of exome-based prediction could be expanded to more disorders and more countries. On this basis, key infrastructure requirements must be incorporated into sociopolitical frameworks, and medical resources must be allocated to institutions in both developed and developing countries.


Analysis of genetic mutations using two representative population exome projects

Genotype data from the 1000G (Phase 1) and NHLBI projects were collected in VCF format. The dataset consisted of a total of 15,190 haploid exomes from high-coverage exome sequence data derived from 14 + 2 ethnic groups. The NHLBI data include individuals sequenced as part of various disease-specific studies and may not fully reflect the precise genetic population structure, whereas 1000G collected healthy individuals. The validity of part of the NHLBI dataset was previously assessed by NHLBI using Sanger sequencing [novel singleton variants, 143/145 (99%); novel nonsingleton variants, 316/323 (98%)] [17]. The genotype accuracy of 1000G was estimated at 97.4% (20,687/21,235) by comparison with HapMap genotype calls [15]. The 1000G and NHLBI datasets (VCF files) were filtered using Variant Tools and Microsoft Excel by total read depth, the number of individuals with coverage at the site, the fraction of mutation reads in each heterozygote, and the average position of mutation alleles along a read. Eighteen recessively inherited diseases were tentatively selected from the literature (published from 1957 to 2014) using NCBI OMIM and PubMed. Causative mutations for these disorders were extracted from the datasets by chromosomal position (UTR, coding, intronic, and splice-site). ClinVar and HGMD were also reviewed to collect mutations. Identified mutations were then classified by mutation type, allele frequency, racial group, and clinical impact. Information on mutation types, positions, reference sequences, and pathogenicity was retrieved from NCBI dbSNP and the UCSC Genome Browser to generate the exome-based epidemiology. Statistical analyses, including carrier rates (%), were performed in Excel. The ExAC Browser was additionally searched for the mutation alleles of the 18 inherited disorders.
A global map of carrier rate distribution was manually constructed for the 15 recessive disorders collated from literature sources, using a world map obtained from Free Editable Worldmap and modified.
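The study applied four read-level filters in Variant Tools and Excel: total read depth, the number of individuals covered, the alternate-read fraction in each heterozygote, and the mean position of the alternate allele along reads. As an illustration only, the Python sketch below applies the same four criteria to pre-summarized sites. The field names, site identifiers, and every threshold are hypothetical, because the study's cut-offs are not stated here.

```python
from dataclasses import dataclass


@dataclass
class SiteSummary:
    """Per-site QC summary; field names and values are hypothetical."""
    variant: str
    total_depth: int                 # total read depth across samples
    covered_individuals: int         # individuals with coverage at the site
    het_alt_fractions: list          # alt-read fraction in each heterozygote
    mean_alt_read_position: float    # relative position along reads (0.5 = middle)


# Hypothetical thresholds for illustration; the study's cut-offs are not given
MIN_DEPTH = 100
MIN_COVERED = 500
HET_FRACTION_RANGE = (0.25, 0.75)
READ_POSITION_RANGE = (0.1, 0.9)


def passes_qc(site: SiteSummary) -> bool:
    lo, hi = HET_FRACTION_RANGE
    # all() is True when there are no heterozygotes, so such sites are not penalized here
    balanced = all(lo <= f <= hi for f in site.het_alt_fractions)
    pos_lo, pos_hi = READ_POSITION_RANGE
    return (site.total_depth >= MIN_DEPTH
            and site.covered_individuals >= MIN_COVERED
            and balanced
            and pos_lo <= site.mean_alt_read_position <= pos_hi)


sites = [
    SiteSummary("site_A", 5200, 1050, [0.48, 0.52, 0.45], 0.51),
    SiteSummary("site_B", 80, 40, [0.10], 0.97),  # low depth, skewed balance, read-end bias
]
kept = [s.variant for s in sites if passes_qc(s)]
print(kept)
```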

Pair-wise proportion tests of data consistency between two different exome resources

To evaluate the performance of risk prediction based on exome sequence data, I statistically compared exome-based estimates with the clinical prevalence survey. Consistency between estimates was assessed by pair-wise comparisons between populations using a two-sample test for equality of proportions with continuity correction; two estimates were considered inconsistent if they differed significantly. The hypotheses were H0: π1 = π2 against the two-sided alternative H1: π1 ≠ π2. The pair-wise proportion test thus tests the null hypothesis that the proportions (probabilities of success) in two groups are the same; for a 2 × 2 contingency table, it gives results equivalent to those of the ordinary χ² test.

Mutation detection simulation of inherited diseases

To simulate mutation detection based on population exome sequences, a deterministic formula (Eq 3) was used to predict the mutation detection rate in exome studies, assuming a single-gene disease with an autosomal recessive inheritance pattern. The variable p refers to the mutation carrier rate in each population, σ to the error rate of exome sequencing, and N to the number of exomes available for epidemiological analysis. The simulation curve for the mutation detection rate was calculated and plotted using R 3.13 statistical software together with the RColorBrewer package.
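The study drew the curve in R. The Python sketch below is an analogue, not the author's script. It assumes the form D = 1 − {1 − p(1 − σ)}^N, tabulates D at log-spaced sample sizes across the simulated range of 1 to 100,000, and inverts the same formula to find the smallest N that reaches a target detection probability. The function names and the 95% target are illustrative.

```python
import math


def detection_curve(p, sigma, sizes):
    """Detection rate D(N) = 1 - {1 - p(1 - sigma)}^N for each N in sizes."""
    per_exome = p * (1 - sigma)
    return [(n, 1 - (1 - per_exome) ** n) for n in sizes]


def min_sample_size(p, sigma, target):
    """Smallest N with D(N) >= target, for 0 < target < 1."""
    if not 0 < target < 1:
        raise ValueError("target must lie strictly between 0 and 1")
    per_exome = p * (1 - sigma)
    return math.ceil(math.log(1 - target) / math.log1p(-per_exome))


# Log-spaced sample sizes from 1 to 100,000
sizes = [10 ** k for k in range(6)]
for n, d in detection_curve(0.001, 0.01, sizes):
    print(f"N = {n:>7,}: D = {d:.4f}")
print("N needed for 95% detection:", min_sample_size(0.001, 0.01, 0.95))
```

Solving for N in closed form avoids scanning the curve. Plotting is left out, since any plotting library would do.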

Supporting Information

S1 Fig. Risk simulation and mutation detection rate of autosomal recessive disease.

The theoretical mutation detection probability of high-penetrance genetic variants was calculated under three conditions (σ = 0, 0.01, and 0.1), although σ = 0.1 is an unlikely scenario. The simulation sample sizes range from 1 to 100,000. The y-axis corresponds to the detection rate of causative mutations.


S1 Table. Population composition (ethnicity and male/female ratio).


S2 Table. Lists of genetic disorders and their causative genes in this study.

Target causative mutation lists analyzed in this study and representative reference lists.


S3 Table. Comparison of carrier rates between two different exomes (1000 Genomes and NHLBI).

The P-value is calculated from pair-wise proportion tests of allele frequencies in European and African ancestries between the two different exome resources (1000 Genomes vs NHLBI).



The author would like to thank the 1000 Genomes Project, the NHLBI GO Exome Sequencing Project, and the Exome Aggregation Consortium (ExAC) for making their datasets available.

Author Contributions

Conceived and designed the experiments: KF. Performed the experiments: KF. Analyzed the data: KF. Wrote the paper: KF.


1. Bamshad MJ, Ng SB, Bigham AW, Tabor HK, Emond MJ, Nickerson DA, et al. Exome sequencing as a tool for Mendelian disease gene discovery. Nat Rev Genet. 2011; 12: 745–755. pmid:21946919
2. Ku CS, Naidoo N, Pawitan Y. Revisiting Mendelian disorders through exome sequencing. Hum Genet. 2011; 129: 351–370. pmid:21331778
3. Ku CS, Cooper DN, Polychronakos C, Naidoo N, Wu M, Soong R. Exome sequencing: dual role as a discovery and diagnostic tool. Ann Neurol. 2012; 71: 5–14. pmid:22275248
4. Kiezun A, Garimella K, Do R, Stitziel NO, Neale BM, McLaren PJ, et al. Exome sequencing and the genetic basis of complex traits. Nat Genet. 2012; 44: 623–630. pmid:22641211
5. Docherty S, Iles R. Biomedical Sciences: Essential Laboratory Medicine. 2012. pp. 116–117.
6. Katsanis SH, Katsanis N. Molecular genetic testing and the future of clinical genomics. Nat Rev Genet. 2013; 14: 415–426. pmid:23681062
7. de la Paz MP, Villaverde-Hueso A, Alonso V, János S, Zurriaga O, Pollán M, et al. Rare diseases epidemiology research. Adv Exp Med Biol. 2010; 686: 17–39. pmid:20824437
8. Scriver CR, Beaudet AL, Sly WS, Valle D. The Metabolic and Molecular Bases of Inherited Disease. 1995.
9. Matise TC, Ambite JL, Buyske S, Carlson CS, Cole SA, Crawford DC, et al. The Next PAGE in understanding complex traits: design for the analysis of Population Architecture Using Genetics and Epidemiology (PAGE) Study. Am J Epidemiol. 2011; 174: 849–859. pmid:21836165
10. Lazarin GA, Haque IS, Nazareth S, Iori K, Patterson AS, Jacobson JL, et al. An empirical estimate of carrier frequencies for 400+ causal Mendelian variants: results from an ethnically diverse clinical sample of 23,453 individuals. Genet Med. 2013; 15: 178–186. pmid:22975760
11. Krickeberg K, Kar A, Chakraborty AK. Handbook of Epidemiology: Epidemiology in Developing Countries. 2005. pp. 1545–1589.
12. Becker F, van El CG, Ibarreta D, Zika E, Hogarth S, Borry P, et al. Genetic testing and common disorders in a public health framework: how to assess relevance and possibilities. Background Document to the ESHG recommendations on genetic testing and common disorders. Eur J Hum Genet. 2011; 19: S6–44.
13. Bornstein MH, Hendricks C. Screening for developmental disabilities in developing countries. Soc Sci Med. 2013; 97: 307–315. pmid:23294875
14. Altshuler D, Durbin RM, Abecasis GR, Bentley DR, Chakravarti A, Clark AG, et al. A map of human genome variation from population-scale sequencing. Nature. 2010; 467: 1061–1073. pmid:20981092
15. 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012; 491: 56–65. pmid:23128226
16. Tennessen JA, Bigham AW, O'Connor TD, Fu W, Kenny EE, Gravel S, et al. Evolution and Functional Impact of Rare Coding Variation from Deep Sequencing of Human Exomes. Science. 2012; 337: 64–69. pmid:22604720
17. Fu W, O'Connor TD, Jun G, Kang HM, Abecasis G, Leal SM, et al. Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature. 2013; 493: 216–220. pmid:23201682
18. Ingram VM. Gene mutations in human haemoglobin: the chemical difference between normal and sickle cell haemoglobin. Nature. 1957; 180: 326–328. pmid:13464827
19. Akinsheye I, Alsultan A, Solovieff N, Ngo D, Baldwin CT, Sebastiani P, et al. Fetal hemoglobin in sickle cell anemia. Blood. 2011; 118: 19–27. pmid:21490337
20. National Center for Disease Control. Available:
21. Morton DA. Medical Issues in Social Security Disability 1, section 7.0.5. 2013.
22. Modell B, Darlison M. Global epidemiology of haemoglobin disorders and derived service indicators. Bull World Health Organ. 2008; 86: 480–487. pmid:18568278
23. Piel FB, Patil AP, Howes RE, Nyangiri OA, Gething PW, Dewi M, et al. Global epidemiology of sickle haemoglobin in neonates: a contemporary geostatistical model-based map and population estimates. Lancet. 2013; 381: 142–151. pmid:23103089
24. Lee W. Testing the genetic relation between two individuals using a panel of frequency-unknown single nucleotide polymorphisms. Ann Hum Genet. 2003; 67: 618–619. pmid:14641250
25. Fujikura K. Global epidemiology of Familial Mediterranean fever mutations using population exome sequences. Mol Genet Genomic Med. 2015; 3: 272–282. pmid:26247045
26. Tabor HK, Auer PL, Jamal SM, Chong JX, Yu JH, Gordon AS, et al. Pathogenic variants for Mendelian and complex traits in exomes of 6,517 European and African Americans: implications for the return of incidental results. Am J Hum Genet. 2014; 95: 183–193. pmid:25087612
27. Gurdasani D, Carstensen T, Tekola-Ayele F, Pagani L, Tachmazidou I, Hatzikotoulas K, et al. The African Genome Variation Project shapes medical genetics in Africa. Nature. 2015; 517: 327–332.
28. Modell B, Darr A. Science and society: genetic counselling and customary consanguineous marriage. Nat Rev Genet. 2002; 3: 225–229. pmid:11972160
29. Kisioglu AN, Ormeci AR, Uskun E, Ozturk M, Ongel K. Effects of a formal training programme on consanguineous marriages on high school students’ knowledge and attitudes: an interventional study from Turkey. J Biosoc Sci. 2010; 42: 161–176. pmid:19922700
30. Jordan L, Swerdlow P, Coates TD. Systematic review of transition from adolescent to adult care in patients with sickle cell disease. J Pediatr Hematol Oncol. 2013; 35: 165–169. pmid:23511487
31. Gazal S, Sahbatou M, Babron MC, Génin E, Leutenegger AL. High level of inbreeding in final phase of 1000 Genomes Project. Sci Rep. 2015; 5: 17453. pmid:26625947
32. Stern HJ. Preimplantation Genetic Diagnosis: Prenatal Testing for Embryos Finally Achieving Its Potential. J Clin Med. 2014; 3: 280–309. pmid:26237262
33. Tabor HK, Murray JC, Gammill HS, Kitzman JO, Snyder MW, Ventura M, et al. Non-invasive fetal genome sequencing: opportunities and challenges. Am J Med Genet A. 2012; 158A: 2382–2384. pmid:22887792
34. Siemiatkowska AM, Schuurs-Hoeijmakers JH, Bosch DG, Boonstra FN, Riemslag FC, Ruiter M, et al. Nonpenetrance of the most frequent autosomal recessive leber congenital amaurosis mutation in NMNAT1. JAMA Ophthalmol. 2014; 132: 1002–1004. pmid:24830548
35. van Rheenen W, Diekstra FP, van den Berg LH, Veldink JH. Are CHCHD10 mutations indeed associated with familial amyotrophic lateral sclerosis? Brain. 2014; 137: e313. pmid:25348631
36. Hunter JM, Ahearn ME, Balak CD, Liang WS, Kurdoglu A, Corneveaux JJ, et al. Novel pathogenic variants and genes for myopathies identified by whole exome sequencing. Mol Genet Genomic Med. 2015; 3: 283–301. pmid:26247046
37. Steinberg KM, Yu B, Koboldt DC, Mardis ER, Pamphlett R. Exome sequencing of case-unaffected-parents trios reveals recessive and de novo genetic variants in sporadic ALS. Sci Rep. 2015; 5: 9124. pmid:25773295
38. Casey JP, Støve SI, McGorrian C, Galvin J, Blenski M, Dunne A, et al. NAA10 mutation causing a novel intellectual disability syndrome with Long QT due to N-terminal acetyltransferase impairment. Sci Rep. 2015; 5: 16022. pmid:26522270
39. Joshi R, Shvartsman M, Morán E, Lois S, Aranda J, Barqué A, et al. Functional consequences of transferrin receptor-2 mutations causing hereditary hemochromatosis type 3. Mol Genet Genomic Med. 2015; 3: 221–232. pmid:26029709
40. Lek M, Karczewski K, Minikel E, Samocha K, Banks E, et al. Analysis of protein-coding genetic variation in 60,706 humans. bioRxiv. 2015.
41. WHO. Screening the genes. Available: