Identification of potentially pathogenic variants for autism spectrum disorders using gene-burden analysis

  • Nika Rihar,

    Roles Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Writing – original draft

    Affiliation Biotechnical Faculty, Department of Animal Science, University of Ljubljana, Ljubljana, Slovenia

  • Danijela Krgovic,

    Roles Data curation, Methodology, Writing – review & editing

    Affiliations Laboratory of Medical Genetics, University Medical Centre Maribor, Maribor, Slovenia, Maribor Medical Faculty, University of Maribor, Maribor, Slovenia

  • Nadja Kokalj-Vokač,

    Roles Conceptualization, Supervision, Writing – review & editing

    Affiliations Laboratory of Medical Genetics, University Medical Centre Maribor, Maribor, Slovenia, Maribor Medical Faculty, University of Maribor, Maribor, Slovenia

  • Spela Stangler-Herodez,

    Roles Investigation, Methodology, Resources

    Affiliations Laboratory of Medical Genetics, University Medical Centre Maribor, Maribor, Slovenia, Maribor Medical Faculty, University of Maribor, Maribor, Slovenia

  • Minja Zorc,

    Roles Conceptualization, Data curation, Investigation, Methodology, Software, Supervision, Validation, Writing – review & editing

    Affiliation Biotechnical Faculty, Department of Animal Science, University of Ljubljana, Ljubljana, Slovenia

  • Peter Dovc

    Roles Conceptualization, Funding acquisition, Project administration, Supervision, Writing – review & editing

    peter.dovc@bf.uni-lj.si

    Affiliation Biotechnical Faculty, Department of Animal Science, University of Ljubljana, Ljubljana, Slovenia

Abstract

Gene-burden analyses have recently become a successful way to identify genes carrying risk variants that underlie the analysed disease. This approach is also suitable for complex disorders such as autism spectrum disorder (ASD). A gene-burden analysis using the Testing Rare Variants with Public Data (TRAPD) software was conducted on whole exome sequencing data of Slovenian patients with ASD to identify potentially novel disease risk variants in known ASD-associated genes as well as in other genes. To choose the right control group for testing, principal component analysis based on the 1000 Genomes and ASD cohort samples was conducted. Subsequent protein structure and ligand binding analysis using the I-TASSER package was performed to detect changes in protein structure and ligand binding and thereby assess the potential pathogenic consequence of the observed mutation. The results demonstrate an association of two variants, p.Glu198Lys (PPP2R5D:c.592G>A) and p.Arg253Gln (PPP2R5D:c.758G>A), with ASD. The substitution p.Glu198Lys (PPP2R5D:c.592G>A) has previously been described as pathogenic in association with ASD combined with intellectual disability, whereas p.Arg253Gln (PPP2R5D:c.758G>A) has not yet been described as an ASD-associated pathogenic variant. The results indicate that the filtering process was suitable and could be used in the future to detect novel pathogenic variants when analysing groups of ASD patients.

Introduction

Autism spectrum disorders (ASDs) are classified as pervasive developmental disorders. The diagnosis of ASDs is based on criteria in two areas: a deficit in social communication and interaction, and the presence of limited, repetitive patterns of behaviour, interests, and activities. Symptoms should be present early in development but may also become apparent later, when social demands exceed the patient’s limited abilities [1]. ASDs are divided into non-syndromic and syndromic forms. In the former, ASD is considered the main diagnosis, while in the latter it is only one part of a complex disorder that may also include other developmental abnormalities [2]. In this complex disorder, both genetics and environment play an important role, as does the interaction between them [3]. The prevalence of ASD is estimated to be 1–2 per 100 [4]. Since recent studies estimate the heritability of the disorder to be 0.84–0.90, many researchers have focused on performing genomic analyses of individuals with ASD [5–7]. Their main goal is to discover genes and biological pathways associated with the disorder that should explain the pathogenesis, allow for better diagnosis, and enable the use of potential new agents in treatment [8,9]. However, gene and pathway discovery is a major challenge because patients diagnosed with ASD have very heterogeneous phenotypes and causal loci that contribute to their phenotype [10]. Genes that have been associated with ASD play important roles in chromatin remodelling [11–14], protein synthesis and ubiquitination [13], and synapse [13,14] and neuron [14] development and function. Variants in these ASD-associated genes are both common and rare [11,13,15,16]. It is suggested that much of the risk for the disorder can be attributed to rare variants with strong effects [17]. Rare copy number variations (CNVs) play an important role in susceptibility to the disorder. The most important CNV associated with ASD is the 16p11.2 deletion, which has also been associated with other psychiatric disorders [18,19]. In addition to CNVs, rare point variants also increase the risk of developing the disorder [20,21].

The study of these rare variants in exome sequencing data using gene-burden analyses has recently become a very attractive and effective way to identify disease associated genes [22]. These gene-burden analyses have also been useful for the discovery of genes in complex disorders such as ASD, which have high heterogeneity of causal genes [23]. The basic concept of gene-burden analysis is to compare the number of likely pathogenic variants in the selected cases versus controls in each of the genes studied [24].
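The comparison of qualifying-variant carriers in cases versus controls can be illustrated with a short sketch. It assumes a one-sided Fisher's exact test on carrier counts for a single gene, which is a common choice for this kind of test but is not specified in this paragraph. The function name and all counts below are hypothetical and are not taken from the study.

```python
from math import comb


def fisher_one_sided(case_carriers, case_total, control_carriers, control_total):
    """One-sided Fisher's exact test for carrier enrichment in cases.

    Returns P(X >= case_carriers) under the hypergeometric null, where X is
    the number of carriers that fall among the cases by chance.
    Assumption: each individual is counted once, as carrier or non-carrier.
    """
    carriers = case_carriers + control_carriers
    total = case_total + control_total
    denominator = comb(total, case_total)
    upper = min(carriers, case_total)
    numerator = 0
    for k in range(case_carriers, upper + 1):
        # Ways to place k carriers among cases and the rest among controls.
        numerator += comb(carriers, k) * comb(total - carriers, case_total - k)
    return numerator / denominator


# Hypothetical counts for one gene: 2 carriers among 35 cases,
# 5 carriers among 50,000 public-database controls.
p_value = fisher_one_sided(2, 35, 5, 50_000)
print(f"one-sided p = {p_value:.2e}")
```

The test uses exact integer arithmetic from the standard library, so it stays accurate even when the control cohort is several orders of magnitude larger than the case cohort.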

The most challenging task in the analysis is the detection and selection of likely disease-causing variants, which are called qualifying variants. Qualifying variants are selected using appropriate filters, such as various quality metrics, allele frequency, and consequence and pathogenicity as predicted by different tools [25]. Gene-burden analysis can be performed using local, in-house software, but it is also possible to use available software from other sources. Software packages that allow gene-burden testing using public databases (ExAC, GnomAD [26]) have proven to be a useful tool for identifying disease-associated genes in cases where no control cohort is available. For example, Testing Rare Variants using Public Data (TRAPD) allows custom selection of minor allele frequency (MAF) thresholds and population subcategories (e.g. non-Finnish Europeans), filtering of variants based on annotations (PolyPhen [27], SIFT [28], Mutation Taster [29]), and selection of any control population [30]. TRAPD has been used successfully to discover genes associated with various diseases such as Diamond-Blackfan anemia, human spiradenoma, spiradenocarcinoma, Meniere’s disease, and amyotrophic lateral sclerosis [31–35].

Burden testing for ASD has been performed in a few studies [36–38]. In smaller studies, the number of individuals tested has been a limitation to the discovery of ASD-associated genes [36,37]. To increase the possibility of discovering new associations, some studies used larger test cohorts [38]. Stricter criteria for variant and gene selection (only truncating variants in a haploinsufficient gene that has no truncating variation in the healthy control population) also proved to be a successful way to discover associations [38]. The results show that, due to the highly polygenic nature of the disease, large test cohorts and pre-selection of genes suitable for testing are necessary [36–38].

Considering the advantages and disadvantages of the above-mentioned studies, we believe that the association of genes carrying pathogenic variants with ASD can be discovered by using large control cohorts from large databases, such as the Genome Aggregation Database (GnomAD), by reducing phenotypic heterogeneity through the inclusion of only patients with similar phenotypes, and by selecting likely pathogenic variants and tested genes according to more stringent criteria. The main objective of our study was to analyse whole-exome sequencing data from 35 Slovenian patients diagnosed with ASD in combination with intellectual disability (ID), followed by gene-burden testing using TRAPD software and a control cohort from the GnomAD database. In addition, our study aims to discover novel ASD-associated, probably pathogenic variants and thus contribute to clinical knowledge on variant pathogenicity, using a combination of population allele frequencies, pathogenicity prediction tools, and o/e and Z scores for variant and gene selection for burden testing. A further aim of this study was to identify genes underlying the disease in the Slovenian population.

Materials and methods

Autism spectrum disorders cohort

The Laboratory of medical genetics at the University Medical Centre Maribor, Slovenia, collected DNA from 99 Slovenian subjects diagnosed with ASD who had been referred to the laboratory as part of a standard diagnostic procedure. This experiment was approved by the University Medical Centre of Maribor and the Ethics committee. Written informed consent to participate in this study was provided by all participants or their legal guardians. Their DNA was obtained from whole blood samples.

Whole exome sequencing

Whole exome sequencing analysis of 99 samples was performed by Novogene (China). The starting material for sample preparation was 1.0 μg genomic DNA per sample. Sequencing libraries were generated using the Agilent SureSelect Human All Exon V6 kit (Agilent Technologies, CA, USA) according to the manufacturer's recommendations. Index codes were added to distinguish samples after pooling. The main steps of library preparation were performed as follows. First, DNA was sheared into 180–280 bp fragments using a hydrodynamic shearing system (Covaris, Massachusetts, USA). The remaining overhangs were converted to blunt ends by exonuclease/polymerase activity. This step was followed by adenylation of the 3’ ends of the DNA fragments and ligation of adapter oligonucleotides. Only the fragments that were ligated with adapters at both ends were then enriched by polymerase chain reaction. Index tags were then added, followed by hybridization of the samples. After hybridization, the products were purified using the AMPure XP system (Beckman Coulter, Beverly, USA). Quantification before pooling, according to the sample concentration, was performed using the Agilent High Sensitivity DNA Assay on the Agilent Bioanalyzer 2100 system. Finally, the pooled libraries were loaded onto an Illumina HiSeq 2500 sequencer.

Variant calling and annotation

The generated sequencing reads (fastq file format) were aligned to the human reference genome build b37 using BWA software version 0.7.10 [39]. Subsequent bioinformatic analyses of the binary bam files were performed using Genome Analysis Toolkit toolset version 3.5 according to best practice recommendations for variant calling, such as marking duplicates, quality score recalibration of bam files, and variant calling with HaplotypeCaller [40]. Subsequently, the variant call format (VCF) files of all patients were combined into one VCF file and then analysed using VariantRecalibrator for variant quality recalibration, which assigns a probability score for a true genetic variant to each variant in a VCF file. Finally, the variant call file was annotated using Variant Effect Predictor v98 (Fig 1) [41].

Fig 1. Flow chart for variant detection and prioritizing for gene-burden testing.

https://doi.org/10.1371/journal.pone.0273957.g001

Principal component analysis

Principal Component Analysis (PCA) of common variants was performed on exome regions of 1000 Genomes Phase 3 and ASD cohort data. SNPs were pruned using PLINK 1.9, based on linkage disequilibrium [42]. The variance inflation threshold of 1.5 was applied, and variants with minor allele frequencies greater than 10% were retained in the analysis. R v3.5.2 software was used for graphical representation [43]. The ancestry of the ASD cohort was predicted using principal component coordinates.

Reduction of phenotypic heterogeneity

To reduce phenotypic heterogeneity, only unrelated patients from the ASD cohort with the ASD phenotype in combination with ID were included in the subsequent gene-burden analysis. This group of 35 patients was selected because, of all neurodevelopmental disorders, ID is the most influenced by genetics and the least by environment [44]. In addition, the underlying genetic cause of the disorder is more frequently discovered in patients with a combined ID diagnosis [45].

Gene-burden analysis

Gene-burden testing between case subjects (35 unrelated subjects from the ASD cohort with a diagnosis of ASD combined with ID) and control subjects was performed using TRAPD software. Because the case subjects are Europeans of non-Finnish descent and have neurological phenotypes, the non-Finnish European, non-neurological group of control subjects from GnomAD exomes v2.1 was used for testing. GnomAD exomes v2.1 is an aggregation database of exome sequences from 125,748 individuals not known to have a severe Mendelian disease. Variants with a MAF of less than 0.001% in the GnomAD database, or not present in it, were considered rare and included in the analysis. To account for variants that are common in the Slovenian population but not present in the GnomAD exome database, variants with an allele count of less than 3 in the ASD cohort were considered rare in the Slovenian population and were retained for further analysis. Considering the already published results, we decided to focus only on very rare variants [25,46–50]. In the process of burden testing, rare synonymous variants were analysed first to adjust the sequencing quality thresholds and to ensure that there was no significant enrichment of genes. To ensure uniform sequencing coverage of the analysed variants and thus avoid biased results, only variants at sites where the sequencing depth was greater than 10x in more than 90% of the samples were retained. Next, quality by depth (QD), Phred quality (QUAL), genotype Phred quality (GQ), mapping quality (MQ), and variant quality score log-odds (VQSLOD) were selected from the available quality scores for variant filtering [22,24,30]. Only variants marked as “PASS” based on the VQSLOD were analysed further.

After establishing appropriate quality thresholds, a burden analysis was performed for rare stop-gain, frameshift, splice acceptor, and splice donor variants, and for missense variants that were classified as ‘probably damaging’ by PolyPhen-2, ‘deleterious’ by SIFT, the likelihood ratio test (LRT) and RadialSVM, ‘disease-causing’ by MutationTaster, and ‘high’ by MutationAssessor [29,51–53]. Gene-burden analyses usually consider variants that have a greater impact on proteins (frameshift, stop-gain, splicing) and occasionally those that have a lesser impact (missense variants) [30,35,48]. The decision to focus on both missense and loss-of-function variants (frameshift, stop-gain, splicing) was based on the assumption that both categories can be pathogenic. To analyse only the more deleterious missense variants, the results of pathogenicity prediction tools (PolyPhen-2, LRT, MutationTaster, MutationAssessor, RadialSVM and SIFT) were considered. Prior to analysis, all selected variants were also filtered based on the observed/expected (o/e) score and the Z score of a gene. This allowed us to take into account gene intolerance to variation, as estimated from large healthy control populations such as GnomAD. The o/e ratio is the ratio between the observed and the expected number of loss-of-function variants in a gene. It is strongly recommended that haploinsufficient genes be selected using an upper limit of the o/e confidence interval of less than 0.35 [26]. Another value is the Z score, which indicates the tolerance of a gene to missense variation, with higher Z scores indicating lower tolerance to this category of variation [54]. Considering these recommendations, loss-of-function variants were retained in genes with an upper limit of the o/e confidence interval of less than 0.35, and missense variants were retained in haploinsufficient genes that were also in the top 5% by Z score (Fig 1). The o/e and Z scores used were taken from the GnomAD table ‘pLoF Metrics by Gene’ (https://gnomad.broadinstitute.org/downloads).
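The selection rules described above can be summarised as a single predicate. The following is a minimal sketch under stated assumptions, not the pipeline used in the study: the `Variant` and `Gene` records, their field names, and the gene name in the example are hypothetical, and the sketch assumes that the GnomAD rarity rule and the cohort allele-count rule must both hold, and that a missense variant must be called damaging by all listed prediction tools.

```python
from dataclasses import dataclass
from typing import Optional

# Consequence labels treated as loss of function in this sketch (assumed naming).
LOF_CONSEQUENCES = {
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
}


@dataclass
class Variant:
    gene: str
    consequence: str
    gnomad_af: Optional[float]    # None means absent from the control database
    cohort_allele_count: int      # allele count within the case cohort
    filter_status: str            # e.g. "PASS" after quality recalibration
    frac_samples_depth_gt10: float
    all_tools_damaging: bool = False  # missense only: every predictor agrees


@dataclass
class Gene:
    oe_lof_upper: float           # upper bound of the o/e confidence interval
    missense_z_top5pct: bool      # gene is in the top 5% by missense Z score


def is_qualifying(variant: Variant, gene: Gene, max_af: float = 1e-5) -> bool:
    """Apply quality, rarity, and gene-constraint filters (0.001% = 1e-5)."""
    if variant.filter_status != "PASS":
        return False
    if variant.frac_samples_depth_gt10 <= 0.90:
        return False
    rare_in_controls = variant.gnomad_af is None or variant.gnomad_af < max_af
    rare_in_cohort = variant.cohort_allele_count < 3
    if not (rare_in_controls and rare_in_cohort):
        return False
    haploinsufficient = gene.oe_lof_upper < 0.35
    if variant.consequence in LOF_CONSEQUENCES:
        return haploinsufficient
    if variant.consequence == "missense_variant":
        return (variant.all_tools_damaging
                and haploinsufficient
                and gene.missense_z_top5pct)
    return False


# Hypothetical example: a well-covered, ultra-rare missense variant in a
# constrained gene passes every filter.
constrained = Gene(oe_lof_upper=0.20, missense_z_top5pct=True)
candidate = Variant("GENE_A", "missense_variant", None, 1, "PASS", 0.97, True)
print(is_qualifying(candidate, constrained))
```

Keeping the gene-level constraint check separate from the variant-level checks makes it easy to tighten or relax either side without touching the other.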

Analysis of the target region

To validate the target region by Sanger sequencing, two primer pairs were designed for polymerase chain reaction (PCR). To validate the p.Arg253Gln mutation, primers 5’-AGAGGAAGATGAGCCCACCC-3’ and 5’-CTGCCTACGGATATAAGCCCG-3’ were used to amplify a 785 bp long product. The region containing the p.Glu198Lys mutation was amplified with primers 5’-CACCCATAGCCGTGATGTTGT-3’ and 5’-GAGCAAGTACAAACTTCTGGTCG-3’, resulting in a 452 bp long product. A 25 μl PCR reaction contained 17 μl nuclease-free water, 5 μl 5X One Taq Standard Reaction Buffer (NEB), 5 pmol of each primer, 0.2 mmol/l deoxynucleoside triphosphates (dNTPs) (Thermo Scientific), 0.6 U OneTaq DNA Polymerase (NEB), and 50 ng DNA. Reaction conditions were as follows: 94°C 30s; 30 cycles 94°C 30s, 61°C 30s, 68°C 1 min; 68°C 5 min. PCR products were then sequenced on the Applied Biosystems 3500 genetic analyser, using primers 5’-CATCACTTTGGAAGTCTCAGTACAA-3’ and 5’-CACCCATAGCCGTGATGTTGT-3’ for validation of the p.Arg253Gln and p.Glu198Lys variants, respectively.

The p.Arg253Gln mutation was also validated by restriction fragment length polymorphism (RFLP) analysis. For PCR, primers 5’-TGAGGCATCACTTTGGAAGTCT-3’ and 5’-GCTCTCTTGACAACCCCTGA-3’ were used to amplify a 442 bp long product. The reaction mixture was the same as described previously and the conditions were as follows: 94°C 30s; 30 cycles 94°C 30s, 59°C 30s, 68°C 1 min; 68°C 5 min. Then, nested PCR was performed with the PCR product using primers 5’-CAGGGGACCTCTGCATTTC-3’ and 5’-GCTCTCTTGACAACCCCTGA-3’ under the following conditions: 94°C 30s; 30 cycles 94°C 30s, 58°C 30s, 68°C 1 min; 68°C 5 min. A 25 μl reaction contained 1 μl of a 10⁻¹ dilution of PCR product, 17 μl nuclease-free water, 5 μl 5X One Taq Standard Reaction Buffer, 5 pmol of each primer, 0.2 mmol/l dNTPs, and 0.6 U OneTaq DNA Polymerase. This yielded a 351 bp long product that was digested with XhoI. The 7 μl reaction contained 4 μl H2O, 2 μl of the amplification product, 0.7 μl 10X NEB buffer and 10 U enzyme and was incubated at 37°C for 3 h. The digestion products were separated on a 2% agarose gel and stained with ethidium bromide.

Prediction of protein structure and function

Software I-TASSER was used to predict the effects of the p.Arg253Gln mutation on protein structure and function. The original and modified amino-acid sequences of PPP2R5D were analysed on the online server I-TASSER [55,56].

Results

Genetic ancestry of Slovenian ASD patients

The individuals from the 1000 Genomes Project and the Slovenian ASD patients were plotted based on the first two principal components. The data of all individuals are coloured according to their superpopulation (AFR—African, AMR—Admixed American, EAS–East Asian, EUR—European, SAS–South Asian), except for the Finnish (FIN) and Slovenian (SI) ASD patients who are coloured in their own colours. The Slovenian patients are positioned at the position of the Europeans, while the Finnish population is positioned next to the sample data of the Europeans. Considering these principal component coordinates, the cohort of ASD patients belongs to the non-Finnish European population (Fig 2).

Fig 2. PCA based on the 1000 Genomes and ASD cohort samples.

https://doi.org/10.1371/journal.pone.0273957.g002

Significant genes in autism spectrum disorders

The rare synonymous burden analysis showed no significant enrichment of the genes that were enriched in the subsequent analysis of rare, probably pathogenic variants. The most strongly associated gene was NOL4, which, however, contained fewer variants than expected. Some of the genes have no variants in the population of ASD patients and form a narrow line along the x-axis (Fig 3). Because the genome contains about 19,000 genes, the exome-wide significance threshold α after correction for multiple testing is 2.6 × 10⁻⁶ (0.05/19,000) [30,35]. Because the number of genes included in this burden analysis was 2735, the experiment-wide threshold α was set at 1.8 × 10⁻⁵ (0.05/2735). After burden testing of pathogenic variants in these genes, the PPP2R5D gene (p = 1.7 × 10⁻⁵) reached the experiment-wide significance threshold. Other genes were less significantly associated. Many genes in the group of ASD patients did not have potentially pathogenic variants, as shown by the dots forming a horizontal line along the x-axis (Fig 4). Two patients in our cohort were found to have a risk missense mutation in PPP2R5D. One of these patients had the mutation p.Arg253Gln and the other the mutation p.Glu198Lys. One of the patients was diagnosed with ID and suspected ASD, while the other was diagnosed with autism and ID (Table 1).
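The two significance thresholds follow directly from a Bonferroni correction and can be reproduced with a few lines of arithmetic. This is only a check of the numbers quoted above; the function name is an illustrative choice.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


exome_wide = bonferroni_threshold(0.05, 19_000)      # all genes in the genome
experiment_wide = bonferroni_threshold(0.05, 2_735)  # genes actually tested
p_ppp2r5d = 1.7e-5                                   # reported p value

print(f"{exome_wide:.1e}")           # 2.6e-06
print(f"{experiment_wide:.1e}")      # 1.8e-05
print(p_ppp2r5d < experiment_wide)   # True
print(p_ppp2r5d < exome_wide)        # False
```

The reported p value therefore passes the experiment-wide threshold but not the stricter exome-wide one.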

Fig 3. Quantile-quantile plot showing the −log10 of p values of the burden testing considering only rare synonymous variants versus the expected −log10 of p values.

Values are given for genes (black dots). The black line represents the ratio between expected and observed p values when the distribution of p values is uniform, and the blue, dotted line represents the actual ratio when genes are considered that fall in the range between the 50th and 90th percentile. The most significantly associated gene is NOL4.

https://doi.org/10.1371/journal.pone.0273957.g003

Fig 4. The results of burden testing for rare, probable pathogenic mutations.

PPP2R5D is the most significantly associated gene, and has more mutations than expected.

https://doi.org/10.1371/journal.pone.0273957.g004

Table 1. Phenotypes of individuals with pathogenic variants in PPP2R5D.

https://doi.org/10.1371/journal.pone.0273957.t001

Analysis of the target region

Both variants, p.Arg253Gln (Fig 5) and p.Glu198Lys (Fig 6) were confirmed by Sanger sequencing. The p.Arg253Gln variant was also confirmed by RFLP. The restriction enzyme XhoI cut the PCR product without mutation into two fragments, whereas it did not cut the product with mutation. The heterozygous mutation was identified by the presence of an undigested, 351 bp long fragment, and the digested 220 bp and 131 bp long fragments (Fig 7).
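The band pattern expected on the gel for each genotype follows from the fragment sizes given above. The sketch below is illustrative only: the function name and the genotype labels are hypothetical, and it assumes complete digestion of the product without the mutation.

```python
def expected_bands(genotype, product_bp=351, cut_fragments=(220, 131)):
    """Return expected fragment sizes (bp), largest first, for a genotype.

    genotype is written as two alleles, e.g. "wt/mut". The restriction site
    is present only on the "wt" allele, so "mut" products stay uncut.
    """
    if sum(cut_fragments) != product_bp:
        raise ValueError("cut fragments must add up to the product length")
    bands = set()
    for allele in genotype.split("/"):
        if allele == "wt":
            bands.update(cut_fragments)   # digested into two fragments
        elif allele == "mut":
            bands.add(product_bp)         # undigested full-length product
        else:
            raise ValueError(f"unknown allele label: {allele}")
    return sorted(bands, reverse=True)


print(expected_bands("wt/wt"))   # [220, 131]
print(expected_bands("wt/mut"))  # [351, 220, 131]
```

The heterozygous pattern of three bands is the one described for the patient carrying p.Arg253Gln.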

Fig 5. Sanger sequencing results for variant p.Arg253Gln.

On the left side is the sequence of a person without mutation and on the right side is the sequence of the patient heterozygous for p.Arg253Gln mutation.

https://doi.org/10.1371/journal.pone.0273957.g005

Fig 6. Sanger sequencing results for variant p.Glu198Lys.

On the left side is the sequence of a person without mutation and on the right side is the sequence of the patient heterozygous for p.Glu198Lys mutation.

https://doi.org/10.1371/journal.pone.0273957.g006

Fig 7. Results of PCR-RFLP analysis of p.Arg253Gln mutation.

Lane 1: 100 bp DNA ladder, lane 2: A person without mutation, lane 3: The patient, heterozygous for p.Arg253Gln mutation.

https://doi.org/10.1371/journal.pone.0273957.g007

Prediction of ligand binding sites on PPP2R5D

The results of predicting the consequences of the p.Arg253Gln mutation on protein structure and function show that the possible ligand, Importin subunit beta-1, that binds to the original protein (Fig 8A), is not a plausible ligand for the mutant protein (Fig 8B). These I-TASSER results of protein-ligand binding sites are based on structure comparison, protein-protein networks, and detection of ligand binding templates.

Fig 8.

Results of ligand binding site prediction for the original (A) and mutant (B) protein sequences. Software I-TASSER suggests the ligand most likely to bind to the analysed protein. The binding protein residues are shown in blue, and the predicted binding ligands are shown in green-yellow.

https://doi.org/10.1371/journal.pone.0273957.g008

Discussion

Analysis of rare, probably pathogenic variants revealed an association with two risk variants. Both variants are located in the PPP2R5D gene, which plays an important role in the regulation of neuronal and developmental processes. The product of the gene is part of the enzyme phosphatase 2A, specifically the regulatory subunit B. It may regulate the catalytic activity and substrate selectivity of the enzyme [57]. Moreover, missense variants in this gene have already been associated with ASD with ID and other comorbid symptoms [58–60]. The carriers of the pathogenic variants in the PPP2R5D gene in the Slovenian cohort have phenotypes similar to those mentioned above, namely autism in combination with ID (Table 1), which supports the association of the discovered variants with their phenotypes.

Of the two variants discovered, PPP2R5D(NM_006245.4):c.758G>A (p.Arg253Gln) has not been described in association with ASD yet. This variant is classified as a variant of unknown significance (VOUS) because it meets PM2, PP2, and PP3 criteria of the ACMG guidelines [61]. The results of this case-control association study (Fig 4), and the in silico protein structure and function analysis (Fig 8) support a pathogenic effect of the variant. It is very rare in the GnomAD database and the amino acid substitution is semi-conservative, which, according to the I-TASSER analysis results, affects the protein function. More specifically, these results of protein structure and function prediction suggest that the mutation impacts the ligand binding function of the protein (Fig 8). The phenotype of the patient, carrying this variant, also matches the phenotypes of patients, carrying variants in the PPP2R5D gene. Considering all these facts, this variant might have an impact on the phenotype of autism and ID. Because the in silico prediction supports criteria PP3 of the ACMG guidelines, we propose to classify the PPP2R5D(NM_006245.4):c.758G>A (p.Arg253Gln) variant as likely pathogenic.

The other variant, PPP2R5D(NM_006245.4):c.592G>A (p.Glu198Lys), has already been described in the context of ID and autism [59]. It is classified as pathogenic because it meets PM1, PM2, PP2, PP3, and PP5 criteria of the ACMG guidelines [61]. The consequence of the variant is a non-conservative amino acid substitution that could affect the secondary structure of the protein. Functional studies revealed deficient formation of the protein in cells expressing this mutation [62]. Because the present study detected this pathogenic variant in association with ID and the autism phenotype, this result indicates that the presented approach is suitable to detect the probably pathogenic variants.

The combination of case-control analysis using large public databases, similar phenotypes, and the use of o/e and Z scores in this gene-burden analysis led to the discovery of probably pathogenic variants in an ASD-associated gene. These results support the usefulness of o/e and Z scores in the interpretation of patient’s variants. The results obtained also confirm that gene-burden analysis combined with appropriate filtering of data based on population frequencies, predictions of pathogenicity prediction tools, o/e, and Z scores, is a suitable tool to detect the probably pathogenic variants in genes, associated with ASD.

Conclusions

The results of this gene-burden study show an association of two risk variants with ASD, one of which has not previously been described in association with this disorder. The results of examination of significant new variants using protein structure and function prediction software also support the pathogenic effect of the variant. This suggests that the proposed filtering and pathogenicity prediction method is suitable for detecting rare deleterious variants in sequences from ASD patients. It provides a new opportunity for clinicians to discover additional, disease-causing variants not previously described and to screen out those, already defined as pathogenic.

However, gene-burden studies using publicly available control data have several limitations. The main limitations are the number of patients included in a study and the lack of a healthy control group from the same population. To ensure better power of the analysis, researchers conducting these analyses can collect data from more patients, as well as some healthy controls. Collecting data from 100 more patients could improve the reliability of the results of the gene-burden analysis. Including a healthy control group from the studied population could provide more accurate filtering of pathogenic variants based on frequency. Regardless of whether the pathogenicity of the new variants is supported by the structure and function prediction tool results, in vitro analysis of the effects of the new variants could provide additional insight into changes at protein level.

Acknowledgments

The authors thank Marta Macedoni-Luksic, for referral of patients, and the parents of the patients for their cooperation. The authors also thank Branko Aleksic, Jernej Ogorevc, and Ajda Lenardic for fruitful discussions and Tine Pokorn for technical assistance.

References

  1. 1. Association AP. Diagnostic and statistical manual of mental disorders (DSM-5®): American Psychiatric Pub; 2013.
  2. Yin J, Schaaf CP. Autism genetics–an overview. Prenatal diagnosis. 2017;37(1):14–30. pmid:27743394
  3. Chaste P, Leboyer M. Autism risk factors: genes, environment, and gene-environment interactions. Dialogues in clinical neuroscience. 2012;14(3):281. pmid:23226953
  4. Baio J, Wiggins L, Christensen DL, Maenner MJ, Daniels J, Warren Z, et al. Prevalence of autism spectrum disorder among children aged 8 years–autism and developmental disabilities monitoring network, 11 sites, United States, 2014. MMWR Surveillance Summaries. 2018;67(6):1.
  5. Geschwind DH, Flint J. Genetics and genomics of psychiatric disease. Science. 2015;349(6255):1489–94. pmid:26404826
  6. Sandin S, Lichtenstein P, Kuja-Halkola R, Hultman C, Larsson H, Reichenberg A. The heritability of autism spectrum disorder. JAMA. 2017;318(12):1182–4. pmid:28973605
  7. Yip BHK, Bai D, Mahjani B, Klei L, Pawitan Y, Hultman CM, et al. Heritable variation, with little or no maternal effect, accounts for recurrence risk to autism spectrum disorder in Sweden. Biological psychiatry. 2018;83(7):589–97. pmid:29100626
  8. Vorstman JA, Parr JR, Moreno-De-Luca D, Anney RJ, Nurnberger JI Jr, Hallmayer JF. Autism genetics: opportunities and challenges for clinical translation. Nature Reviews Genetics. 2017;18(6):362. pmid:28260791
  9. Gilissen C, Hoischen A, Brunner HG, Veltman JA. Disease gene identification strategies for exome sequencing. European Journal of Human Genetics. 2012;20(5):490–7. pmid:22258526
  10. Bill BR, Geschwind DH. Genetic advances in autism: heterogeneity and convergence on shared pathways. Current opinion in genetics & development. 2009;19(3):271–8. pmid:19477629
  11. Iossifov I, O’Roak BJ, Sanders SJ, Ronemus M, Krumm N, Levy D, et al. The contribution of de novo coding mutations to autism spectrum disorder. Nature. 2014;515(7526):216–21. pmid:25363768
  12. Cotney J, Muhle RA, Sanders SJ, Liu L, Willsey AJ, Niu W, et al. The autism-associated chromatin modifier CHD8 regulates other autism risk genes during human neurodevelopment. Nature communications. 2015;6(1):1–11. pmid:25752243
  13. De Rubeis S, He X, Goldberg AP, Poultney CS, Samocha K, Cicek AE, et al. Synaptic, transcriptional and chromatin genes disrupted in autism. Nature. 2014;515(7526):209–15. pmid:25363760
  14. Pinto D, Delaby E, Merico D, Barbosa M, Merikangas A, Klei L, et al. Convergence of genes and cellular pathways dysregulated in autism spectrum disorders. The American Journal of Human Genetics. 2014;94(5):677–94. pmid:24768552
  15. Gaugler T, Klei L, Sanders SJ, Bodea CA, Goldberg AP, Lee AB, et al. Most genetic risk for autism resides with common variation. Nature genetics. 2014;46(8):881–5. pmid:25038753
  16. Krumm N, Turner TN, Baker C, Vives L, Mohajeri K, Witherspoon K, et al. Excess of rare, inherited truncating mutations in autism. Nature genetics. 2015;47(6):582–8. pmid:25961944
  17. Cirulli ET, Goldstein DB. Uncovering the roles of rare variants in common disease through whole-genome sequencing. Nature Reviews Genetics. 2010;11(6):415–25. pmid:20479773
  18. Kumar RA, KaraMohamed S, Sudi J, Conrad DF, Brune C, Badner JA, et al. Recurrent 16p11.2 microdeletions in autism. Human molecular genetics. 2008;17(4):628–38. pmid:18156158
  19. Malhotra D, Sebat J. CNVs: harbingers of a rare variant revolution in psychiatric genetics. Cell. 2012;148(6):1223–41. pmid:22424231
  20. Wiśniowiecka-Kowalnik B, Nowakowska BA. Genetics and epigenetics of autism spectrum disorder–current evidence in the field. Journal of applied genetics. 2019;60(1):37–47. pmid:30627967
  21. Toma C. Genetic variation across phenotypic severity of autism. Trends in Genetics. 2020;36(4):228–31. pmid:32037010
  22. Zhu X, Padmanabhan R, Copeland B, Bridgers J, Ren Z, Kamalakaran S, et al. A case-control collapsing analysis identifies epilepsy genes implicated in trio sequencing studies focused on de novo mutations. PLoS genetics. 2017;13(11):e1007104. pmid:29186148
  23. Povysil G, Petrovski S, Hostyk J, Aggarwal V, Allen AS, Goldstein DB. Rare-variant collapsing analyses for complex traits: guidelines and applications. Nature Reviews Genetics. 2019;20(12):747–59. pmid:31605095
  24. Cirulli ET, Lasseigne BN, Petrovski S, Sapp PC, Dion PA, Leblond CS, et al. Exome sequencing in amyotrophic lateral sclerosis identifies risk genes and pathways. Science. 2015;347(6229):1436–41. pmid:25700176
  25. Petrovski S, Todd JL, Durheim MT, Wang Q, Chien JW, Kelly FL, et al. An exome sequencing study to assess the role of rare genetic variation in pulmonary fibrosis. American journal of respiratory and critical care medicine. 2017;196(1):82–93. pmid:28099038
  26. Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, et al. Variation across 141,456 human exomes and genomes reveals the spectrum of loss-of-function intolerance across human protein-coding genes. bioRxiv. 2019:531210.
  27. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, et al. A method and server for predicting damaging missense mutations. Nature methods. 2010;7(4):248–9. pmid:20354512
  28. Sim N-L, Kumar P, Hu J, Henikoff S, Schneider G, Ng PC. SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research. 2012;40(W1):W452–W7. pmid:22689647
  29. Chun S, Fay JC. Identification of deleterious mutations within three human genomes. Genome research. 2009;19(9):1553–61. pmid:19602639
  30. Guo MH, Plummer L, Chan Y-M, Hirschhorn JN, Lippincott MF. Burden testing of rare variants identified through exome sequencing via publicly available control data. The American Journal of Human Genetics. 2018;103(4):522–34. pmid:30269813
  31. Gallego-Martinez A, Requena T, Roman-Naranjo P, May P, Lopez-Escamez JA. Enrichment of damaging missense variants in genes related with axonal guidance signalling in sporadic Meniere’s disease. Journal of Medical Genetics. 2020;57(2):82–8. pmid:31494579
  32. Georges A, Albuisson J, Berrandou TE, Dupre D, Lorthioir A, Escamard VD, et al. Rare Loss-of-Function Mutations of PTGIR identified in Fibromuscular Dysplasia and Spontaneous Coronary Artery Dissection. medRxiv. 2019:19012484.
  33. Johnson JO, Chia R, Brown RH Jr, Landers JE. Mutations in the SPTLC1 gene are a cause of amyotrophic lateral sclerosis that may be amenable to serine supplementation. 2019.
  34. Rashid M, van der Horst M, Mentzel T, Butera F, Ferreira I, Pance A, et al. ALPK1 hotspot mutation as a driver of human spiradenoma and spiradenocarcinoma. Nature communications. 2019;10(1):1–10.
  35. Ulirsch JC, Verboon JM, Kazerounian S, Guo MH, Yuan D, Ludwig LS, et al. The genetic landscape of Diamond-Blackfan anemia. The American Journal of Human Genetics. 2018;103(6):930–47. pmid:30503522
  36. Griswold AJ, Dueker ND, Van Booven D, Rantus JA, Jaworski JM, Slifer SH, et al. Targeted massively parallel sequencing of autism spectrum disorder-associated genes in a case control cohort reveals rare loss-of-function risk variants. Molecular autism. 2015;6(1):43.
  37. Liu L, Sabo A, Neale BM, Nagaswamy U, Stevens C, Lim E, et al. Analysis of rare, exonic variation amongst subjects with autism spectrum disorders and population controls. PLoS Genet. 2013;9(4):e1003443. pmid:23593035
  38. Bernier R, Golzio C, Xiong B, Stessman HA, Coe BP, Penn O, et al. Disruptive CHD8 mutations define a subtype of autism early in development. Cell. 2014;158(2):263–76. pmid:24998929
  39. Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25(14):1754–60. pmid:19451168
  40. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome research. 2010;20(9):1297–303. pmid:20644199
  41. McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GR, Thormann A, et al. The ensembl variant effect predictor. Genome biology. 2016;17(1):122. pmid:27268795
  42. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4(1):s13742-015-0047-8.
  43. R Core Team. R: A language and environment for statistical computing. Vienna, Austria; 2013.
  44. Craddock N, Owen MJ. The Kraepelinian dichotomy–going, going… but still not gone. The British Journal of Psychiatry. 2010;196(2):92–5.
  45. Chérot E, Keren B, Dubourg C, Carré W, Fradin M, Lavillaureix A, et al. Using medical exome sequencing to identify the causes of neurodevelopmental disorders: Experience of 2 clinical units and 216 patients. Clinical genetics. 2018;93(3):567–76. pmid:28708303
  46. Allen AS, Bellows ST, Berkovic SF, Bridgers J, Burgess R, Cavalleri G, et al. Ultra-rare genetic variation in common epilepsies: a case-control sequencing study. The Lancet Neurology. 2017;16(2):135–43.
  47. Raghavan NS, Brickman AM, Andrews H, Manly JJ, Schupf N, Lantigua R, et al. Whole‐exome sequencing in 20,197 persons for rare variants in Alzheimer’s disease. Annals of clinical and translational neurology. 2018;5(7):832–42. pmid:30009200
  48. Ganna A, Satterstrom FK, Zekavat SM, Das I, Kurki MI, Churchhouse C, et al. Quantifying the impact of rare and ultra-rare coding variation across the phenotypic spectrum. The American Journal of Human Genetics. 2018;102(6):1204–11. pmid:29861106
  49. Cameron-Christie S, Wolock CJ, Groopman E, Petrovski S, Kamalakaran S, Povysil G, et al. Exome-based rare-variant analyses in CKD. Journal of the American Society of Nephrology. 2019;30(6):1109–22. pmid:31085678
  50. Cirulli ET, White S, Read RW, Elhanan G, Metcalf WJ, Schlauch KA, et al. Genome-wide rare variant analysis for thousands of phenotypes in 54,000 exomes. bioRxiv. 2019:692368.
  51. Dong C, Wei P, Jian X, Gibbs R, Boerwinkle E, Wang K, et al. Comparison and integration of deleteriousness prediction methods for nonsynonymous SNVs in whole exome sequencing studies. Human molecular genetics. 2015;24(8):2125–37. pmid:25552646
  52. Schwarz JM, Rödelsperger C, Schuelke M, Seelow D. MutationTaster evaluates disease-causing potential of sequence alterations. Nature methods. 2010;7(8):575–6. pmid:20676075
  53. Reva B, Antipin Y, Sander C. Predicting the functional impact of protein mutations: application to cancer genomics. Nucleic acids research. 2011;39(17):e118–e. pmid:21727090
  54. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536(7616):285–91. pmid:27535533
  55. Yang J, Zhang Y. I-TASSER server: new development for protein structure and function predictions. Nucleic acids research. 2015;43(W1):W174–W81. pmid:25883148
  56. Zhang C, Freddolino PL, Zhang Y. COFACTOR: improved protein function prediction by combining structure, sequence and protein–protein interaction information. Nucleic acids research. 2017;45(W1):W291–W9. pmid:28472402
  57. O’Leary NA, Wright MW, Brister JR, Ciufo S, Haddad D, McVeigh R, et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic acids research. 2016;44(D1):D733–D45. pmid:26553804
  58. Yeung KS, Tso WWY, Ip JJK, Mak CCY, Leung GKC, Tsang MHY, et al. Identification of mutations in the PI3K-AKT-mTOR signalling pathway in patients with macrocephaly and developmental delay and/or autism. Molecular Autism. 2017;8(1):66.
  59. Shang L, Henderson LB, Cho MT, Petrey DS, Fong C-T, Haude KM, et al. De novo missense variants in PPP2R5D are associated with intellectual disability, macrocephaly, hypotonia, and autism. Neurogenetics. 2016;17(1):43–9. pmid:26576547
  60. Biswas D, Cary W, Nolta JA. PPP2R5D-Related Intellectual Disability and Neurodevelopmental Delay: A Review of the Current Understanding of the Genetics and Biochemical Basis of the Disorder. International journal of molecular sciences. 2020;21(4):1286. pmid:32074998
  61. Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genetics in medicine. 2015;17(5):405–23. pmid:25741868
  62. Houge G, Haesen D, Vissers LE, Mehta S, Parker MJ, Wright M, et al. B56δ-related protein phosphatase 2A dysfunction identified in patients with intellectual disability. The Journal of clinical investigation. 2015;125(8):3051–62.