Abstract
Genome-wide association studies and next-generation sequencing analyses based on DNA information have identified thousands of mutations associated with autism spectrum disorder (ASD). However, more than 99% of identified mutations are non-coding, so it is unclear which of them are functional and thus potentially causal variants. Transcriptomic profiling using total RNA-sequencing has been one of the most utilized approaches to link protein levels to genetic information at the molecular level. The transcriptome captures molecular genomic complexity that the DNA sequence alone does not. Some mutations alter a gene’s DNA sequence but do not necessarily change expression and/or protein function. To date, few common variants have been reliably associated with ASD diagnosis status despite consistently high estimates of heritability. In addition, there are no reliable biomarkers for diagnosing ASD and no established molecular mechanisms for defining its severity. Therefore, it is necessary to integrate DNA and RNA testing to identify true causal genes and propose useful biomarkers for ASD. We performed gene-based association studies with the adaptive test using genome-wide association study (GWAS) summary statistics from two large GWAS datasets (ASD 2019 data: 18,382 ASD cases and 27,969 controls [discovery data]; ASD 2017 data: 6,197 ASD cases and 7,377 controls [replication data]) obtained from the Psychiatric Genomics Consortium (PGC). In addition, for genes identified in the gene-based GWAS, we investigated differential expression between ASD cases and controls in two RNA-seq datasets (GSE211154: 20 cases and 19 controls; GSE30573: 3 cases and 3 controls). We identified 5 genes significantly associated with ASD in the ASD 2019 data (KIZ-AS1, $p = 8.67 \times 10^{-10}$; KIZ, $p = 1.16 \times 10^{-9}$; XRN2, $p = 7.73 \times 10^{-9}$; SOX7, $p = 2.22 \times 10^{-7}$; LOC101929229, also known as PINX1-DT, $p = 2.14 \times 10^{-6}$). Among these 5 genes, SOX7 (p = 0.00087) and LOC101929229 (p = 0.009) were replicated in the ASD 2017 data. KIZ-AS1 (p = 0.059) and KIZ (p = 0.06) were close to the boundary of replication in the ASD 2017 data. SOX7 showed significant expression differences between cases and controls in the GSE211154 RNA-seq data (p = 0.036 in all samples; p = 0.044 in white samples). Furthermore, SOX7 was upregulated in cases relative to controls in the GSE30573 RNA-seq data (p = 0.0017; Benjamini-Hochberg adjusted p = 0.0085). SOX7 encodes a member of the SOX (SRY-related HMG-box) family of transcription factors, which play a pivotal role in determining cell fate and identity in many lineages. The encoded protein may act as a transcriptional regulator after forming a complex with other proteins, which may contribute to autism. SOX7, a member of this transcription factor family, could therefore be associated with ASD. This finding may inform new diagnostic and therapeutic strategies for ASD.
Citation: Gonzales S, Zhao JZ, Choi NY, Acharya P, Jeong S, Wang X, et al. (2025) SOX7: Autism associated gene identified by analysis of multi-Omics data. PLoS One 20(5): e0320096. https://doi.org/10.1371/journal.pone.0320096
Editor: Chunyu Liu, State University of New York Upstate Medical University, UNITED STATES OF AMERICA
Received: January 5, 2024; Accepted: February 12, 2025; Published: May 15, 2025
Copyright: © 2025 Gonzales et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the article and its supporting information files. Other publicly available data can be found at: https://pgc.unc.edu/for-researchers/download-results/, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE211154, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi.
Funding: This study was financially supported by the National Institutes of Health (NCATS R44TR003491 and NIDDK UH3DK119982) and the University of North Texas (Startup). This research was supported in part by the National Institutes of Health (NIH) through the National Human Genome Research Institute (NHGRI) award for the FIU-Center for Genome Research (Award Number: UG3HG013615-01, contact PI: Xuexia Wang, MPI: Stephen Black). The content is solely the responsibility of the authors and does not necessarily reflect the official views of the NIH.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Autism spectrum disorder (ASD) is a heterogeneous grouping of neurodevelopmental traits that is diagnosed in roughly 1% of the world population [1]. ASD is characterized by social communication deficits and restricted, repetitive, or unusual sensory-motor behaviors, and can be accompanied by attention-deficit hyperactivity disorder (ADHD), intellectual disability (ID), epilepsy, or gastrointestinal problems [2]. Considerable research effort has gone into understanding the causes of individual differences in autistic behavior. Twin and family studies demonstrate that autism has a particularly large genetic basis, with estimated heritability ranging from 40% to 90% [3–6]. Molecular genetic studies have revealed that the genetic risk for autism is shaped by a combination of rare and common genetic variants [7].
Over the past decade, genome-wide association studies (GWAS) and other types of genetic studies have identified increasing numbers of single nucleotide polymorphisms (SNPs) [8,9] and other forms of genetic variation that are associated with ASD [10]. It has been estimated that more than 100 genes and genomic regions are associated with autism [11,12]. While most of these studies focused on identifying heritable SNPs associated with ASD risk, other studies have demonstrated the influence of de novo mutations ranging from single-base changes [13,14] to copy number variants (CNVs) thousands to millions of bases long [15,16]. Several likely gene-disruptive (LGD) variants affecting autism risk in genes such as GRIK2 [17] and ASMT [18] were found exclusively or more frequently in individuals with autism compared to control groups. Jamain et al. [19] showed strong evidence suggesting that mutations in NLGN3 and NLGN4 are involved in ASD. Additionally, deletions at Xp22.3 that include NLGN4 have been reported in several autistic individuals. Roohi et al. [20] found that CNTN4 plays an essential role in the formation, maintenance, and plasticity of neuronal networks. Disruption of CNTN4 is known to cause developmental delay and mental retardation. This report suggests that mutations affecting CNTN4 function may be relevant to ASD pathogenesis. A review by Li and Brown [21] discussed the substantial body of evidence that has resulted from genome-wide screening for several widely studied candidate ASD genes. Similarly, a large-scale international collaboration combined independent genotyping data to improve statistical power and aid robust discovery of loci in GWAS [7]. This collaboration also identified a significant genetic correlation between schizophrenia and autism, with several neurodevelopment-related genes implicated, such as EXT1, ASTN2, MACROD2, and HDAC4.
A combined analysis investigating both rare and common gene variants supported the role of several genes/loci associated with autism (e.g., NRXN1, ADNP, 22q11 deletion), revealed new variants in known autism-risk genes such as ADNP, NRXN1, NINL, and MECP2, and identified compelling new candidate genes such as KALRN, PLA2G4A, and RIMS4 [22]. Recently, Buxbaum [23] summarized the prevalence of some genetic variants in subjects ascertained for ASD.
Research investigating the gene expression profiles of those with ASD has also provided insight into genetic contributions to ASD. Expression levels of genes containing rare mutations associated with autism were evaluated in lymphoblasts from autism cases and controls, including the aforementioned genes NLGN3, NLGN4, NRXN1, and MECP2. Of these, NLGN3 was found to be differentially expressed, along with SHANK3 [24]. More comprehensive gene expression analyses have confirmed susceptibility genes previously reported in GWAS-based analyses and identified novel differentially expressed genes and biological pathways enriched for these genes [25]. RNA sequencing data analyses have elucidated several potential drivers of autism susceptibility, such as resting-state functional brain activity [26], dopaminergic influences in the dorsal striatum [27], overexpression of FOXP1, a gene involved in regulating tissue- and cell type-specific gene transcription in the brain [28,29], and genome-wide alterations to lncRNA levels, downregulation of alternative splicing events, and brain-region-dependent alterations in gene expression [30]. These studies indicate that integrating GWAS and RNA-seq data analysis can provide a better picture of the various mechanisms underlying a heterogeneous, multifaceted disorder like ASD.
In this study, we performed genome-wide gene-based association tests for ASD with the adaptive test [31] using summary statistics from two large GWAS datasets obtained from the Psychiatric Genomics Consortium (PGC). We identified 5 genes significantly associated with ASD in the ASD 2019 data. Among these 5 genes, SOX7 was replicated in the ASD 2017 data. Furthermore, analyses of two RNA sequencing datasets indicated that SOX7 was significantly upregulated in cases compared to controls. SOX7 encodes a member of the SOX (SRY-related HMG-box) family of transcription factors, which play a pivotal role in determining cell fate and identity in many lineages. The encoded protein may act as a transcriptional regulator after forming a complex with other proteins, which may contribute to autism.
Materials and methods
Datasets
Discovery GWAS summary statistics: The discovery dataset (labeled as asd2019) includes summary statistics from a meta-analysis of European samples derived from two cohorts: a population-based case-control study from the Lundbeck Foundation Initiative for Integrative Psychiatric Research (iPSYCH) project and a family trio-based study from the Psychiatric Genomics Consortium (PGC) [8]. The iPSYCH samples included individuals born to a known mother and resident in Denmark at the time of their first birthday. Cases were identified using the Danish Psychiatric Central Research Register, based on diagnoses made in 2013 or earlier by a psychiatrist according to ICD10 diagnostic codes, which include childhood autism, atypical autism, Asperger’s syndrome, “other pervasive developmental disorders”, and “pervasive developmental disorder, unspecified” [8]. The PGC samples consisted of 5 cohorts, whose trios were analyzed as cases and pseudo-controls. Details regarding these studies can be found in [8] and [7]. The combined sample consisted of 18,382 cases and 27,969 controls. Imputation and quality control were performed via PGC’s Ricopili pipeline, which is designed to produce robust, reproducible, and comparable datasets. The iPSYCH samples were processed separately in 23 genotyping batches, while the PGC samples were processed separately for each study. Genotype imputation was performed with IMPUTE2/SHAPEIT [32,33] in the Ricopili pipeline using the 1000 Genomes Project phase 3 dataset as the reference set. Regions demonstrating high linkage disequilibrium were excluded, and one subject from each pair with high similarity, as identified by PLINK’s identity by state (IBS) analysis [34], was removed at random, with a preference for retaining cases. Association testing was performed using PLINK on imputed dosage data and the meta-analysis was performed using METAL [8]. More detailed descriptions of each stage of the analysis can be found in Grove et al. [8].
The summary statistics produced by this study and subsequently used for our analysis can be found at https://pgc.unc.edu/for-researchers/download-results/.
Replication GWAS summary statistics: The replication dataset (labeled as asd2017) includes summary statistics from a European-ancestry meta-analysis performed by the Autism Spectrum Disorders Working Group (AWG) of the Psychiatric Genomics Consortium (PGC), which aimed to improve statistical power to detect loci significantly associated with ASD. The meta-analysis was performed on data from 14 independent cohorts across different ancestries totaling over 16,000 individuals. For each step in the meta-analysis, each cohort was processed individually. Individuals were excluded if they were assessed at less than 36 months of age or if diagnostic criteria were not met based on the Autism Diagnostic Interview-Revised (ADI-R) or the Autism Diagnostic Observation Schedule (ADOS) domain scores. While a “world-wide” meta-analysis on this aggregate dataset was performed, we derived our replication dataset from the smaller European-only analysis, which consisted of 6,197 ASD cases and 7,377 controls [7]. Each stage of imputation and quality control was performed similarly to the asd2019 data: imputation and quality control on PGC samples followed the PGC’s “Ricopili” pipeline. Since multiple studies were involved, checks were performed to identify and remove duplicate individuals prior to imputation. Family trio-based data were organized as cases and pseudo-controls. Criteria for SNP retention and other pre-imputation quality control steps can be found in the study’s supplementary File 1 [7]. Genotype imputation was performed with IMPUTE2/SHAPEIT using the 2,184 phased haplotypes from the full 1000 Genomes Project dataset as the reference set. All 14 cohorts were tested for association individually using an additive logistic regression model in PLINK. More detailed information about each stage of the analyses performed by this study can be found in the study’s supplementary File 1 [7].
The resulting summary statistics which were utilized in our analysis can be found at https://pgc.unc.edu/for-researchers/download-results/.
RNA-Seq data of brain tissue (GSE211154): Human postmortem brain tissue from each individual in a cohort of ASD cases and controls was obtained from the University of Maryland Brain and Tissue Bank, a brain and tissue repository of the NIH Neurobiobank. All ASD cases had diagnoses confirmed through Autism Diagnostic Interview-Revised (ADI-R) scores and/or had received a clinical diagnosis of autism from a licensed psychiatrist. Controls were matched to each case on age and postmortem interval [35].
RNA sequencing was conducted at the John P. Hussman Institute for Human Genomics Center for Genome Technology, University of Miami. Samples with RNA integrity number (RIN) scores ≥4 were included for library preparation and sequencing. Total RNA libraries were prepared using the Ovation SoLo RNA-Seq Library Preparation Kit (Tecan Life Science, Männedorf, Switzerland). Sequencing was performed on the Illumina NovaSeq 6000 (Illumina, San Diego, CA) with single-end 100 bp reads, targeting 25 million reads per sample. Overall, 39 samples (20 cases and 19 controls) were obtained for our analysis. S1 Table lists the characteristics of the 39 samples. The study can be found in the Gene Expression Omnibus (GEO) database under accession number GSE211154.
RNA-Seq data of brain tissue (GSE30573): The RNA dataset was obtained from a gene co-expression analysis that aimed to identify modules of co-expressed genes associated with ASD [36]. The study can be found in the Gene Expression Omnibus (GEO) database under accession number GSE30573. Detailed descriptions of the raw data acquisition and quality control processes can be found in the supplementary information of [36] as well as the GEO accession viewer. Briefly, brain tissue samples (frontal cortex, temporal cortex, and cerebellum) were obtained from the Autism Tissue Project (ATP) and the Harvard Brain Bank. Cases were diagnosed using ADI-R diagnostic scores, which are available along with other clinical data upon request from the ATP website. Total RNA was extracted from the sample tissues following the Qiagen miRNA kit instructions. Quality and concentration were assessed by Agilent Bioanalyzer and Nanodrop, respectively. Reads were generated on an Illumina GAII sequencer using manufacturer settings and were 73–76 nucleotides in length. Raw sequencing data for the frontal and temporal cortex samples were available in the SRA run selector for 6 autism cases and 6 controls [36].
eQTL data: Top expression quantitative trait loci (eQTLs) for multiple tissues were obtained from the Genotype-Tissue Expression (GTEx) portal, using the v8 release [37]. RNA-seq analysis was performed using STAR v2.5.3a for alignment and RSEM v1.3.0 for quantification. Genes were selected based on expression thresholds of >0.1 TPM in at least 20% of samples and ≥6 reads in at least 20% of samples. For each gene, expression values were normalized across samples using an inverse normal transformation. Genotype data for the subsequent eQTL analysis were generated using whole genome sequencing, and variants with MAF < 1% were excluded. A total of 49 tissues across 838 donors were analyzed for eQTL associations as part of the GTEx experiments; however, only 30 tissues contained a significant eQTL for SOX7. Cis-eQTL mapping was performed using FastQTL; full details regarding parameter specifications can be found on the GTEx portal website (https://gtexportal.org/home/methods). A majority of donors were white males aged 50–70. A full breakdown of donor characteristics can be found on the GTEx portal, under documentation (https://gtexportal.org/home/tissueSummaryPage#donorInfo).
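The expression filter and normalization described above can be illustrated with a minimal Python sketch. It is not the GTEx pipeline: the function names are hypothetical, and the rank offset of 0.375 (Blom) in the inverse normal transformation is an assumption, since the text does not state which variant was used.

```python
from statistics import NormalDist

def passes_expression_filter(tpm, counts, min_tpm=0.1, min_reads=6, min_frac=0.2):
    """Keep a gene if >min_tpm TPM and >=min_reads reads are each observed
    in at least min_frac of samples (thresholds quoted in the text)."""
    n = len(tpm)
    frac_tpm = sum(t > min_tpm for t in tpm) / n
    frac_reads = sum(c >= min_reads for c in counts) / n
    return frac_tpm >= min_frac and frac_reads >= min_frac

def inverse_normal_transform(values, offset=0.375):
    """Rank-based inverse normal transform across samples.
    The Blom offset is an assumption; ties are broken by input order."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    transformed = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        quantile = (rank - offset) / (n - 2 * offset + 1)
        transformed[idx] = NormalDist().inv_cdf(quantile)
    return transformed
```

For example, `inverse_normal_transform([3.0, 1.0, 2.0])` maps the middle value to 0 and the other two to symmetric positive and negative scores.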
Quality control & preprocessing
GWAS summary data: After downloading the raw summary statistics from the PGC website, we performed quality control to ensure robust, high-quality results. Only SNPs on autosomal chromosomes were used. First, SNPs with an imputation information metric (INFO) score below 0.9 were removed. Next, SNPs with strand-ambiguous alleles, non-biallelic loci, or duplicate rs IDs were removed. A Z score was then calculated from each variant’s odds ratio (OR) and standard error (SE) using the equation $Z = \ln(\mathrm{OR})/\mathrm{SE}$. After quality control, the variants were sorted into hg19 RefSeq genes. Linkage disequilibrium (LD) within each gene was calculated using the 1000 Genomes European reference panel (phase 3): for each gene, the subset of GWAS variants in the region of the gene’s transcription start site and transcription end site was matched to the reference variants, ensuring that both used the same reference allele and flipping Z score signs if necessary. Genes that contained fewer than 2 SNPs were removed. The Pearson correlation between this subset of genotypes was calculated and used as the gene-wide LD. One SNP of each pair of SNPs with perfect correlation ($|r| = 1$) within a gene was removed. The processed data were saved in 22 ‘RData’ files (one for each chromosome), each containing a list of data frames in which each list element comprised 1) SNP information for a specific gene and 2) its corresponding LD matrix.
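As an illustration of the summary-statistic filters described above, the following Python sketch applies the INFO threshold, drops strand-ambiguous and duplicated SNPs, and computes a Z score as the log odds ratio divided by its standard error. The record field names are hypothetical, and dropping every copy of a duplicated rs ID is an assumption about how duplicates were handled.

```python
import math
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def is_strand_ambiguous(a1, a2):
    """A/T and C/G SNPs read the same on both strands."""
    return COMPLEMENT.get(a1) == a2

def qc_and_zscore(records, min_info=0.9):
    """records: list of dicts with hypothetical keys
    rsid, a1, a2, info, odds_ratio, se (se on the log-odds scale)."""
    id_counts = Counter(r["rsid"] for r in records)
    kept = []
    for r in records:
        if r["info"] < min_info:
            continue  # poorly imputed
        if is_strand_ambiguous(r["a1"], r["a2"]):
            continue  # strand cannot be resolved from alleles alone
        if id_counts[r["rsid"]] > 1:
            continue  # assumption: drop all copies of a duplicated rs ID
        z = math.log(r["odds_ratio"]) / r["se"]
        kept.append({**r, "z": z})
    return kept
```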
RNA-seq data of brain tissue (GSE211154): Raw FASTQs were processed through a bioinformatics pipeline that included adapter trimming by TrimGalore (https://github.com/FelixKrueger/TrimGalore) and alignment with the STAR package [35,38] to the GRCh38 human reference genome. Gene expression counts were quantified against the GENCODE v35 human gene release using the GeneCounts function in STAR. We downloaded gene expression counts directly from the GSE211154 webpage (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE211154) [35].
RNA-seq data of brain tissue (GSE30573): The sequence read archive (SRA) accession list and associated sample metadata (“SRA Run Table”) for GSE30573 were downloaded from the SRA run selector page for the study. Raw fastq files were downloaded from the SRA using the SRA Toolkit via the ‘prefetch’ and ‘fastq-dump’ commands [39]. We used FastQC to assess the quality of reads in each file, and MultiQC to visualize the results in batch format [40,41]. Only 1 sample failed the ‘per sequence base quality’ assessment and was subsequently trimmed of low-quality reads using the command-line tool ‘fastq_quality_filter’ from the FastX toolkit using a minimum quality Phred score of 20 and a minimum percent of bases per read to meet that threshold of 50% [42]. Reassessment via FastQC demonstrated this as sufficient trimming to meet the quality needed for downstream analysis.
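The read-level filter described above (a minimum Phred score of 20 in at least 50% of bases) can be sketched in Python as follows. This is an illustration of the criterion, not the FastX implementation; the Phred+33 quality encoding and the inclusive handling of the 50% boundary are assumptions.

```python
def passes_quality_filter(quality_string, min_phred=20, min_percent=50, offset=33):
    """Return True if at least min_percent of bases have Phred >= min_phred.
    Assumes Phred+33 encoded quality strings."""
    if not quality_string:
        return False
    scores = [ord(ch) - offset for ch in quality_string]
    n_ok = sum(score >= min_phred for score in scores)
    return 100.0 * n_ok / len(scores) >= min_percent
```

A read with quality string `"IIII!!!!"` (four bases at Phred 40, four at Phred 0) sits exactly at the 50% boundary and is kept under this sketch.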
After passing quality control, RNA-seq reads were aligned to a reference genome using STAR [38] in two steps: genome indexing and alignment to the indexed reference genome. We generated the genome index files using STAR’s ‘--runMode genomeGenerate’ option, setting ‘--sjdbOverhang’ to 75 to match the maximum read length across the samples minus 1. The reference genome FASTA file and corresponding annotation GTF file (GRCh37/hg19, release 41) used to generate these index files were downloaded from GENCODE. After alignment, we used HTSeq [43] to estimate the number of reads per gene region (i.e., gene expression counts).
GTEx eQTL data: Preprocessing of downloaded eQTL data from GTEx (current release v8) was performed as part of the standard workflow in the R package “TwoSampleMR” [44,45]. Briefly, the top eQTL per tissue was matched to its corresponding SNP in the ASD2019 GWAS data; palindromic SNPs whose strand could not be inferred via MAF, as well as strand-ambiguous SNPs, were removed from the analysis. Additionally, effect direction was harmonized with the GWAS outcome data to ensure that the same allele is referenced in both datasets.
Statistical analysis
Gene-based Association Test: To perform gene-based association testing, we used the function ‘sats’ in the R package ‘mkatr’ [31]. This function computes p-values for 3 different gene-based testing methods using GWAS summary statistics with an LD matrix calculated from a reference panel. A brief description of each method is as follows. Let $m$ denote the number of variants considered in a gene or gene region and let $Z = (Z_1, \ldots, Z_m)^T$ represent the GWAS summary statistics for that region. Let $R = (r_{ij})$ denote the estimated correlation matrix between Z statistics based on variant linkage disequilibrium (LD) calculated from a reference panel [31]. The tests included in the sats function are the sum test (a type of burden test), the squared sum test (a type of SKAT statistic) and the adaptive test (similar to the SKAT-O statistic). The three tests are as follows:
- Sum test (ST): $B = \left(\sum_{j=1}^{m} Z_j\right)^2$
- Squared sum test (S2T): $Q = \sum_{j=1}^{m} Z_j^2$
- Adaptive test (AT): $T = \min_{\rho \in [0,1]} P(Q_\rho)$, where $Q_\rho = (1-\rho) Q + \rho B$ and $P(Q_\rho)$ denotes its p-value.
It can be shown that $Q_\rho$ asymptotically follows a weighted sum of independent chi-squared distributions with 1 degree of freedom ($\chi_1^2$) whose weights equal the eigenvalues of $R^{1/2}\left[(1-\rho) I_m + \rho \mathbf{1}\mathbf{1}^T\right] R^{1/2}$, allowing for efficient computation of the p-value of $Q_\rho$, $P(Q_\rho)$. The minimum p-value of AT is searched for over a range of $\rho$ in the interval [0, 1] [31].
The ST is most valuable when all genetic variants in the gene being tested have the same direction of effect and approximately equal effect sizes, while the S2T performs better than the ST when genetic variants have different directions of effect. The AT utilizes information from both the ST and the S2T, meaning it can adapt to the genetic variants in the data better than either test alone. Indeed, the adaptive test shows the most robust performance across a wider range of scenarios [31]. Therefore, we report the results of the AT in our gene-based genome-wide studies with GWAS summary statistics in this paper. More details regarding the derivation of these tests and their relation to the single-variant association test can be found in [31].
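To make the relationship between the three statistics concrete, here is a minimal Python sketch. It assumes the sum test statistic is the squared sum of the Z scores and the squared sum test statistic is the sum of squared Z scores, with the adaptive test combining them through a weight $\rho$; it is not the ‘mkatr’ implementation, and all function names are hypothetical. Only the sum test p-value is computed, using the fact that under the null the sum of Z scores is normal with variance equal to the sum of all LD matrix entries. P-values for the other two tests require a weighted chi-squared mixture method that is not shown here.

```python
import math

def chi2_1df_sf(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def sum_test(z, ld):
    """Sum test: B = (sum of Z)^2; B / (sum of LD entries) is chi-squared(1) under the null."""
    b = sum(z) ** 2
    scale = sum(sum(row) for row in ld)
    return b, chi2_1df_sf(b / scale)

def squared_sum_stat(z):
    """Squared sum test statistic: Q = sum of Z^2."""
    return sum(v * v for v in z)

def q_rho(z, rho):
    """Combined statistic Q_rho = (1 - rho) * Q + rho * B, for rho in [0, 1]."""
    return (1.0 - rho) * squared_sum_stat(z) + rho * sum(z) ** 2
```

Setting `rho=0` recovers the squared sum statistic and `rho=1` recovers the sum test statistic, which is why a search over `rho` lets the adaptive test move between the two.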
Differential Gene Expression Analysis with GSE211154 data: The expression of each gene was dichotomized as low (gene expression count ≤ median) or high (gene expression count > median). We treated this binary expression variable as a predictor and used a multiple logistic regression model to conduct differential expression analysis for each gene, adjusting for age at death, sex, race, and postmortem interval (PMI) on the basis of previous evidence [35] about the association of these variables with ASD (model 1):
$\mathrm{logit}(p) = \ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \mathrm{GE} + \beta_2 \mathrm{Age} + \beta_3 \mathrm{Sex} + \beta_4 \mathrm{Race} + \beta_5 \mathrm{PMI}$
In model 1, p is the probability of ASD; GE is the dichotomized gene expression count of each gene; Age is the age at death; PMI is the postmortem interval in hours. When the differential expression analysis was restricted to white participants, race was excluded from model 1.
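The median split that produces the binary expression predictor can be sketched in a few lines of Python. The function name is hypothetical, and the sketch covers only the dichotomization step, not the logistic regression fit.

```python
from statistics import median

def dichotomize_by_median(counts):
    """Code expression as 1 (high, count > median) or 0 (low, count <= median)."""
    m = median(counts)
    return [1 if c > m else 0 for c in counts]
```

Note that samples tied at the median are all coded as low, so heavily tied counts can yield unbalanced groups.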
Differential Gene Expression Analysis with GSE30573 data: For genes with gene expression counts of at least 10, we used the R package DESeq2 [46] to perform differential gene expression analysis based on normalized gene expression counts. DESeq2 uses a generalized linear model to model the relationship between a trait and gene expression [46]. We used the Benjamini-Hochberg adjusted p-value to assess significance in the differential expression analysis in order to control the false discovery rate (FDR).
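For readers unfamiliar with the Benjamini-Hochberg adjustment, the following Python sketch shows the standard step-up procedure on a list of raw p-values. It is a generic illustration rather than the DESeq2 code path, and the input values in the usage note are made up.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted
```

For the hypothetical input `[0.01, 0.04, 0.03, 0.005]`, the adjusted values are approximately `[0.02, 0.04, 0.04, 0.02]`; a gene is declared significant when its adjusted value falls below the chosen FDR level.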
Causal Inference with Two-Sample Mendelian Randomization: To assess a potential causal effect of SOX7 expression on ASD, we used the R package ‘TwoSampleMR’ to conduct two-sample Mendelian randomization using SOX7 eQTLs obtained from GTEx as instruments [44,45]. Exposure and outcome data were harmonized prior to analysis to ensure that the effect allele refers to the same allele in both datasets. We performed single-instrument analysis, using the Wald ratio method to estimate the effect of SOX7 expression in various tissues on ASD outcome. Thirty tissues were tested individually. A Bonferroni-corrected threshold, adjusting for the number of tissues tested, was used to claim significance. Additionally, while most donors in the full GTEx data are white (84.6%), the v8 release includes a separate, European-only subset that contains 29 tissues with a significant SOX7 eQTL. We performed a secondary analysis using this subset, since our GWAS dataset is European only.
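The single-instrument Wald ratio can be written out in a short Python sketch: the causal estimate is the SNP-outcome effect divided by the SNP-exposure effect, with a first-order standard error that ignores uncertainty in the exposure estimate. The function names are hypothetical, and the significance level of 0.05 in the Bonferroni helper is an assumption, not a value stated in the text.

```python
import math

def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Wald ratio estimate, its first-order standard error, and a two-sided p-value."""
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    z = abs(estimate) / se
    p_value = math.erfc(z / math.sqrt(2.0))  # two-sided normal tail
    return estimate, se, p_value

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test threshold; alpha = 0.05 is an assumed family-wise level."""
    return alpha / n_tests
```

With an eQTL effect of 0.5 on expression and a GWAS log odds ratio of 0.1 (standard error 0.02), both made-up numbers, the estimate is 0.2 per unit of expression with standard error 0.04.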
Computing Environment: RNA-seq data quality control, alignment, and counting were performed on the Lonestar6 high-performance cluster provided by TACC at the University of Texas at Austin. Differential expression analysis and gene-based association tests were performed in a local Linux (Windows Subsystem for Linux) environment using R (R-4.3.2) in RStudio. Two-sample Mendelian randomization was performed in Windows 10, using R (R-4.3.3) in RStudio.
Results
Gene-based association test
Discovery GWAS: Out of approximately 19,000 genes tested for association with ASD, 5 genes were identified as significant, with p-values below the Bonferroni-corrected threshold of $2.5 \times 10^{-6}$ (Fig 1 and Table 1). SOX7 ($p = 2.22 \times 10^{-7}$) encodes a transcription factor involved in regulating embryonic development and cell fate determination [47]. KIZ ($p = 1.16 \times 10^{-9}$) encodes “Kizuna centrosomal protein”, which plays a central role in stabilizing the pericentriolar region before the spindle formation step in cellular division [48]. A gene region that encodes a long non-coding antisense RNA for KIZ, KIZ-AS1 ($p = 8.67 \times 10^{-10}$), was also identified as significant; however, the function of this antisense RNA has not been determined. XRN2 ($p = 7.73 \times 10^{-9}$) encodes a 5’-3’ exoribonuclease that promotes transcriptional termination [49]. Finally, LOC101929229 ($p = 2.14 \times 10^{-6}$), also known as PINX1-DT, is a lncRNA that is considered a “divergent transcript” of the protein-coding gene PINX1. While the function of the divergent transcript is not defined, PINX1 encodes a protein that enables telomerase RNA binding and inhibitor activity and is involved in several related processes, including DNA biosynthesis and protein localization [50].
Gene-based association test results with the adaptive test are expressed as $-\log_{10}$(p-value) on the Y-axis. Chromosomes 1–22 are labeled on the X-axis. Each dot represents a gene tested for association with autism spectrum disorder; the dotted horizontal line represents a Bonferroni-corrected p-value threshold of $2.5 \times 10^{-6}$.
Replication GWAS: Among these 5 genes, SOX7 (p = 0.00087) and LOC101929229 (p = 0.009) were replicated in the ASD 2017 data. KIZ-AS1 (p = 0.059) and KIZ (p = 0.06) were close to the boundary of replication in the ASD 2017 data (Fig 1 and Table 1).
Differential expression analysis in GSE30573
Among the five genes identified in the discovery GWAS, SOX7 (log2 fold change [LFC] = 1.17, p = 0.0017; Benjamini-Hochberg (BH) adjusted p = 0.0085), LOC101929229 (LFC = 3.22, $p = 5.83 \times 10^{-7}$; BH adjusted $p = 1.18 \times 10^{-5}$), and KIZ (LFC = 0.63, p = 0.00099; BH adjusted p = 0.0055) were also identified as significant in the differential gene expression analysis (Table 1). A comparison of case-control gene expression counts for SOX7 can be found in Fig 2a, demonstrating that SOX7 is consistently upregulated in autism cases compared to controls. The expression of SOX7 is increased in autism patients relative to controls by a multiplicative factor of 2.25 (that is, $2^{1.17}$). In addition, the expression of LOC101929229 is increased in autism patients relative to controls by a multiplicative factor of 9.31.
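The multiplicative factors quoted above follow from the log2 fold changes by exponentiation. A one-line Python sketch (the function name is hypothetical) makes the conversion explicit:

```python
def fold_change_from_lfc(lfc):
    """Convert a log2 fold change into a multiplicative fold change."""
    return 2.0 ** lfc
```

Applied to the reported LFC of 1.17 for SOX7 this gives about 2.25, and for the reported LFC of 3.22 it gives about 9.3; small differences from the published factor are expected because the LFC values shown are rounded.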
Differential expression analysis in GSE211154
Among the five genes identified in the discovery GWAS, only SOX7 demonstrated significantly elevated expression in ASD cases relative to controls, both in all samples (p = 0.036) and in white samples only (p = 0.044) (Tables 1 and 2). A comparison of case-control gene expression counts for SOX7 can be found in Fig 2b, demonstrating that SOX7 is consistently upregulated in autism cases compared to controls.
Two-sample Mendelian randomization
Out of 30 tissues tested, SOX7 expression in 6 tissues was statistically significant after correcting for multiple testing in the full dataset, including 3 subregions in the brain: cerebellar hemisphere, hypothalamus, and spinal cord (Table 3). Interestingly, in the European-only subset, the cerebellar hemisphere remained significant (Table 4). Full results for both analyses can be found in S2 and S3 Tables, respectively. These results are consistent with a causal relationship between SOX7 expression and ASD.
Discussion
Through gene-based analysis, we identified 5 gene regions (KIZ, KIZ-AS1, XRN2, LOC101929229, and SOX7) significantly associated with ASD. SOX7 and LOC101929229 were supported by results from the replication study in a different GWAS dataset and by the differential gene expression analysis performed on publicly available RNA-seq data.
KIZ is located on chromosome 20 and encodes Kizuna centrosomal protein, which aids in stabilizing the pericentriolar region of centrosomes before spindle formation. KIZ has been identified as significantly associated with autism in previous GWAS [8], TWAS [51], gene-based analysis [52], and methylation-based studies [53], and cell cycle regulation has also been implicated in autism susceptibility in previous research [54,55]. KIZ has also been found to be a potentially shared genetic locus between ASD and attention-deficit hyperactivity disorder (ADHD), providing support for its involvement in neurological disorders [56].
XRN2 is located next to KIZ and encodes a 5’-3’ exonuclease that is involved in myriad RNA management processes, including transcriptional termination, miRNA expression regulation, nonsense-mediated mRNA decay, and rRNA maturation [57–60]. XRN2 has been found to play a role in regulating miRNA expression in neurons specifically, and altered miRNA expression regulation has been investigated as a potential mechanism for autism susceptibility [61–65]. Likewise, disruption of proper RNA metabolism as a result of altered expression of RNA binding proteins has been implicated in neurological disease as a whole, and the XRN gene family is involved in nonsense-mediated decay of mRNA, a process that has been implicated in autism pathophysiology [66,67]. Previous GWAS have reported SNPs in the region containing XRN2 to be significantly associated with ASD, affirmed by gene-based analysis using MAGMA [8]. Additionally, a transcriptome-wide association study (TWAS) found XRN2 to be significantly upregulated in autism, in accord with our findings [68]. Another gene-based analysis found XRN2 to be associated with ASD and upon further investigation via gene-network analysis and enrichment analysis found that not only does XRN2 interact with several genes in the cAMP signaling pathway and RNA transport network, but that the enriched KEGG/GO terms for XRN2 (spliceosome, RNA transport, and nucleic acid binding) found to be associated with ASD are also essential processes pivotal to early development [52]. The extensive involvement of XRN2 in such complex mechanisms of gene expression regulation, particularly in neuronal cell types, offers possible insights into the vast heterogeneity of ASD and its overlap with other neurodevelopmental disorders. 
In fact, more recent research efforts have focused on ascertaining genetic commonalities between ASD and related disorders such as ADHD, obsessive-compulsive disorder (OCD), and Tourette syndrome, for which XRN2 appears to be a shared significant locus [69,70].
SOX7 is of particular interest due to its hallmark involvement in the regulation of the Wnt/β-catenin pathway (Fig 3), an important developmental signaling pathway. SOX7 and its related SOX family genes encode transcription factors that are critical to the downregulation of the canonical Wnt/β-catenin signaling pathway, which controls embryonic development and adult homeostasis and is involved in a multitude of cellular processes [71,72]. While the Wnt pathway is active in nearly all tissue types, proteins involved in Wnt signaling in the brain have been found to localize to synapses and influence synaptic growth, and murine knockout models of ASD risk genes that are part of the Wnt pathway support a role for disruption of this pathway in autism-like behaviors [73]. Indeed, the Wnt/β-catenin signaling pathway has been suggested as a possible avenue for autism pathogenesis in several studies [73–79].
SOX7 also regulates angiogenesis, vasculogenesis, and endothelial cell development, and the SOX family of transcription factors is critical to cardiovascular development [80,81]. For example, SOX7 was found to be upregulated in sustained hypoxic environments, where it mediates angiogenesis [82]. A knockout model of SOX7 showed profound vascular defects, demonstrating that SOX7 has an essential role in vasculogenesis and angiogenesis in early development [83]. Links between SOX7, developmental delay, and congenital heart disease have also been investigated. Specifically, deletions in the region where SOX7 resides have been shown to cause both congenital heart defects and intellectual disability [84,85].
Additionally, Wnt signaling has been demonstrated to orchestrate differentiation of neural vasculature, such as the blood-brain barrier [86,87]. Likewise, there is evidence of vascular involvement in the development of autism [88–91]. One review in particular suggests that mutations affecting the delicate interactions between the Wnt and Shh signaling pathways may alter blood-brain barrier integrity in autism by aberrantly interacting with neurovascular molecules [92].
Lastly, oxidative stress has been researched as a potential source of autism susceptibility [93,94], and the interaction between altered vasculature and autism during oxidative stress could point to another potential source of pathogenesis [88]. Indeed, the role of Wnt/β-catenin signaling in oxidative stress has been implicated in autism susceptibility directly [95]. This combination of evidence, which implicates both Wnt signaling and SOX7 interactions in the many interrelated processes that have been suggested as mechanisms behind the etiology of ASD, together with our findings, provides mounting support for more in-depth investigations of these particular genes and pathways.
Wnt/β-catenin signaling, oxidative stress, and impaired or altered vasculature have all been implicated in the development of ASD. These three factors interact with each other and with multiple systemic processes, which may contribute to ASD’s symptom heterogeneity. The fact that SOX7 is involved in the regulation of both Wnt/β-catenin signaling and vasculogenesis points to a potential converging mechanism behind the pathophysiology of ASD. Additionally, the association of SOX7 with autism has been investigated directly. A case study of a child with “8p23.1 duplication syndrome” revealed a de novo 1.81 Mbp duplication on chromosome 8 (8p23.1) spanning the region where SOX7 lies [96]. This patient exhibited characteristic symptoms of the disorder, including delayed motor and speech development and intellectual disability, which heavily overlap with autism and related intellectual disorders. Indeed, this patient also exhibited symptoms specific to ASD, such as repetitive compulsive behavior.
An epigenome-wide analysis performed in a Mexican population found that SOX7 was differentially methylated between autism cases and controls [97]. Another study found that differential methylation was associated with an “elevated polygenic burden” for autism, and further identified two significantly associated CpG sites located near GWAS markers for autism on chromosome 8, in the same region as SOX7 [53]. It is worth noting that this study also found evidence of SNPs associated with both autism and DNA methylation that were annotated to KIZ and XRN2, two genes that we also found to be significantly associated with ASD.
Changes in methylation can lead to changes in gene expression, providing another plausible mechanism of SOX7 involvement: a change in SOX7 methylation affects the expression, and thus the availability, of the transcription factor it encodes, which in turn affects the downstream pathways SOX7 regulates, such as Wnt/β-catenin. Indeed, both methylation studies demonstrated a negative difference in methylation between autism cases and controls. Generally speaking, hypomethylation results in a less compact 3-dimensional genome structure, allowing greater access to the gene and an increase in expression, consistent with the higher gene expression counts in autism cases versus controls in our RNA-seq data (Fig 2a and 2b) [98–100].
Finally, altered expression of SOX7 has been shown to play a role in the development of different types of gliomas. One study demonstrated SOX7 to be downregulated in human glioma, allowing cancer development through upregulated Wnt/β-catenin signaling [101], whereas another study demonstrated that overexpression of SOX7 in high-grade glioma (HGG) promoted tumor growth via vessel abnormalization [102]. These somewhat conflicting observations suggest that, because SOX7 is heavily involved in regulating several intricately linked developmental and homeostatic functions, its expression must be delicately balanced. Interestingly, extensive overlap of genetic risk between autism and cancer has also been demonstrated [103–106]. SOX7 expression and its interactions may provide additional support for this overlap, particularly given the role of SOX7 in vasculature development and Wnt signaling regulation.
Alterations in cerebellar function and structure have previously been implicated in autism pathophysiology [107–110]. While the cerebellum is mostly known for its role in coordination and motor function, increasing evidence demonstrates that it is also involved in a variety of social and cognitive functions [111–113], which supports the idea that deficiencies in cerebellar function may be one of the many contributing factors to the highly heterogeneous social and cognitive symptoms of ASD. Differences in cerebellar volume have been observed between individuals with ASD and neurotypical individuals [114–117], and SOX7 has been implicated in neuronal apoptosis in the cerebellum [118]. Increased blood-brain barrier permeability was identified in the cerebellum of a murine model of ASD [119], and connections between changes in blood-brain barrier permeability, altered vasculature, blood flow, and oxidative stress in the brain, including the cerebellum specifically, have been reviewed as possible interrelated mechanisms for ASD [120].
The methods used here are not without limitations. Gene expression is a highly dynamic process that is not only tissue dependent but also cell-type specific, and it varies with developmental stage and even external factors [121–126]. As a result, autism-related genes that are differentially expressed only at particular developmental stages or in other specific contexts may have been missed. Additionally, differential expression analysis was performed on bulk RNA, whereas altered gene expression between autism cases and controls may be cell-type specific; characterizing expression in the specific cell types that make up key brain regions would have a better chance of revealing mechanisms behind autism pathogenesis and could help elucidate the pathophysiology behind the wide variety of ASD subtypes. Gene-based analysis also has limitations, the most important being its reliance on a reference population for estimating linkage disequilibrium between variants. The similarity of this reference population to the study population is crucial to the accuracy of many gene-based analyses, including those performed here. Analyses using two-sample Mendelian randomization share this limitation. As a result, our findings are limited to European populations, as this was our reference of choice. Future steps include a tighter integration of DNA and RNA information as well as extensions to non-European populations that have been under-researched.
These limitations notwithstanding, the study has considerable strengths. The AT method used in the gene-based GWAS not only combines the desirable properties of the sum and squared sum tests but also accounts for LD among genetic variants. The heatmap of the correlation between genetic variants in SOX7 (S1 Fig) indicates that rs7005905 and rs7836366, rs10100209 and rs7836366, and rs10100209 and rs7005905 are in strong positive linkage disequilibrium (LD) (ρ > 0.5); rs4841432 is in negative LD with the other variants except for rs7009920. The strong LD in SOX7 and the powerful AT method support our identification of SOX7 as an autism-associated gene. The successful replication of SOX7 in the replication data and gene expression data, together with its biological plausibility, underscores the robustness of the connection between SOX7 and autism. This finding may significantly advance our understanding of the etiology of autism, open new opportunities to reinvigorate stalled autism drug development, and increase the accuracy of autism risk prediction, which would make early intervention and prevention possible.
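As a rough illustration of why combining the two test types can be useful, the following Python sketch computes a sum test and a squared sum test from variant-level Z-scores and an LD correlation matrix. It is not the AT implementation used in this study: the function names, the toy Z-scores, the identity LD matrix, and the Monte Carlo null distribution are all assumptions made purely for illustration.

```python
import math
import random


def sum_test(z, ld):
    """Sum test: T = sum(z); under H0, Var(T) is the sum of all LD entries."""
    t = sum(z)
    var_t = sum(sum(row) for row in ld)
    stat = t / math.sqrt(var_t)
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value


def cholesky(a):
    """Lower-triangular factor L with L L^T = a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def squared_sum_test(z, ld, n_sim=20000, seed=0):
    """Squared sum test: T = sum(z_i^2), with a Monte Carlo null that respects LD."""
    rng = random.Random(seed)
    low = cholesky(ld)
    n = len(z)
    observed = sum(v * v for v in z)
    exceed = 0
    for _ in range(n_sim):
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Correlated null Z-scores: z0 = L e, so Cov(z0) equals the LD matrix.
        z0 = [sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(n)]
        if sum(v * v for v in z0) >= observed:
            exceed += 1
    # Add-one correction keeps the empirical p-value strictly positive.
    return observed, (exceed + 1) / (n_sim + 1)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # Hypothetical Z-scores for two variants with opposite effect directions.
    z_opposite = [3.0, -3.0]
    print(sum_test(z_opposite, identity))          # effects cancel in the sum
    print(squared_sum_test(z_opposite, identity))  # squared sum is unaffected by sign
```

In this toy case the sum test statistic is exactly zero because the two effects cancel, while the squared sum test remains sensitive to them; when effects share a direction, the sum test tends to be the more powerful of the two. An adaptive test aims to retain power in both situations.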
Supporting information
S1 Fig. Heatmap of the correlation between variants in SOX7.
SNPs rs7005905 and rs7836366, rs10100209 and rs7836366, and rs10100209 and rs7005905 are in strong positive linkage disequilibrium (LD) (ρ > 0.5); rs4841432 is in negative LD with the other variants except for rs7009920.
https://doi.org/10.1371/journal.pone.0320096.s001
(TIF)
S1 Table. Characteristics of participants and gene SOX7 expression status in GSE211154 RNA-seq data.
https://doi.org/10.1371/journal.pone.0320096.s002
(DOCX)
S2 Table. 2-SMR results between ASD (Outcome) and SOX7 expression (Exposure).
https://doi.org/10.1371/journal.pone.0320096.s003
(DOCX)
S3 Table. 2-SMR results between ASD (Outcome) and SOX7 expression (Exposure) (EUR GTEx samples only).
https://doi.org/10.1371/journal.pone.0320096.s004
(DOCX)
Acknowledgments
The study was peer reviewed and selected as a poster presentation at the 2023 American Society of Human Genetics annual meeting. We’re grateful to our peers for engaging in invaluable discussions regarding our work.
References
- 1. Fombonne E. Epidemiology of pervasive developmental disorders. Pediatr Res. 2009;65(6):591–8. pmid:19218885
- 2. Gillberg C, Fernell E, Minnis H. Early symptomatic syndromes eliciting neurodevelopmental clinical examinations. Hindawi. 2014.
- 3. Gaugler T, Klei L, Sanders SJ, Bodea CA, Goldberg AP, Lee AB, et al. Most genetic risk for autism resides with common variation. Nat Genet. 2014;46(8):881–5. pmid:25038753
- 4. Devlin B, Kelsoe JR, Sklar P, Daly MJ, O’Donovan MC, Craddock N, et al. Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nat Genet. 2013;45(9):984–94.
- 5. Tick B, Bolton P, Happé F, Rutter M, Rijsdijk F. Heritability of autism spectrum disorders: a meta-analysis of twin studies. J Child Psychol Psychiatry. 2016;57(5):585–95. pmid:26709141
- 6. Sandin S, Lichtenstein P, Kuja-Halkola R, Hultman C, Larsson H, Reichenberg A. The Heritability of Autism Spectrum Disorder. JAMA. 2017;318(12):1182–4. pmid:28973605
- 7. Autism Spectrum Disorders Working Group of The Psychiatric Genomics Consortium. Meta-analysis of GWAS of over 16,000 individuals with autism spectrum disorder highlights a novel locus at 10q24.32 and a significant overlap with schizophrenia. Mol Autism. 2017;8:21. pmid:28540026
- 8. Grove J, Ripke S, Als TD, Mattheisen M, Walters RK, Won H, et al. Identification of common genetic risk variants for autism spectrum disorder. Nat Genet. 2019;51(3):431–44.
- 9. Robinson EB, St Pourcain B, Anttila V, Kosmicki JA, Bulik-Sullivan B, Grove J, et al. Genetic risk for autism spectrum disorders and neuropsychiatric variation in the general population. Nat Genet. 2016;48(5):552–5. pmid:26998691
- 10. Bourgeron T. From the genetic architecture to synaptic plasticity in autism spectrum disorder. Nat Rev Neurosci. 2015;16(9):551–63. pmid:26289574
- 11. Sanders SJ, He X, Willsey AJ, Ercan-Sencicek AG, Samocha KE, Cicek AE, et al. Insights into Autism Spectrum Disorder Genomic Architecture and Biology from 71 Risk Loci. Neuron. 2015;87(6):1215–33. pmid:26402605
- 12. Satterstrom FK, Kosmicki JA, Wang J, Breen MS, De Rubeis S, An JY. Large-scale exome sequencing study implicates both developmental and functional changes in the neurobiology of autism. Cell. 2020;180(3):568–84.
- 13. Sanders SJ, Murtha MT, Gupta AR, Murdoch JD, Raubeson MJ, Willsey AJ, et al. De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature. 2012;485(7397):237–41. pmid:22495306
- 14. O’Roak BJ, Vives L, Girirajan S, Karakoc E, Krumm N, Coe BP, et al. Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature. 2012;485(7397):246–50. pmid:22495309
- 15. Sanders SJ, Ercan-Sencicek AG, Hus V, Luo R, Murtha MT, Moreno-De-Luca D, et al. Multiple recurrent de novo CNVs, including duplications of the 7q11.23 Williams syndrome region, are strongly associated with autism. Neuron. 2011;70(5):863–85. pmid:21658581
- 16. Levy D, Ronemus M, Yamrom B, Lee Y, Leotta A, Kendall J, et al. Rare de novo and transmitted copy-number variation in autistic spectrum disorders. Neuron. 2011;70(5):886–97. pmid:21658582
- 17. Jamain S, Betancur C, Quach H, Philippe A, Fellous M, Giros B, et al. Linkage and association of the glutamate receptor 6 gene with autism. Mol Psychiatry. 2002;7(3):302–10. pmid:11920157
- 18. Melke J, Goubran Botros H, Chaste P, Betancur C, Nygren G, Anckarsäter H, et al. Abnormal melatonin synthesis in autism spectrum disorders. Mol Psychiatry. 2008;13(1):90–8. pmid:17505466
- 19. Jamain S, Quach H, Betancur C, Råstam M, Colineaux C, Gillberg IC, et al. Mutations of the X-linked genes encoding neuroligins NLGN3 and NLGN4 are associated with autism. Nat Genet. 2003;34(1):27–9. pmid:12669065
- 20. Roohi J, Montagna C, Tegay DH, Palmer LE, DeVincent C, Pomeroy JC, et al. Disruption of contactin 4 in three subjects with autism spectrum disorder. J Med Genet. 2009;46(3):176–82. pmid:18349135
- 21. Li X, Zou H, Brown WT. Genes associated with autism spectrum disorder. Brain Res Bull. 2012;88(6):543–52. pmid:22688012
- 22. Leblond CS, Cliquet F, Carton C, Huguet G, Mathieu A, Kergrohen T, et al. Both rare and common genetic variants contribute to autism in the Faroe Islands. NPJ Genom Med. 2019;4:1. pmid:30675382
- 23. Buxbaum JD. Multiple rare variants in the etiology of autism spectrum disorders. Dialogues Clin Neurosci. 2022.
- 24. Yasuda Y, Hashimoto R, Yamamori H, Ohi K, Fukumoto M, Umeda-Yano S, et al. Gene expression analysis in lymphoblasts derived from patients with autism spectrum disorder. Mol Autism. 2011;2(1):9. pmid:21615902
- 25. Rahman MR, Petralia MC, Ciurleo R, Bramanti A, Fagone P, Shahjaman M, et al. Comprehensive Analysis of RNA-Seq Gene Expression Profiling of Brain Transcriptomes Reveals Novel Genes, Regulators, and Pathways in Autism Spectrum Disorder. Brain Sci. 2020;10(10):747. pmid:33080834
- 26. Berto S, Treacher AH, Caglayan E, Luo D, Haney JR, Gandal MJ, et al. Association between resting-state functional brain connectivity and gene expression is altered in autism spectrum disorder. Nat Commun. 2022;13(1):3328. pmid:35680911
- 27. Brandenburg C, Soghomonian J-J, Zhang K, Sulkaj I, Randolph B, Kachadoorian M, et al. Increased Dopamine Type 2 Gene Expression in the Dorsal Striatum in Individuals With Autism Spectrum Disorder Suggests Alterations in Indirect Pathway Signaling and Circuitry. Front Cell Neurosci. 2020;14:577858. pmid:33240045
- 28. Chien W-H, Gau SS-F, Chen C-H, Tsai W-C, Wu Y-Y, Chen P-H, et al. Increased gene expression of FOXP1 in patients with autism spectrum disorders. Mol Autism. 2013;4(1):23. pmid:23815876
- 29. Ferland RJ, Cherry TJ, Preware PO, Morrisey EE, Walsh CA. Characterization of Foxp2 and Foxp1 mRNA and protein in the developing and mature brain. J Comp Neurol. 2003;460(2):266–79. pmid:12687690
- 30. Parikshak NN, Swarup V, Belgard TG, Irimia M, Ramaswami G, Gandal MJ, et al. Genome-wide changes in lncRNA, splicing, and regional gene expression patterns in autism. Nature. 2016;540(7633):423–7. pmid:27919067
- 31. Guo B, Wu B. Statistical methods to detect novel genetic variants using publicly available GWAS summary data. Comput Biol Chem. 2018;74:76–9. pmid:29558699
- 32. Howie BN, Donnelly P, Marchini J. A Flexible and Accurate Genotype Imputation Method for the Next Generation of Genome-Wide Association Studies. PLoS Genet. 2009.
- 33. Delaneau O, Marchini J, Zagury J-F. A linear complexity phasing method for thousands of genomes. Nat Methods. 2011;9(2):179–81. pmid:22138821
- 34. Chang C, Chow C, Tellier L, et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience. 2015;4(1).
- 35. Brandenburg C, Griswold AJ, Van Booven DJ, Kilander MBC, Frei JA, Nestor MW, et al. Transcriptomic analysis of isolated and pooled human postmortem cerebellar Purkinje cells in autism spectrum disorders. Front Genet. 2022;13:944837. pmid:36437953
- 36. Voineagu I, Wang X, Johnston P, Lowe JK, Tian Y, Horvath S, et al. Transcriptomic analysis of autistic brain reveals convergent molecular pathology. Nature. 2011;474(7351):380–4. pmid:21614001
- 37. GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science. 2020;369(6509):1318–30.
- 38. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29(1):15–21. pmid:23104886
- 39. SRA Toolkit. GitHub.
- 40. Andrews S. FastQC: a quality control tool for high throughput sequence data. Babraham Bioinformatics, Babraham Institute, Cambridge, United Kingdom; 2010.
- 41. Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016;32(19):3047–8. pmid:27312411
- 42. FASTX-Toolkit.
- 43. Putri GH, Anders S, Pyl PT, Pimanda JE, Zanini F. Analysing high-throughput sequencing data in Python with HTSeq 2.0. Bioinformatics. 2022;38(10):2943–5. pmid:35561197
- 44. Hemani G, Zheng J, Elsworth B, Wade KH, Haberland V, Baird D, et al. The MR-Base platform supports systematic causal inference across the human phenome. Elife. 2018;7:e34408. pmid:29846171
- 45. Hemani G, Tilling K, Davey Smith G. Orienting the causal relationship between imprecisely measured traits using GWAS summary data. PLoS Genet. 2017;13(11):e1007081. pmid:29149188
- 46. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550. pmid:25516281
- 47. Takash W, Cañizares J, Bonneaud N, Poulat F, Mattéi MG, Jay P, et al. SOX7 transcription factor: sequence, chromosomal localisation, expression, transactivation and interference with Wnt signalling. Nucleic Acids Res. 2001;29(21):4274–83. pmid:11691915
- 48. Oshimori N, Ohsugi M, Yamamoto T. The Plk1 target Kizuna stabilizes mitotic centrosomes to ensure spindle bipolarity. Nat Cell Biol. 2006;8(10):1095–101. pmid:16980960
- 49. Eaton JD, West S. An end in sight? Xrn2 and transcriptional termination by RNA polymerase II. Transcription. 2018;9(5):321–6. pmid:30035655
- 50. Johnson FB. PinX1 the tail on the chromosome. J Clin Invest. 2011;121(4):1242–4. pmid:21436580
- 51. Huang K, Wu Y, Shin J, Zheng Y, Siahpirani AF, Lin Y, et al. Transcriptome-wide transmission disequilibrium analysis identifies novel risk genes for autism spectrum disorder. PLoS Genet. 2021;17(2):e1009309. pmid:33539344
- 52. Alonso-Gonzalez A, Calaza M, Rodriguez-Fontenla C, Carracedo A. Novel gene-based analysis of ASD GWAS: insight into the biological role of associated genes. Front Genet. 2019;10.
- 53. Hannon E, Schendel D, Ladd-Acosta C, Grove J, iPSYCH-Broad ASD Group, Hansen CS, et al. Elevated polygenic burden for autism is associated with differential DNA methylation at birth. Genome Med. 2018;10(1):19. pmid:29587883
- 54. Pramparo T, Lombardo MV, Campbell K, Barnes CC, Marinero S, Solso S, et al. Cell cycle networks link gene expression dysregulation, mutation, and brain maldevelopment in autistic toddlers. Molecular Systems Biology. 2015;11(12):841.
- 55. Packer A. Neocortical neurogenesis and the etiology of autism spectrum disorder. Neurosci Biobehav Rev. 2016;64:185–95. pmid:26949225
- 56. Baranova A, Wang J, Cao H, Chen J-H, Chen J, Chen M, et al. Shared genetics between autism spectrum disorder and attention-deficit/hyperactivity disorder and their association with extraversion. Psychiatry Res. 2022;314:114679. pmid:35717853
- 57. West S, Gromak N, Proudfoot NJ. Human 5’ --> 3’ exonuclease Xrn2 promotes transcription termination at co-transcriptional cleavage sites. Nature. 2004;432(7016):522–5. pmid:15565158
- 58. Wang M, Pestov DG. 5’-end surveillance by Xrn2 acts as a shared mechanism for mammalian pre-rRNA maturation and decay. Nucleic Acids Res. 2011;39(5):1811–22. pmid:21036871
- 59. Nagarajan VK, Jones CI, Newbury SF, Green PJ. XRN 5’→3’ exoribonucleases: structure, mechanisms and functions. Biochim Biophys Acta. 2013;1829(6–7):590–603. pmid:23517755
- 60. Brannan K, Kim H, Erickson B, Glover-Cutter K, Kim S, Fong N, et al. mRNA decapping factors and the exonuclease Xrn2 function in widespread premature termination of RNA polymerase II transcription. Mol Cell. 2012;46(3):311–24.
- 61. Kinjo ER, Higa GSV, de Sousa E, Casado OAN, Damico MV, Britto LRG, et al. A possible new mechanism for the control of miRNA expression in neurons. Exp Neurol. 2013;248:546–58. pmid:23933240
- 62. Hicks SD, Middleton FA. A comparative review of microRNA expression patterns in autism spectrum disorder. Front Psychiatry. 2016;7.
- 63. Ghahramani Seno MM, Hu P, Gwadry FG, Pinto D, Marshall CR, Casallo G, et al. Gene and miRNA expression profiles in autism spectrum disorders. Brain Res. 2011;1380:85–97. pmid:20868653
- 64. Wu YE, Parikshak NN, Belgard TG, Geschwind DH. Genome-wide, integrative analysis implicates microRNA dysregulation in autism spectrum disorder. Nat Neurosci. 2016;19(11):1463–76. pmid:27571009
- 65. Abu-Elneel K, Liu T, Gazzaniga FS, Nishimura Y, Wall DP, Geschwind DH, et al. Heterogeneous dysregulation of microRNAs across the autism spectrum. Neurogenetics. 2008;9(3):153–61. pmid:18563458
- 66. Nussbacher JK, Tabet R, Yeo GW, Lagier-Tourenne C. Disruption of RNA Metabolism in Neurological Diseases and Emerging Therapeutic Interventions. Neuron. 2019;102(2):294–320. pmid:30998900
- 67. Marques AR, Santos JX, Martiniano H, Vilela J, Rasga C, Romão L, et al. Gene Variants Involved in Nonsense-Mediated mRNA Decay Suggest a Role in Autism Spectrum Disorder. Biomedicines. 2022;10(3):665. pmid:35327467
- 68. Pain O, Pocklington AJ, Holmans PA, Bray NJ, O’Brien HE, Hall LS, et al. Novel Insight Into the Etiology of Autism Spectrum Disorder Gained by Integrating Expression Data With Genome-wide Association Statistics. Biol Psychiatry. 2019;86(4):265–73. pmid:31230729
- 69. Peyre H, Schoeler T, Liu C, Williams CM, Hoertel N, Havdahl A, et al. Combining multivariate genomic approaches to elucidate the comorbidity between ASD and ADHD. bioRxiv. 2020.
- 70. Yang Z, Wu H, Lee PH, Tsetsos F, Davis LK, Yu D, et al. Investigating Shared Genetic Basis Across Tourette Syndrome and Comorbid Neurodevelopmental Disorders Along the Impulsivity-Compulsivity Spectrum. Biol Psychiatry. 2021;90(5):317–27. pmid:33714545
- 71. Katoh M. Expression of human SOX7 in normal tissues and tumors. Int J Mol Med. 2002;9(4):363–8. pmid:11891528
- 72. MacDonald BT, Tamai K, He X. Wnt/beta-catenin signaling: components, mechanisms, and diseases. Dev Cell. 2009;17(1):9–26. pmid:19619488
- 73. Kwan V, Unda BK, Singh KK. Wnt signaling networks in autism spectrum disorder and intellectual disability. J Neurodev Disord. 2016;8:45. pmid:27980692
- 74. de la Torre-Ubieta L, Won H, Stein JL, Geschwind DH. Advancing the understanding of autism disease mechanisms through genetics. Nat Med. 2016;22(4):345–61. pmid:27050589
- 75. Quesnel-Vallières M, Weatheritt RJ, Cordes SP, Blencowe BJ. Autism spectrum disorder: insights into convergent mechanisms from transcriptomics. Nat Rev Genet. 2019;20(1):51–63. pmid:30390048
- 76. Hormozdiari F, Penn O, Borenstein E, Eichler EE. The discovery of integrated gene networks for autism and related disorders. Genome Res. 2015;25(1):142–54. pmid:25378250
- 77. Vallée A, Vallée J-N, Lecarpentier Y. PPARγ agonists: potential treatment for autism spectrum disorder by inhibiting the canonical WNT/β-catenin pathway. Mol Psychiatry. 2019;24(5):643–52. pmid:30104725
- 78. El Khouri E, Ghoumid J, Haye D, Giuliano F, Drevillon L, Briand-Suleau A, et al. Wnt/β-catenin pathway and cell adhesion deregulation in CSDE1-related intellectual disability and autism spectrum disorders. Mol Psychiatry. 2021;26(7):3572–85. pmid:33867523
- 79. Caracci MO, Avila ME, Espinoza-Cavieres FA, López HR, Ugarte GD, De Ferrari GV. Wnt/β-Catenin-Dependent Transcription in Autism Spectrum Disorders. Front Mol Neurosci. 2021;14:764756. pmid:34858139
- 80. Francois M, Koopman P, Beltrame M. SoxF genes: Key players in the development of the cardio-vascular system. Int J Biochem Cell Biol. 2010;42(3):445–8. pmid:19733255
- 81. Kim K, Kim I-K, Yang J, Lee E, Koh B, Song S, et al. SoxF transcription factors are positive feedback regulators of VEGF signaling. Circ Res. 2016;119(7):839–52.
- 82. Klomp J, Hyun J, Klomp JE, Pajcini K, Rehman J, Malik AB. Comprehensive transcriptomic profiling reveals SOX7 as an early regulator of angiogenesis in hypoxic human endothelial cells. J Biol Chem. 2020;295(15):4796–808. pmid:32071080
- 83. Lilly AJ, Mazan A, Scott DA, Lacaud G, Kouskoff V. SOX7 expression is critically required in FLK1-expressing cells for vasculogenesis and angiogenesis during mouse embryonic development. Mech Dev. 2017;146:31–41. pmid:28577909
- 84. Wat M, Shchelochkov O, Holder A, Breman A, Dagli A, Bacino C, et al. Chromosome 8p23.1 deletions as a cause of complex congenital heart defects and diaphragmatic hernia. Am J Med Genet Part A. 2009;149A(8):1661–77.
- 85. Páez MT, Yamamoto T, Hayashi K, Yasuda T, Harada N, Matsumoto N, et al. Two patients with atypical interstitial deletions of 8p23.1: mapping of phenotypical traits. Am J Med Genet A. 2008;146A(9):1158–65. pmid:18393291
- 86. Stenman JM, Rajagopal J, Carroll TJ, Ishibashi M, McMahon J, McMahon AP. Canonical Wnt signaling regulates organ-specific assembly and differentiation of CNS vasculature. Science. 2008;322(5905):1247–50. pmid:19023080
- 87. Reis M, Liebner S. Wnt signaling in the vasculature. Experimental Cell Research. 2013;319(9):1317–23.
- 88. Yao Y, Walsh WJ, McGinnis WR, Praticò D. Altered vascular phenotype in autism: correlation with oxidative stress. Arch Neurol. 2006;63(8):1161–4. pmid:16908745
- 89. Ouellette J, Toussay X, Comin CH, Costa L da F, Ho M, Lacalle-Aurioles M, et al. Vascular contributions to 16p11.2 deletion autism syndrome modeled in mice. Nat Neurosci. 2020;23(9):1090–101. pmid:32661394
- 90. Emanuele E, Orsi P, Barale F, di Nemi SU, Bertona M, Politi P. Serum levels of vascular endothelial growth factor and its receptors in patients with severe autism. Clin Biochem. 2010;43(3):317–9. pmid:19850021
- 91. Casanova MF. The neuropathology of autism. Brain Pathol. 2007;17(4):422–33.
- 92. Gozal E, Jagadapillai R, Cai J, Barnes GN. Potential crosstalk between sonic hedgehog-WNT signaling and neurovascular molecules: Implications for blood-brain barrier integrity in autism spectrum disorder. J Neurochem. 2021;159(1):15–28. pmid:34169527
- 93. Chauhan A, Chauhan V. Oxidative stress in autism. Pathophysiology. 2006;13(3):171–81. pmid:16766163
- 94. Bjørklund G, Meguid N, El-Bana M, Tinkov A, Saad K, Dadar M, et al. Oxidative stress in autism spectrum disorder. Molec Neurobiol. 2020;57(5):2314–32.
- 95. Zhang Y, Sun Y, Wang F, Wang Z, Peng Y, Li R. Downregulating the canonical Wnt/β-catenin signaling pathway attenuates the susceptibility to autism-like phenotypes by decreasing oxidative stress. Neurochem Res. 2012;37(7):1409–19. pmid:22374471
- 96. Weber A, Köhler A, Hahn A, Müller U. 8p23.1 duplication syndrome: narrowing of critical interval to 1.80 Mbp. Mol Cytogenet. 2014;7(1):94.
- 97. Aspra Q, Cabrera-Mendoza B, Morales-Marín ME, Márquez C, Chicalote C, Ballesteros A, et al. Epigenome-Wide Analysis Reveals DNA Methylation Alteration in ZFP57 and Its Target RASGFR2 in a Mexican Population Cohort with Autism. Children. 2022;9(4):462.
- 98. Lewis J, Bird A. DNA methylation and chromatin structure. FEBS Lett. 1991;285(2):155–9. pmid:1855583
- 99. Keshet I, Lieman-Hurwitz J, Cedar H. DNA methylation affects the formation of active chromatin. Cell. 1986;44(4):535–43. pmid:3456276
- 100. Buitrago D, Labrador M, Arcon JP, Lema R, Flores O, Esteve-Codina A, et al. Impact of DNA methylation on 3D genome structure. Nat Commun. 2021;12(1):3243. pmid:34050148
- 101. Zhao T, Yang H, Tian Y, Xie Q, Lu Y, Wang Y, et al. SOX7 is associated with the suppression of human glioma by HMG-box dependent regulation of Wnt/β-catenin signaling. Cancer Lett. 2016;375(1):100–7. pmid:26944317
- 102. Kim I-K, Kim K, Lee E, Oh DS, Park CS, Park S, et al. Sox7 promotes high-grade glioma by increasing VEGFR2-mediated vascular abnormality. J Exp Med. 2018;215(3):963–83. pmid:29444818
- 103. Crawley JN, Heyer WD, LaSalle JM. Autism and cancer share risk genes, pathways, and drug targets. Trends Genet. 2016;32(3):139–46.
- 104. Gabrielli AP, Manzardo AM, Butler MG. GeneAnalytics Pathways and Profiling of Shared Autism and Cancer Genes. Int J Mol Sci. 2019;20(5):1166. pmid:30866437
- 105. Crespi B. Autism and cancer risk. Autism Res. 2011;4(4):302–10. pmid:21823244
- 106. Tabarés-Seisdedos R, Rubenstein JLR. Chromosome 8p as a potential hub for developmental neuropsychiatric disorders: implications for schizophrenia, autism and cancer. Mol Psychiatry. 2009;14(6):563–89. pmid:19204725
- 107. Becker EB, Stoodley CJ. Autism spectrum disorder and the cerebellum. Int Rev Neurobiol. 2013;113:1–34.
- 108. Su L-D, Xu F-X, Wang X-T, Cai X-Y, Shen Y. Cerebellar Dysfunction, Cerebro-cerebellar Connectivity and Autism Spectrum Disorders. Neuroscience. 2021;462:320–7. pmid:32450293
- 109. Hampson DR, Blatt GJ. Autism spectrum disorders and neuropathology of the cerebellum. Front Neurosci. 2015;9:420. pmid:26594141
- 110. Fatemi SH, Aldinger KA, Ashwood P, Bauman ML, Blaha CD, Blatt GJ, et al. Consensus paper: pathological role of the cerebellum in autism. Cerebellum. 2012;11(3):777–807. pmid:22370873
- 111. Schmahmann JD. The cerebellum and cognition. Neurosci Lett. 2019;688:62–75. pmid:29997061
- 112. Schmahmann J, Guell X, Stoodley C, Halko M. The theory and neuroscience of cerebellar cognition. Annu Rev Neurosci. 2019;42:337–64.
- 113. Van Overwalle F, Manto M, Cattaneo Z, Clausi S, Ferrari C, Gabrieli JDE, et al. Consensus paper: cerebellum and social cognition. Cerebellum. 2020;19(6):833–68.
- 114. D’Mello AM, Crocetti D, Mostofsky SH, Stoodley CJ. Cerebellar gray matter and lobular volumes correlate with core autism symptoms. Neuroimage Clin. 2015;7:631–9. pmid:25844317
- 115. McKinney WS, Kelly SE, Unruh KE, Shafer RL, Sweeney JA, Styner M, et al. Cerebellar volumes and sensorimotor behavior in autism spectrum disorder. Front Integr Neurosci. 2022;16:821109. pmid:35592866
- 116. Webb SJ, Sparks B-F, Friedman SD, Shaw DWW, Giedd J, Dawson G, et al. Cerebellar vermal volumes and behavioral correlates in children with autism spectrum disorder. Psychiatry Res. 2009;172(1):61–7. pmid:19243924
- 117. Wang Y, Xu Q, Zuo C, Zhao L, Hao L. Longitudinal changes of cerebellar thickness in autism spectrum disorder. Neurosci Lett. 2020;728:134949. pmid:32278028
- 118. Wang C, Qin L, Min Z, Zhao Y, Zhu L, Zhu J, et al. SOX7 interferes with β-catenin activity to promote neuronal apoptosis. Eur J Neurosci. 2015;41(11):1430–7. pmid:25847511
- 119. Kumar H, Sharma B. Memantine ameliorates autistic behavior, biochemistry & blood brain barrier impairments in rats. Brain Res Bull. 2016;124:27–39. pmid:27034117
- 120. Wang Y, Yu S, Li M. Neurovascular crosstalk and cerebrovascular alterations: an underestimated therapeutic target in autism spectrum disorders. Front Cell Neurosci. 2023;17:1226580. pmid:37692552
- 121. Shen-Orr SS, Tibshirani R, Khatri P, Bodian DL, Staedtler F, Perry NM, et al. Cell type-specific gene expression differences in complex tissues. Nat Methods. 2010;7(4):287–9. pmid:20208531
- 122. Lawlor N, George J, Bolisetty M, Kursawe R, Sun L, Sivakamasundari V, et al. Single-cell transcriptomes identify human islet cell signatures and reveal cell-type-specific expression changes in type 2 diabetes. Genome Res. 2017;27(2):208–22. pmid:27864352
- 123. Weyer A, Schilling K. Developmental and cell type-specific expression of the neuronal marker NeuN in the murine cerebellum. J Neurosci Res. 2003;73(3):400–9. pmid:12868073
- 124. Xu X, Wells AB, O’Brien DR, Nehorai A, Dougherty JD. Cell type-specific expression analysis to identify putative cellular mechanisms for neurogenetic disorders. J Neurosci. 2014;34(4):1420–31. pmid:24453331
- 125. Fitzgerald JB, Jin M, Dean D, Wood DJ, Zheng MH, Grodzinsky AJ. Mechanical compression of cartilage explants induces multiple time-dependent gene expression patterns and involves intracellular calcium and cyclic AMP. J Biol Chem. 2004;279(19):19502–11. pmid:14960571
- 126. Hsieh AH, Tsai CM, Ma QJ, Lin T, Banes AJ, Villarreal FJ, et al. Time-dependent increases in type-III collagen gene expression in medial collateral ligament fibroblasts under cyclic strains. J Orthop Res. 2000;18(2):220–7. pmid:10815822