Co-expression Profiling of Autism Genes in the Mouse Brain

Autism spectrum disorder (ASD) is one of the most prevalent and highly heritable neurodevelopmental disorders in humans. There is significant evidence that the onset and severity of ASD is governed in part by complex genetic mechanisms affecting the normal development of the brain. To date, a number of genes have been associated with ASD. However, the temporal and spatial co-expression of these genes in the brain remain unclear. To address this issue, we examined the co-expression network of 26 autism genes from AutDB (http://mindspec.org/autdb.html), in the framework of 3,041 genes whose expression energies have the highest correlation between the coronal and sagittal images from the Allen Mouse Brain Atlas database (http://mouse.brain-map.org). These data were derived from in situ hybridization experiments conducted on male, 56-day old C57BL/6J mice co-registered to the Allen Reference Atlas, and were used to generate a normalized co-expression matrix indicating the cosine similarity between expression vectors of genes in this database. The network formed by the autism-associated genes showed a higher degree of co-expression connectivity than seen for the other genes in this dataset (Kolmogorov–Smirnov P = 5×10−28). Using Monte Carlo simulations, we identified two cliques of co-expressed genes that were significantly enriched with autism genes (A Bonferroni corrected P<0.05). Genes in both these cliques were significantly over-expressed in the cerebellar cortex (P = 1×10−5) suggesting possible implication of this brain region in autism. In conclusion, our study provides a detailed profiling of co-expression patterns of autism genes in the mouse brain, and suggests specific brain regions and new candidate genes that could be involved in autism etiology.


Introduction
Autism spectrum disorder (ASD) is one of the most prevalent and highly heritable neurodevelopmental disorders in humans [1][2][3]. There is strong evidence that the onset and severity of ASD is governed in part by complex molecular mechanisms affecting the normal development of the brain [1,4]. While no major anatomical pathology have been observed in brains of ASD cases [5], various molecular and neuroimaging studies have linked several brain regions to ASD. For example, Voineagu et al. have found differences in gene expression patterns in the cortex of ASD brain [6]. Cortical regions has also been highlighted in neuroimaging studies of autistic brains along the cerebellum and other brain areas [7,8]. In addition, other studies have pointed to various molecular mechanisms that might be altered in the autistic brain [9][10][11]. In this realm, genes involved in synapse formation and brain circuitry are consistently found to be dysregulated in people with ASD [12][13][14].
Recent genomic advances have led to the discovery of diverse genetic loci linked to ASD, including chromosomal aberrations [15,16], copy number variations [11,17,18] and both common and rare single nucleotide variations (SNVs) [19][20][21][22][23][24]. Conse-quently, to date, more than 330 candidate genes have been associated with ASD susceptibility [25] and many more are projected to be found. However, despite the plethora of genetic variations associated with ASD, the molecular mechanisms and neuroanatomical structures underlying ASD traits remain largely unclear. The mouse model system provides a convenient and safe approach to experimentally study neuroanatomical mechanisms and candidate genes for autism susceptibility [26][27][28]. At present, dozens of single gene knockout and transgenic mice models have been used to elucidate neuropathology that might underlie the autism-like behaviors [29]. Despite the obvious genetic and neuroanatomical differences between mouse and human, mouse models are extremely valuable and effectively used in dissecting out the role of specific gene, pathway, neuron subtype, or brain region in a particular abnormal behavior shared by both these mammals. In this realm, the Allen Brain Atlas of the mouse [30][31][32][33] provides a comprehensive source of genome-wide highresolution atlas of gene expression throughout the adult mouse brain. In this study, we utilized this database to examine the spatial co-expression characteristics of genes associated with autism susceptibility. Consequently, we identified several co-expression gene networks that are enriched with autism genes highlighting potential candidate genes and brain regions implicated in autism.

Construction of the autism genes dataset (AutRef84)
The autism gene database, AutDB [34], was used to construct a set of 84 genes, AutRef84, strongly implicated in autism pathogenesis [25,29]. Genes contained in AutDB were classified into four genetic categories: (1) Rare: rare submicroscopic copy number variants and single gene disruptions or mutations directly linked to ASD; (2) Syndromic: genes implicated in syndromes in which a subset of affected individuals also develop autistic symptoms; (3) Association: small risk-conferring candidate genes with common polymorphisms that have been identified from genome-wide association studies in idiopathic ASD cases; and (4) Functional: functional candidate genes relevant to ASD that are not covered by any of the other genetic categories. AutRef84 was generated by filtering out functional candidate genes that lack any experimentally derived genetic link to ASD as well as genes that solely belong to the Association classification. The resulting dataset consisted of 64 genes classified within the rare classification and 20 within the syndromic classification (Supplementary Table S1).

Gene expression dataset
We utilized gene expression data available from the Allen Brain Atlas (ABA) of the mouse brain (http://mouse.brain-map.org) which contains voxelized expression profiles for ,20,000 genes derived from in situ hybridization (ISH) experiments conducted on male, 56-day old C57BL/6J mice) [30][31][32][33]. Gene expression profiles in the ABA were generated for the mouse brains from processed image data for the sagittal sections of a single hemisphere and, for 4,104 genes of high neurobiological interest, coronal and sagittal sections across the whole brain. We focused on genes for which brain-wide data are available. The ISH data were co-registered to the Allen Reference Atlas, which is partitioned into 49,742 cubic voxels of size 200 microns. A gene labeled g has a profile of expression energy presented as a function E(v,g) over the brain, where v is a voxel label. The Allen Mouse Brain Atlas Addiction Database (http://addiction. brainarchitecture.org/) used this expression dataset to create a high quality brain expression dataset, called A best , which consists of 3,041 genes with the most highly correlated expression energy profiles between the coronal and sagittal sections [35]. This geneexpression dataset was used for the co-expression analysis discussed below.

Gene co-expression analysis
For the set of 3,041 genes defined above, a gene-by-gene coexpression matrix was computed as follows: where V = 49,742 is the total number of voxels in the brain. The quantity C(g,g') is the cosine similarity between the gene-expression profiles of genes g and g' with values ranging between zero and one by construction. The motivation for choosing the cosine similarity as the co-expression measure in this study, is its connection to the difference between the energies of two brainwide functions. The proposed co-expression measure (Equation 1) can also be described as a simple decreasing function of the squared energy of the difference between two brain-wide functions. The co-expression matrix C is symmetric, with ones on the diagonal. Its size is 3,041, the number of genes in the full A best dataset. Given a subset of this dataset, the co-expression matrix of this subset can be obtained by extracting the sub-matrix of C corresponding to the position of the genes in the full dataset.
To determine whether the autism genes are more co-expressed than other genes expected by chance, the cumulative distribution function (CDF) of the entries of the autism-related and A best coexpression matrices were compared, and a two-sample Kolmogorov-Smirnov test was used to determine whether they were drawn from the same probability distribution. In addition, we compared the connectivity of the co-expression network of the autism genes to co-expression networks of the same size randomly drawn from A best . For that, Monte Carlo simulations [36] were conducted to generate 100,000 random gene sets sampled from A best , and the co-expression matrices of these sets of genes where extracted from the co-expression matrix C.
Given the co-expression matrix of G genes of interest, one can consider the underlying weighted graph with nodes corresponding to genes, and the weight of links equals to the co-expression of the nodes [37]. The matrix can be cut at any value r between zero and one, resulting in links with weights lower than the threshold r. At any value of the threshold, the connected components (sets of connected genes) can be computed using Tarjan's algorithm. In particular, the maximum size M(r) and average size A(r) of connected components can be calculated for all gene sets over different co-expression values. If N r (k) is the number of connected components in the co-expression matrix of G genes defined by r threshold and contain exactly k genes out of G genes in the gene set, the maximum and average sizes are expressed as follows: M(r)~maxfk[½1::G,N r (k)w0g ð 3Þ

Author Summary
Autism spectrum disorder (ASD) is a complex neurodevelopmental condition associated with many different genes. However, the neuroanatomical and functional properties of these genes in the brain are largely unknown. Here we examined the co-expression network of 26 genes associated with ASD, using data from the Allen Mouse Brain Atlas, which provides a whole-genome, high-resolution map of gene expression pattern in the adult mouse brain. We discovered that autism genes are significantly more coexpressed than expected by chance, suggesting common neuro-functional properties. We then examined the spatial properties of co-expression modules that are highly enriched with autism genes. Consequently, we found that genes in two of these modules are significantly overexpressed in the cerebellar cortex, and particularly in sections that are predominantly populated by granular cells. These findings provide the essential link between gene networks associated with ASD and specific brain regions, and hence lay out a basis for further exploration of the particular neuronal circuits involved in ASD etiology. Further, we applied the classical definition of ''clique'' from graph theory [38,39] to our co-expression matrix to characterize networks of co-expressed genes such that every gene in the network is connected to all other genes at a co-expression threshold r.
Anatomical and functional characterization of coexpression cliques Next, we aimed at evaluating the unique anatomical properties of co-expression cliques that are significantly enriched with autism genes. For that, we first identified virtually all cliques in the dataset that contained ng$2 autism genes. Then, we used Monte Carlo simulations using 100,000 randomly generated gene sets of size G = 26 to compute the likelihood of each clique to contain at least ng autism genes. Finally, for cliques that were significantly enriched with autism genes, we calculated the sum of the normalized expression profiles of the genes in the clique as follow: We then used S clique , to examine the neuroanatomical properties of the genes included in the clique as follows [40]. For a given brain region v in the Allen Reference Atlas [41], the fitting score between the expression profile and the region v is defined as the cosine similarity between the expression profile and the characteristic function x v of the brain region. Figure 2. Cliques of co-expression genes delineated by co-expressed autism genes. The 59 cliques of co-expression genes containing $2 autism genes identified in our data are listed in the rows together with the minimal co-expression value in the clique, and numbers of autism genes and number of total genes in the clique. A p-value indicates the likelihood of finding this number of autism genes in the clique (based on 100,000 Monte Carlo simulations). The 26 autism genes included in the study are depicted in columns with black and white fillings indicating their presence and absence from a clique respectively. doi:10.1371/journal.pcbi.1003128.g002 Figure 3. Cliques of co-expressed genes. Venn diagram for the ten cliques of co-expressed genes that are highly enriched with autism genes. For each clique, the following parameters are given: the number of autism genes/number of all genes in the clique, the co-expression threshold (in %) of the clique, and the p-value of the autism gene enrichment (using 100,000 Monte Carlo resampling procedure). doi:10.1371/journal.pcbi.1003128.g003 where x v (v) equals one if voxel labeled v belongs to region v, and zero otherwise. This fitting score would equal one if the sum of the gene expression in the clique were proportional to the characteristic function of the region v, and zero if it were entirely supported outside the region. The fitting score defined in the above equation equals the co-expression between a (hypothetical) gene whose expression profile would be S clique , and a (hypothetical) gene whose expression would coincide exactly with region v. Fitting scores of a given clique can be computed in all brain regions, and the distribution of these fitting scores can be simulated by repeatedly drawing random cliques of genes (with the same number of genes) from A best . These analyses were performed using a commercial software package (MATLAB R2011b, The MathWorks Inc., Natick, MA, 2000).
Finally, we used the Bioconductor GOstats package in R software [42] to assess whether genes belonging to a co-expression clique, also share other functional or molecular properties. The absolute list of GO terms were obtained using both a (a) cutoff = 2*ratio (fg/fc) [where fg = frequency of occurrence of a GO term in the given gene set, fc = frequency of its occurrence in the complete list of human genes] and (b) cut-off = median value of ratio (fg/fc). Only significant terms (P,0.01) with an associated gene count. = 5 were considered.

Results
Overall, 26 genes were found in the intersection of the autismgenes dataset and the dataset of high-quality expression genes from the Allen Brain Atlas (ABA) of the mouse brain (AutRef8 4>A best = 26). These autism-related genes showed a higher degree of co-expression connectivity than all other genes in this dataset (Kolmogorov-Smirnov P = 5610 228 ). Comparing the empirical distributions of co-expression values of the autism genes to the other genes in the Allen dataset revealed that the largest deviation between these distributions was at co-expression value of 47.53% ( Figure 1A). Furthermore, we evaluated the connectivity of genes across different co-expression values. Here too, the average size of connected components (See methods) among autism gene was consistently larger than seen in 1000 randomly generated gene sets from the Allen database ( Figure 1B).
Next, we examined cliques (see methods) of co-expressed genes delineated by autism genes inter-connected at co-expression values of $47.53%. A total of 59 overlapping cliques were characterized containing on average 563.5 genes and 8.3 autism genes ( Figure 2). Finally, using Monte Carlo simulations, we identified ten cliques that were significantly enriched with autism genes at P,0.01 (Figure 3). Of note, the two top ranked cliques remained significant (P,0.05) even after accounting for multiple testing using the conservative Bonferroni correction.
The top-ranked clique in our analysis (hereafter will be referred as Clique I), was delineated by the autism genes: Ptchd1, Galnt13, Dpp6 and Astn2, and included another 29 genes, inter-connected with a co-expression values of $70% (Supplementary Table  S2). The second top-ranked clique (hereafter will be referred as Clique II) included the autism genes: Rims3, and Astn2, and another four genes, all inter-connected at $79% co-expression level (Supplementary Table S3). Examining the neuroanatomical expression properties across a set of 134 brain regions of the left hemisphere (41 of which are cortical, and 93 subcortical) grouped by the 12 main brain regions according to the Allen Reference Atlas, revealed a significant over-expression of genes belonging to Cliques I in the cerebellar cortex ( Figures 4A, 4B,  Supplementary Figure S1). Genes belonging to Clique II also showed a slight over-expression in the cerebellar cortex ( Figures 5A, 5B), as well as in several cortical regions, however these signals were much weaker than the one of clique I. Next, we asked if the over-expression in the cerebellum is a unique property of these two cliques. For that, we examined the neuroanatomical expression of all cliques in Figure 2 and found only seven other cliques showing a similar over-expression in the cerebellar cortex. Interestingly, these cliques had substantial gene overlap with cliques I&II, and were ranked high in their autism gene enrichment scores (Supplementary Table S4), thus supporting the illumination of the cerebellar cortex in this analysis.
Finally, we asked if these two co-expressed cliques are associated with particular cell type or any other functional or cellular property. Examining cell-type-specific microarray data which we have for 64 cell types [43] revealed that 4 of them (stellate basket cells, granule cells, oligodendrocytes and Purkinje cells) are considerably populating the cerebellum (Supplementary Figure S2). Further, we looked through different coronal sections through the cerebellum from the Allen Reference Atlas [41] and visually compared them to sections of the sum of expressions of genes in Cliques I & II. Figure 6 shows the normalized volumetric expression quantities of both cliques along with the closest coronal section of the mouse brain in the Allen Reference Atlas. One can see that the voxels with the most intense expression in both cliques tend to follow the granular layer. Hence, the results of these analyses suggest that genes in both clique I & II tend to be over-expressed in granule cells in the cerebellar cortex. Using the Bioconductor GOstats package in R, we two biological processes: ''Transmission of nerve impulse'' (P = 0.001842), and ''Ion transport'' (P = 0.000733) and one cellular component ''Vesicles'' (P = 0.001134) that were enriched with genes from Clique I ( Supplementary Table S5). Unfortunately, the number of genes in Clique II was too small for this analysis.

Discussion
In this study, we explored the co-expression network 26 autism genes within the framework of 3,041 genes exhibiting the highest-quality expression data in the Allen Mouse Brain Atlas database [30][31][32][33]. The significantly tighter co-expression connectivity among the 26 autism genes than other genes, implies common functional properties for these genes in the mouse brain. Further investigation into the co-expression patterns of these genes revealed two cliques of co-expressed genes that were significantly dominated by autism genes. Genes in both these cliques shown significant over-expression in the cerebellar cortex, and particularly in sections that are predominantly populated by granular cells. Some regions of the cerebral cortex are also highlighted by the second clique ( Figure 5), but to a lesser extent than the cerebellar cortex. Another recent study of our group examining the expression of the same autism gene set (AutRef84) in different human tissues, found a statistically significant enrichment in the frontal cortex [44]. The cerebral cortex was highlighted in other neuroanatomical studies of autism in both human [45,46] and mouse [47] and is known to play a central role in cognitive and emotional processing [48], which are key deficits in autism and other neuropsychiatric disorders. In addition, a recent neuroimaging study [49] highlighted functional sub-regions in the cerebellum as playing a role in both motor and cognitive tasks. Other genes associated with autism have been shown to play a role in the development of this region [50][51][52][53]. Our results, provide additional support in the potential involvement of the cerebellum in The corresponding coronal section of the mouse brain in the Allen Mouse Brain Atlas, Allen Institute for Brain Science [57]. doi:10.1371/journal.pcbi.1003128.g006 autism etiology, and suggest additional candidate genes that are also over-expressed in the cerebellar cortex.
Two recent transcriptomic analyses in human brains [6,54] revealed additional co-expression modules enriched with autism-associated genes. Some of these modules partially overlap with our findings in either gene content or brain regions, suggesting common functional and neuroanatomical properties of autism gene in both human and mouse brains. Together, these studies provide new insights into the specific gene networks and brain regions that could be involved in autism etiology.
A major strength of our study is the utilization of the Allen Mouse Brain Atlas [30][31][32][33] which comprises a high-resolution genome-wide exploration of gene expression in the adult mouse brain. This data allows one to explore gene expression properties up to a resolution of 200 microns, which provide a good distinction between different brain regions as well as potentially tell apart different sub-regions and cell types. Another advantage of this study is the focus on those genes exhibiting the highest expression correlation between the coronal and sagittal sections [35] as well as restricting the autism gene to a subset demonstrating, to the best of our knowledge, the most compelling associations to autism susceptibility. These strict criteria reduce the chances of erroneous results. Our study has also some pitfalls. First, the analyses were done on data from mouse brains. Since autism is a human condition, one may ask how well finding of this study apply to human brain. A recent study comparing postsynaptic protein composition between mouse and human suggest a high correlation between these two mammals in those matters [55]. Nevertheless, similar analyses in the human brain are still required to provide a finer validity to our findings. In addition, the strict criteria used here, restricted the number of studied genes to 3, 041 and 26 autism genes which are roughly represent 15% and 31% of the genes in the Allen Brain Atlas and AutDB datasets respectively. Such a small number of genes might results in false negatives and hence might miss other co-expression properties and brain regions associated with autism. Hence, larger studies are needed to complement the results of our analysis.
In conclusions, our study provides unique insights into the neuroanatomical co-expression properties of genes associated with autism in the mouse brain and suggest specific regions implicated in autism etiology. Complementing these findings with additional genomics and neuroimaging analyses from both mouse and human brains would help gaining a broader picture of the autistic brain. Oligodendrocytes. These data correspond to microarray data from [56], using the data sets estimated in [43] to have the highest purity. Interestingly, the estimated brain-wide densities are almost zero in the cerebral cortex, suggesting that cell types characterized by their transcriptomes are indeed specific to the cerebellar cortex. (TIF)