Ancient Haplotypes at the 15q24.2 Microdeletion Region Are Linked to Brain Expression of MAN2C1 and Children's Intelligence

The chromosome bands 15q24.1-15q24.3 contain a complex region with numerous segmental duplications that predispose to regional microduplications and microdeletions, both of which have been linked to intellectual disability, speech delay and autistic features. The region may also harbour common inversion polymorphisms whose functional and phenotypic manifestations are unknown. Using single nucleotide polymorphism (SNP) data, we detected four large contiguous haplotype-genotypes at 15q24 with Mendelian inheritance in 2,562 trios, African origin, high population stratification and reduced recombination rates. Although the haplotype-genotypes were most likely generated by decreased or absent recombination among them, we could not confirm that they were the product of inversion polymorphisms in the region. One of the blocks was composed of three haplotype-genotypes (N1a, N1b and N2), which significantly correlated with intelligence quotient (IQ) in 2,735 children of European ancestry from three independent population cohorts. Homozygosity for N2 was associated with lower verbal IQ (2.4-point loss, p-value = 0.01), while homozygosity for N1b was associated with a 3.2-point loss in non-verbal IQ (p-value = 0.0006). The three alleles strongly correlated with expression levels of MAN2C1 and SNUPN in blood and brain. Homozygosity for N2 correlated with over-expression of MAN2C1 across many brain areas, with the exception of the occipital cortex, where N1b homozygotes showed strong under-expression. Our population-based analyses suggest that MAN2C1 may contribute to the verbal difficulties observed in microduplications and to the intellectual disability of microdeletion syndromes, whose characteristic dosage increment and removal may affect different brain areas.


Introduction
The chromosome bands 15q24.1-15q24.3 harbour a complex genomic region with multiple large blocks of segmental duplications (A through E) that mediate recurrent rearrangements, including inversions, deletions and duplications of variable size and extent [1][2][3]. Both microdeletions and microduplications of this region cause unusual facial morphology along with intellectual disability, speech delay and autistic features [4][5][6][7]. Most reported deletions associated with phenotype include the 1.1 Mb critical region located between blocks B and C and also the 0.6 Mb C-D region, where smaller deletions have been found in at least two patients with borderline intellectual disability [4]. Thus, while the severe core cognitive deficits of the 15q24 microdeletion syndrome are thought to be due to deletion of genes between B and C, some of the genes located between blocks C and D must also be important for normal development and behaviour.
The finding of inversion polymorphisms in the region suggests suppression of recombination and possible support for extended haplotypes [2]. It has been previously shown that the imprints of inversions can result in extended regions with high linkage disequilibrium (LD) and points where there is higher linkage at distant points than at the immediate neighbourhood [8][9][10]. The convergence of these two features has been used to infer inversion status in other inversions [11]. Here we first investigated the haplotype structure of 15q24 and its potential link to previously reported inversions. We have found four contiguous extended haplotype blocks, detectable by differences in linkage disequilibrium (LD) between SNP blocks and haplotype divergence clades [11][12]. Given the reported implication of gene dosage effects of this region in autism and cognitive deficits, we aimed to investigate the evolutionary history, the effect on gene expression and putative influence of the haplotype-genotypes on autism susceptibility and on intelligence quotient (IQ) of children and adolescents recruited from the general population. These population-based analyses were used to determine the genes that might also contribute to the cognitive impairments of the microdeletion and microduplication syndromes.

Characterization of haplotypes blocks at 15q24.2
We used two recent bioinformatics methods, inveRsion and invClust, within 15q24.1-15q24.3 (Fig 1I) to detect linkage disequilibrium differences and extended haplotype-genotype structures that could be caused by suppressed recombination and might be linked to inversion polymorphisms (see the Methods section for details). Scanning the region between 72 and 79 Mb of chromosome 15 (hg19) with inveRsion, using SNP data of 505 European subjects from the 1000 Genomes Project, we detected four positive signals (BIC > 0) of LD differences (Fig 1II). The first signal (M) lies within the previously reported 1.1 Mb inversion between segmental duplications B and C and spans 74.71-75.12 Mb (hg19) [1][2]. The second block (N), with significant LD differences, corresponded to a 0.4 Mb interval at 75.5-75.9 Mb, between segmental duplications C and D. The other two signals (O and P) were obtained at 76.46-77.31 Mb and 77.3-77.87 Mb, between duplications D and E.
We then used the multivariate method invClust to determine the haplotype-genotypes of the M, N, O and P regions (Fig 1III). We clustered the first two MDS components into genotypes, where the reference genome was mapped to the non-variant alleles M1, N1, O1 and P1, and we denoted the variant haplotypes M2, N2, O2 and P2. We found very low LD between all haplotype-genotypes (r² < 0.06). We determined the extent of the haplotype-genotypes by computing the LD (r²) between the haplotype-genotypes and the SNPs within the 15q24 region (Fig 1IV). To confirm the allelic structure of the haplotypes, we tested their Mendelian inheritance in 2,259 European and 303 non-European trios from the Autism Genome Project (AGP). We observed very low transmission error rates: 0.4% for M1/M2, 0.6% for N1/N2, 1.1% for O1/O2 and 1.9% for P1/P2.
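The Mendelian check described in this paragraph can be illustrated with a minimal sketch. It assumes that each haplotype-genotype is coded as the number of variant alleles carried (0, 1 or 2); the function names and the toy trios are hypothetical and are not taken from the study's pipeline.

```python
def transmissible(parent_genotype):
    """Variant-allele counts (0 or 1) that a parent coded 0/1/2 can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[parent_genotype]


def is_mendelian(father, mother, child):
    """True if the child genotype can be formed from one allele of each parent."""
    return any(
        f + m == child
        for f in transmissible(father)
        for m in transmissible(mother)
    )


def mendelian_error_rate(trios):
    """Fraction of (father, mother, child) trios that violate Mendelian inheritance."""
    errors = sum(not is_mendelian(f, m, c) for f, m, c in trios)
    return errors / len(trios)


# Hypothetical toy data: two consistent trios and two inconsistent ones.
toy_trios = [(0, 0, 0), (0, 0, 1), (1, 2, 1), (2, 2, 0)]
toy_rate = mendelian_error_rate(toy_trios)
```

Applied per haplotype block, a rate of this kind corresponds to the transmission error percentages reported above.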
We analysed the recombination rate from HapMap II (version 2011-01, http://hapmap.ncbi.nlm.nih.gov/downloads/recombination/) data of the entire 3.15 Mb region comprising the four haplotype blocks (74.71-77.87 Mb, hg19). We randomly selected 10,000 segments of similar size over chromosome 15 and computed their mean recombination rate. We then compared the mean recombination rate of the M-P block (0.063 cM/Mb) with the distribution of values obtained in the random selection and found that this segment has a significant reduction in recombination rate (p-value = 0.01); see Fig 2. We further observed that within the N block subjects could be classified into six clusters consistent with the haplotype-genotypes of two other possible clades, in addition to N2 (Fig 3A). In the MDS analysis within the C-D block, we found that the six clusters could represent the six possible haplotype-genotypes of three different haplotype groups, namely N2, N1a and N1b. The three extreme clusters would represent the homozygous individuals (N1a/N1a: cluster 1, N1b/N1b: cluster 3, N2/N2: cluster 6) and the clusters in between two homozygous groups would contain the corresponding heterozygous individuals (N1a/N1b: cluster 2, N1a/N2: cluster 4, N1b/N2: cluster 5). We confirmed the allelic structure of the haplotypes, as there were no Mendelian errors in the 60 HapMap CEU and YRI trios and very low error rates in the AGP trios for the three predicted haplotypes: 0.4% for Europeans and 1.9% for non-Europeans (S1 File). Thus, the haplotype-genotyping was highly accurate for inferences in large population datasets.
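The comparison of the M-P block against randomly placed segments can be sketched as an empirical lower-tail test. This is only an illustration under stated assumptions: the recombination map is represented as a list of rates in equal-width bins, and the add-one p-value estimator is a common convention, not necessarily the one used in the study. All names are hypothetical.

```python
import random


def window_mean(rates, start, width):
    """Mean recombination rate over `width` consecutive bins from `start`."""
    return sum(rates[start:start + width]) / width


def empirical_low_tail_p(rates, target_start, width, n_draws=10000, seed=0):
    """Compare a target window with randomly placed windows of equal width.

    Returns the observed mean and the proportion of random windows whose
    mean is at or below it (add-one estimator, an assumption of this sketch).
    """
    rng = random.Random(seed)
    observed = window_mean(rates, target_start, width)
    last_start = len(rates) - width
    null = [
        window_mean(rates, rng.randint(0, last_start), width)
        for _ in range(n_draws)
    ]
    hits = sum(value <= observed for value in null)
    return observed, (hits + 1) / (n_draws + 1)


# Hypothetical map: uniform rate of 1.0 with a cold spot in bins 40-49.
toy_rates = [1.0] * 100
toy_rates[40:50] = [0.0] * 10
toy_observed, toy_p = empirical_low_tail_p(toy_rates, 40, 10, n_draws=1000, seed=1)
```

A small p-value here means that few random segments of the same size recombine as rarely as the target segment.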
We computed LD (r²) between the SNPs in the 15q24.2 region and the haplotype status (N1a, N1b, N2) of CEU individuals in the 1000 Genomes data. We found over 250 SNPs that could be used to tag a given haplotype with r² > 0.9; see S2 File, where the LD for YRI and CHB-JPT is also reported. Phylogeny analysis for N1a, N1b and N2 homozygotes in the 26 populations of the 1000 Genomes showed clades that follow each haplotype group rather than population membership, suggesting greater genetic variability between the haplotype groups than between ancestral populations (Fig 3B). We observed that the chimp sequence mapped to N1b, the least frequent of the three haplotypes in Europeans (N2: 59%, N1a: 24%, N1b: 17%).
Fig 1. Long haplotype structures in 15q24. I) Genomic location of the region between 15q24.1-15q24.3 prone to microdeletion and microduplication syndromes. Segmental duplication blocks A to E are indicated in yellow. Red blocks show the microdeletion cases, adapted from Mefford et al. [4]. The red block marked MD is the case with the minimum deletion segment. The green block illustrates the inversion discussed by Antonacci et al. [2], coinciding with the smallest region of overlap (SRO) identified by Magoulas and El-Hattab [3]. II) inveRsion scan over the region. Clear signals of LD differences between SNP blocks are detectable and marked with M, N, O, P. III) Haplotype-genotype clustering from the multidimensional scaling (MDS) analysis by invClust within segments M, N, O and P. IV) Blocks of LD (r²) between SNPs in the regions M, N, O and P and the haplotype-genotype calls made with invClust.

Population frequency of haplotype blocks at 15q24
We analysed all 26 populations from the 1000 Genomes project to determine the global frequency of all four M2, N2, O2 and P2 haplotype groups. We observed that all haplotype-genotypes were in Hardy-Weinberg Equilibrium (S1 Table), with the only exception of P2 for the KHV and BEB populations (p-value < 0.01). We found that the four frequencies for M2, N2, O2 and P2 are highly stratified by population origin and follow a clinal distribution consistent with an out-of-Africa expansion (Fig 4). We tested whether the distance from Ethiopia could explain the differences between the population frequencies better than genetic drift alone. For each haplotype block, random samples of 10,000 SNPs across the genome were drawn with similar mean frequencies in Africa. For each SNP, we computed the explained variance given by the R² value obtained from the regression model between the population frequency and the distance from Ethiopia. The R² for the haplotypes M2, N2, O2 and P2 was compared with the null distributions obtained from the sampling. We thus observed that the distance from Ethiopia explains the differences between population frequencies of O2 better than would be expected from genetic drift alone (p-value = 0.02; S1 Fig). The N2 allele showed a trend towards significance (p-value = 0.06). We also observed that two SNPs tagging N2 and O2, rs4462560 and rs9635320, respectively, showed significant signals of selection by iHS measures (|iHS| > 2) in YRI (iHS = 3.008 for rs4462560 and iHS = -3.822 for rs9635320), as reported in the haplotter database (http://haplotter.uchicago.edu/instruction.html) [13]. We could not exclude genetic drift as a potential driver of the heterozygosities of the M2 and P2 haplotypes, nor of the F_ST values of any of the four haplotypes, which nonetheless increased with distance from Ethiopia, as expected (S2 Fig).
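The regression-based comparison against a drift-only null can be sketched as follows. The sketch assumes a simple linear regression of population frequency on distance from Ethiopia, for which R² equals the squared Pearson correlation, and an add-one empirical p-value; both the function names and the toy numbers are hypothetical.

```python
def r_squared(x, y):
    """R^2 of the simple linear regression of y on x (squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def empirical_upper_tail_p(observed, null_values):
    """Share of null R^2 values at or above the observed one (add-one estimator)."""
    hits = sum(value >= observed for value in null_values)
    return (hits + 1) / (len(null_values) + 1)


# Hypothetical distances (thousands of km) and haplotype frequencies.
toy_distance = [0.0, 1.0, 2.0, 3.0]
toy_frequency = [0.1, 0.3, 0.5, 0.7]
toy_r2 = r_squared(toy_distance, toy_frequency)

# Hypothetical null R^2 values from frequency-matched random SNPs.
toy_p = empirical_upper_tail_p(0.9, [0.1, 0.2, 0.95])
```

In the analysis described above, the null values would come from the 10,000 randomly drawn SNPs with similar mean frequencies in Africa.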

Haplotypes and structural variants
We compared the entire ~7 Mb human genomic region at 15q24 (72-79 Mb, hg19) with the orthologous region in the rat genome (genome assembly rn5.0) to establish the blocks of synteny, using ArkMAP [14]. We observed a complex evolutionary history of the region, with several inversion and transposition events and some breaks of synteny at the regions harbouring the blocks of segmental duplications in the human genome (S3 Fig). We found a lineage-specific inversion in Homo sapiens for the region D-E with respect to C-D and a deletion between the regions B-C and C-D. In addition, a further block flanked by segmental duplications and distal to D-E has been translocated, and the generation of the related segmental duplication blocks A-D suggests additional evolutionary rearrangements that are not fully resolved. A similar pattern to the rat alignment was observed when human was aligned to the mouse genome; however, no such breaks were observed with respect to the orangutan or macaque (S4 Fig). A detailed look into BAC libraries for both species revealed, however, evolutionary inversions and rearrangements in macaque across the orthologs to the human segmental duplications.
We investigated the extent to which the haplotype structure can be linked with reported polymorphic inversions in the region. Four individuals (NA18555, NA12156, NA19129 and NA19240) from the 1000 Genomes have been reported to carry an inversion polymorphism in the B-C region [1][2]. We found that three of these individuals had genotype M1/M2 and one was M1/M1, suggesting that M2 could be supported by the inversion polymorphism. However, out of the 29 individuals from the 1000 Genomes reported as non-inverted homozygous, we found 15 M1/M1, 8 M1/M2 and 6 M2/M2. The substantial disagreement between M1 and non-inverted status suggests that the inversion occurred on an older M2 background or arose through recurrent unrelated events, and rules out the reported inversion as a possible cause of the haplotype structure at M.
We searched three BAC libraries and found that, in accordance with the top tag SNPs for N2, N1a and N1b, libraries RP11, CTD and CTA most likely correspond to haplotypes N2, N1a and N1b, respectively (S3 File). We then observed that there are different gaps in each of the libraries: 1) library RP11 has a gap in the segmental duplication D and at a point between C and D; 2) library CTD does not cross segmental duplications B or C; and 3) library CTA does not cross either C or E (S4 Fig). The gaps in library CTD (N1a) are consistent with reported inversions at the proximal region B-C. Interestingly, all four reported individuals with inversions at B-C are heterozygous for N1a at C-D; NA18555 and NA19129 are N1a/N2, and NA12156 and NA19240 are N1a/N1b, suggesting that the inversion in B-C could have occurred in the long haplotype M2-N1a. However, we also observed that no inversion can be present between blocks C and D in any of the haplotypes. This could not be confirmed by interphase FISH, where the BACs within C-D are too close to produce images with appropriate resolution (
We then inferred the M, N, O and P haplotypes in 909 Spanish children in the INMA (INfancia y Medio Ambiente) cohort and tested their association with IQ. For haplotypes at M, O and P, we used additive genetic models whereas for the three haplotypes at N, we used recessive models for N2, N1a and N1b ( Figure A in S7 Fig), to assess the specific contribution of each haplotype to the verbal and non-verbal IQ. We fitted Gaussian regression models for the normalized IQ measurements on all genetic models and adjusted for sex, age at test administration and first two genome-wide PCA components; see Table 1. We found a significant association between N2 and 2.9-point reduction in verbal-IQ.
We then aimed to replicate the association of haplotype-genotypes at N in the Dutch GenR (Generation R) and Canadian SYS (Saguenay Youth Study) cohorts. As in the previous analysis, in the 1,236 children from the GenR population cohort, we identified six clusters, corresponding to the genotypes of three possible haplotypes in the region ( Figure  To assess the accuracy of this inference, we used the same eight SNPs in the European individuals of the 1000 Genomes and followed a similar procedure. We found that N2 could be inferred with 100% accuracy, while N1a and N1b were inferred with 99.4% and 99.6% accuracies, respectively. We performed a meta-analysis in which the weights were the reciprocal of the estimated variance (Fig 5). Fixed effects were considered to account for between-cohort variability, and no significant heterogeneity between cohorts was observed in any of the significant results. We found that the N1b allele correlated with a 3.2-point loss in non-verbal intelligence (p-value = 0.0006). In addition, homozygosity for the N2 haplotype was the only genetic model that correlated with verbal IQ (mean decrement of 2.4 points, p-value = 0.01). As expected, for all three haplotype tests, we found high correlation between verbal and non-verbal IQ estimates (cor = 0.81 for N2, cor = 0.70 for N1a, and cor = 0.98 for N1b).
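The inverse-variance weighting used in the meta-analysis can be illustrated with a short sketch. It assumes a standard fixed-effect model with a normal approximation for the two-sided p-value and Cochran's Q as the heterogeneity statistic; the function name and the example inputs are hypothetical and do not reproduce the cohort estimates.

```python
import math


def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance weighted fixed-effect meta-analysis.

    Returns the pooled estimate, its standard error, a two-sided p-value
    (normal approximation) and Cochran's Q heterogeneity statistic.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z_score = pooled / pooled_se
    p_value = math.erfc(abs(z_score) / math.sqrt(2.0))
    cochran_q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    return pooled, pooled_se, p_value, cochran_q


# Hypothetical per-cohort effects (IQ points) with their standard errors.
toy_pooled, toy_se, toy_p, toy_q = fixed_effect_meta([-2.0, -4.0], [1.0, 1.0])
```

Cohorts with smaller standard errors receive larger weights, so the pooled effect is dominated by the most precise estimates.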
We analysed the 2,215 Scottish adults of ORCADES to investigate whether the association with verbal IQ is also present in adults. For this specific cohort, we did not find a significant association (N2: p-value = 0.2, N1a: p-value = 0.5, N1b: p-value = 0.06). However, since the association with N2 homozygosity was also negative and comparable to that observed in children, we found an increase in statistical significance in the overall meta-analysis (2.0-point loss, p-value = 0.007); S8 Fig.

Functional correlation of N2 alleles with gene expression
We performed association tests between the normalized gene expression within the C-D region in 15q24.2 and the haplotype-genotypes at N. We first analysed expression data of the 882 Estonians from the EGCUT study for each specific allele and found significant associations of local genes (S2A Table). The expression of MAN2C1 increased per N2 haplotype (p-value < 10⁻⁴⁶) and decreased with both the N1a (p-value < 10⁻⁸) and N1b (p-value < 10⁻⁷) haplotypes (Fig 6). An additional significant association was found for SNUPN, which followed the same pattern as MAN2C1 (N2: p-value < 10⁻¹⁵, N1a: p-value = 0.0005, N1b: p-value = 0.003). Using transcriptomic data in lymphoblastoid cell lines of the 105 CEU individuals of HapMap, we validated the expression pattern of MAN2C1 with respect to the N2 (p-value < 10⁻⁴) and N1a (p-value < 10⁻⁵) haplotypes (S2B Table). We then used the brain expression data of 193 control individuals. In agreement with the previous analyses, we found that the N2 haplotype was associated with higher MAN2C1 expression in cerebral cortex (p-value = 0.02); see Fig 6. However, we did not find significant associations with N1a and N1b. We also tested associations between the haplotype-genotypes and the expression of SNUPN in brain and validated a significant reduction per N1b allele (p-value = 0.05); see Fig 6. We analysed expression data from the BRAINeQTL study and the BRAINEAC project to investigate regional differences of gene expression in brain; see Fig 7. For the BRAINeQTL study, we correlated the expression of MAN2C1 in four different brain areas for 148 subjects and tested recessive models for the haplotypes to help interpret the previous IQ correlations.
We found that the N2 allele is associated with increments of MAN2C1 transcription in pons (p-value = 0.0004), cerebellum (p-value = 0.01), frontal cortex (p-value = 0.01) and temporal cortex (p-value = 0.02), while N1b homozygotes had significant decrements of gene expression only in pons (p-value = 0.002), and no significant association was found for the N1a allele. In the BRAINEAC data-set of 134 individuals, we selected 36 intragenic probes within MAN2C1 and tested the correlation between the recessive models for each allele and the expression of the gene across 10 different brain regions. We found only two probes that survived Bonferroni correction within each region for MAN2C1. The first one was in the putamen for N2 homozygosity (p-value = 0.0004). The second probe confirmed the high correlation in the occipital cortex only with N1b (p-value < 10⁻⁴) and not with N2 (p-value = 0.8). We did not find significant results in the other areas of the brain. As correlations of gene expression were performed for adult brain, we further queried the Human Brain Transcriptome database (http://hbatlas.org/) to determine whether MAN2C1 expression changed during brain development. S9 Fig shows a clear

Discussion
We revealed the existence of a complex haplotype structure at the microdeletion region 15q24. Four large blocks of haplotype-genotypes were delimited by segmental duplications and showed strong stratification in global populations. These ancestral haplotypes of the region have likely been generated by suppression of recombination, and two of them show apparent selection signals. One of these blocks lies between segmental duplications C and D. We found that the block's three-haplotype structure (N2, N1a and N1b) correlated with IQ in children and with up-regulation and down-regulation of the local genes MAN2C1 and SNUPN, which depended on brain region.
Microduplications involving C-D have been associated with autistic features and language problems, while their haploinsufficiency has been related to intellectual disability [4]. Such gene dosage associations fit well with our observations in a normally developing population, showing that 1) higher expression of MAN2C1 associated with the N2 haplotype is linked with lower verbal IQ, and 2) lower expression of MAN2C1 related to the N1b haplotype is linked to reductions in non-verbal IQ. Remarkably, up-regulation of MAN2C1 by N2 is stronger in the frontal and temporal cortex, where language and high cognitive functions are processed, while down-regulation by N1b is most prominent in the occipital cortex, where it could affect processing of visual stimuli and thus influence non-verbal IQ. Our findings therefore suggest that MAN2C1 can be an important contributor to the cognitive impairments of both microdeletions and microduplications.
Given that the 15q24.2 haplotypes comprise a region in which copy number changes are causative of autism and cognitive impairment [4], we hypothesized that haplotype-genotypes could be susceptibility factors for autism and/or could be linked to cognitive variability in children. We did not find significant associations with autism but observed relevant yet weak correlations between the haplotypes and IQ profiles. The association in the meta-analysis for verbal IQ is improved with the inclusion of a large adult cohort, suggesting that the genetic effect may still be relevant later in life. In this study, similarly to large epidemiological studies that combined MSCA and WISC tests [15], we assumed that diverse IQ measures can be meta-analysed. While this constitutes a general drawback in neuro-epidemiological studies, combining different correlated measurements is expected to reduce statistical power rather than to increase false positives. A previous GWAS, based on the meta-analysis of different IQ measurements on 18,000 children (including SON-R 2½-7 and WISC amongst many others), did not find any SNP at genome-wide significance, and only one finding was reported at a gene-set level (FNBP1L, uncorrected p-value = 0.003) [16]. In particular, none of the SNPs or genes in the 15q24 region was found significant in that study. In a genome-wide context, our findings could be similarly under-powered by the variability of the IQ measures and age ranges. However, our study is akin to a "candidate gene" study and requires no correction for multiple comparisons and, as such, we do find statistically significant results. In addition, an important aspect of our approach was to identify the genes that may be causative of the cognitive impairments in genetic syndromes through population-based analyses. We thus observed that MAN2C1 and SNUPN are the most likely genes whose variability can affect cognition through gene dosage.
Fig 7. Significant deregulation of MAN2C1 expression in the brain for N2, N1a and N1b homozygotes. Numbers in boxplots correspond to the number of haplotypes considered in each group. MAN2C1 is ubiquitously over-expressed in the brain of N2 homozygotes, including pons, cerebellum, frontal cortex, temporal cortex and putamen, but not in occipital cortex, where N1b homozygosity is strongly associated with under-expression of the gene. Significant results in pons, cerebellum, frontal cortex and temporal cortex were obtained from the BRAINeQTL study, while those for putamen and occipital cortex were obtained from the BRAINEAC project. Other brain regions in both data-sets showed non-significant associations. doi:10.1371/journal.pone.0157739.g007
Out of the 10 single-copy genes located within the C-D interval, the expression of SNUPN and MAN2C1 was consistently up-regulated with the N2 allele and down-regulated with the N1b configuration, both in blood and brain tissues. SNUPN encodes snurportin 1, a protein required, through interaction with the spinal muscular atrophy protein SMN, for the nuclear import of snRNPs and splicing regulation of multiple genes [17]. As such, SNUPN de-regulation can affect the central nervous system, but there is not yet evidence for a more direct relation to cognition.
MAN2C1 has been shown to have a dual function. MAN2C1 encodes alpha-mannosidase, class 2C, member 1, which has been shown to regulate protein N-glycosylation and apoptosis. The N-glycoproteome maps mainly to blood, but the highest number of organ-specific N-glycosylation sites in mice has been observed in the brain [18]. As the MAN2C1 gene is highly expressed in the hippocampal formation, its differential regulation by the different haplotypes might be related to the observed differences in cognitive function. Over-expression of MAN2C1 leads to protein underglycosylation and up-regulation of the degradation of unfolded glycoproteins [19]. The attachment of glycans to some proteins is important for their correct folding and/or stability. N-glycans cover diverse biological functions in the nervous system, ranging from the essential to the modulation of development and neural transmission, which in turn can affect plasticity and memory formation [20]. Multiple glycosylation disorders with associated neurological symptoms and impaired cognitive ability have been reported [21]. An additional function of MAN2C1, independent of N-glycosylation, is apoptosis signalling and tumour growth [22][23]. Down-regulation of MAN2C1 is linked to increments in apoptosis. The apoptotic action of the gene in the nervous system remains to be directly observed. Nevertheless, an indication of such a function in the brain is given by previous findings showing that MAN2C1 is over-expressed in patients with posttraumatic stress disorder [24][25], and that this psychopathology presents reduced apoptosis associated with defects in signal plasticity [26]. Therefore, the available data indicate that both over-expression and under-expression of MAN2C1 can have a negative impact on brain function through different signalling pathways. A reading of our results consistent with the literature on MAN2C1 further suggests that N2 could be linked to underglycosylation and N1b to mitochondrial-mediated apoptosis.
More generally, our analyses indicate that the "where and how" genes are expressed in the brain is important for the interpretation of the associations between structural variations and cognitive phenotypes. Interestingly, we found a continuous growth of MAN2C1 expression in brain during a life span, suggesting a more prominent role of glycosylation in development and apoptosis reduction at old age.
The convergence of two bioinformatics signals indicated that the haplotype structures could be supported by an inversion polymorphism between blocks C and D [11]. However, we could not find an inversion between C and D, and further experimental work is therefore needed. The small size of the single-copy interval (<0.35 Mb) and the segmental duplications pose great difficulties for experimental characterization by next-generation sequencing data or cytogenetic methods such as FISH. Long reads such as those produced by PacBio could offer important insights. An unambiguous map between possible inversions at C-D and haplotype-genotypes is also challenged by the lack of tagging between the haplotype-genotypes in the proximal region between B-C and the reported inversions. Our data suggest, in particular, that the reported inversion at B-C may have occurred on a large haplotype background extending from B to D and may not be causative of the haplotypes. This variable structure between individual chromosomes may add further instability through meiotic mispairing and increase susceptibility to the reported recurrent germline rearrangements at 15q24 [4], as has been shown in other genomic regions [27]. Given such complexity, more than one inversion configuration in size and extent is possible. That is, the orientation of the region between the different segmental duplications may be polymorphic, and its breakpoints may extend to other segmental duplication blocks in different individuals. In addition, we cannot discard mechanisms other than inversions that could have reduced recombination and sustained the haplotype-genotypes since the migration from Africa, such as structural variations within the segmental duplications or in the chromatin.

Detection and calling of large haplotype-genotypes
In this work, we used hg19 build as the reference, as it was the most common annotation over all datasets.
We used dense SNP data to detect large haplotype-genotypes in the chromosome band 15q24.1-15q24.3, which is prone to microdeletions and microduplications. It has been shown that the convergence of two different algorithms provides two distinct imprints of putative inversions on SNPs [11][12]. The first algorithm, inveRsion, is based on differences in LD between SNP blocks across inversion breakpoints [12]. This method allows one to search for regions where unusually high linkage between distant points is observed. A positive signal between two points is given by the difference of the Bayesian Information Criterion (BIC), which, if greater than zero, indicates that the chromosomes of some individuals have higher SNP block linkage between the tested points than would be expected under a null model. We first ran the inveRsion algorithm using 7 window sizes ranging from 0.4 to 1 Mb on the 73-79 Mb (hg19) interval of the 15q24.1-15q24.3 genomic region, using the phased genotypes of 505 European individuals of the 1000 Genomes project (www.1000genomes.org). Genotypes with minor allele frequency of 0.01 were removed from the analysis.
The second algorithm, invClust [11], detects extended haplotype-genotypes that satisfy Hardy-Weinberg Equilibrium. It is based on the multivariate analysis of the SNPs within the interrogated region. The algorithm clusters individuals into haplotype-genotypes using the first two components of a multidimensional scaling (MDS) analysis of the SNPs in regions with positive signals given by inveRsion. We used a k-means method to identify the groups in the data produced by multiple allelic haplotypes. We then used data from 26 populations of the 1000 Genomes (http://www.1000genomes.org/) to study the population frequencies and heterozygosities as functions of distance from Ethiopia. Distances from Ethiopia were computed using likely migration paths around main water masses.
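Because the haplotype-genotypes are expected to satisfy Hardy-Weinberg Equilibrium, the called genotype counts in a population can be checked with a simple goodness-of-fit test. The sketch below is a generic one-degree-of-freedom chi-square test for a biallelic call; it is not the procedure implemented inside invClust, and the function name and counts are hypothetical.

```python
import math


def hwe_chi_square(n_ref_hom, n_het, n_var_hom):
    """Chi-square test (1 df) of Hardy-Weinberg proportions for a biallelic call.

    Arguments are the observed counts of the three genotypes; returns the
    chi-square statistic and its p-value.
    """
    n = n_ref_hom + n_het + n_var_hom
    p_ref = (2 * n_ref_hom + n_het) / (2 * n)
    p_var = 1.0 - p_ref
    expected = [n * p_ref ** 2, 2 * n * p_ref * p_var, n * p_var ** 2]
    observed = [n_ref_hom, n_het, n_var_hom]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: one population in equilibrium, one with no heterozygotes.
balanced_chi2, balanced_p = hwe_chi_square(25, 50, 25)
skewed_chi2, skewed_p = hwe_chi_square(50, 0, 50)
```

For the three-haplotype N block, the same idea would extend to six genotype classes, with correspondingly more degrees of freedom.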

Autism dataset
The Autism Genome Project (AGP) consortium represents an international effort collecting autism families for ongoing genetic studies. Cases were classified using the Autism Diagnostic Interview-Revised and Autism Diagnostic Observation Schedule instruments. We were granted permission to access genotype data of 2,563 trios obtained with the Human1M-Duo BeadChip (Illumina) (ref: phs000267.v4.p2) (http://www.ncbi.nlm.nih.gov/gap). This dataset was used to define the Mendelian inheritance of inversion-haplotypes at 15q24.2 and to test their transmission disequilibrium in autism or autism spectrum disorder (ASD). We analysed 2,259 trios of European descent with ages between 4 and 18 years.

General intelligence datasets
For association tests between IQ and inversion-related haplotypes, we used data from three independent cohorts of children and adolescents recruited from three different general populations: the INfancia y Medio Ambiente (INMA) Project, the Generation R (GenR) and the Saguenay Youth Study (SYS) (Figure A in S10 Fig and S4 File).
GenR. This is a population-based cohort that at present covers fetal life to childhood of nearly 10,000 individuals born between 2002 and 2006 in Rotterdam, The Netherlands [29]. The data we used from this study comprised genotypes from 1,236 children of European descent, obtained with the Human610-Quad-BeadChip (Illumina). Intelligence measures were obtained at 6 years of age (SD = 0.4 years). Verbal IQ was assessed in 821 individuals using a subset of the Dutch battery [Taaltest voor Kinderen (TvK)], and non-verbal IQ was tested in 871 children using two subsets of the Dutch non-verbal intelligence test [Snijders-Oomen Niet-verbale intelligentie Test-Revisie (SON-R 2½-7)], suited for children of 2.5-7 years of age. These two subsets tap into visuo-spatial abilities and abstract reasoning, and their sum highly correlates with the full SON-R IQ battery.
SYS. We used data from the Saguenay Youth Study, established to study brain and cardiometabolic health in 1,024 adolescents, 12 to 18 years of age (15±3.5), recruited and assessed in the Saguenay Lac-Saint-Jean region, Canada, between 2003 and 2012 [30]. Genotypes of 1,953 individuals, including parents, were obtained with Human610-Quad and HumanOmniExpress BeadChips (Illumina); only SNPs present in both chips were analysed. We performed association tests on a subset of 1,011 adolescents who were tested with the Wechsler Intelligence Scale for Children III (WISC-III). We used the verbal and performance components of the full test.
ORCADES. We also studied whether the genetic associations persisted in adults, using a cohort of 2,215 individuals with a mean age of 54 years (± 15). The Orkney Complex Disease Study is a family-based study in the isolated Scottish archipelago of Orkney. Genetic diversity in this population is decreased compared with Mainland Scotland, consistent with historically high levels of endogamy. Fasting blood samples were collected, and over 500 health-related phenotypes and environmental exposures were measured in each individual. Cognitive verbal fluency was measured with the Mill Hill vocabulary scale.
While different IQ measures were obtained in different age groups (4, 6, between 12 and 18, and 54 years) in each of the cohorts, it is known that MSCA and WISC scores positively correlate for normal and cognitively impaired children [31]. On the other hand, the correlation between SON-R 2½-7 and WISC-R is 0.8 [32]. All IQ measures were standardized to a mean of 100 and a standard deviation of 15. As general IQ measures were not available in all cases, we analysed verbal and non-verbal IQ separately. Population characteristics (ethnicity, sex and age) and genome-wide principal components of the children in all cohorts were used as covariates (Figures B-D in S10 Fig).

Expression datasets
We analysed gene expression levels in RNA from peripheral blood obtained in the Estonian Gene Expression Cohort (EGCUT) (http://www.biobank.ee/). This cohort is composed of 1,074 randomly selected Estonian individuals (37+/-16.6 years; 50% females) from 53,000 subjects in the Estonian Genome Center Biobank, University of Tartu. Whole-genome gene expression levels were obtained with Illumina HT12v3 arrays according to the manufacturer's protocols. DNA was genotyped with the Human370CNV array (Illumina). The final sample size with both genotype data and gene expression data was 882 individuals. We tested the association between gene expression in-cis and haplotype-genotypes at N.
Data of the transcriptomic analysis of RNA from lymphoblastoid cell lines of CEU individuals were obtained from the European Bioinformatics Institute at EMBL (project E-MTAB-198) (http://www.ebi.ac.uk/arrayexpress/), to validate the findings on the previous dataset. For this analysis only parents were selected, removing grandparents NA12282 and NA12283 of family 1421. The final sample analysed here included 105 individuals.
Gene expression data in brain cortex from Myers et al. 2007 [33] were analysed for probes in MAN2C1 and SNUPN. The data comprise 194 samples (with one removed after quality control) from the cerebral cortex of neuropathologically normal brains and were obtained with the Illumina HumanRefseq-8 Expression BeadChip. The individuals had self-defined ethnicity of European descent. For SNP genotyping, the Affymetrix GeneChip Human Mapping 500K Array Set was used. The expression and genotype data were downloaded from the website of The Laboratory of Functional Neurogenomics (http://labs.med.miami.edu/myers/).
For specific regions in the brain cortex, two gene expression data-sets were obtained from Gibbs et al 2010 [34] and BRAINEAC (http://www.braineac.org/), and analysed for MAN2C1 and SNUPN, the gene that was found significant in all previous analyses. The first data-set (BRAINeQTL, dbGap accession number: phs000249.v1.p1) contains gene expression for 148 subjects in pons, cerebellum, temporal cortex and frontal cortex. The second data-set consists of expression and genotype data for 134 subjects in 10 different brain regions: white matter, cerebellum, medulla, hippocampus, putamen, substantia nigra, thalamus, occipital cortex, temporal cortex and frontal cortex. We downloaded data for 36 intragenic probes within MAN2C1.

Data analyses
Recombination rates were downloaded from HapMap II (version 2011-01, http://hapmap.ncbi.nlm.nih.gov/downloads/recombination/) for chromosome 15. We extracted 10,000 random intervals of 3.15 Mb, corresponding to the A-D length, and computed the mean recombination rate of each. The average recombination rate in A-D was compared against this null distribution.
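The comparison against a null distribution of random intervals can be sketched as follows. This is a simplified illustration, not the analysis code: it assumes the recombination map has been reduced to a list of rates in equally sized bins (a real map would need a length-weighted mean), and the function names, the seed and the add-one correction are choices made here.

```python
import random

def mean_rate(rates, start, width):
    """Mean recombination rate over `width` consecutive bins from `start`."""
    window = rates[start:start + width]
    return sum(window) / len(window)

def empirical_p_lower(rates, obs_start, width, n_draws=10000, seed=1):
    """One-sided empirical p-value that the observed interval has a
    mean rate as low as, or lower than, randomly placed intervals."""
    rng = random.Random(seed)
    observed = mean_rate(rates, obs_start, width)
    last_start = len(rates) - width
    # Null distribution: mean rate of randomly placed intervals of equal width.
    null = [
        mean_rate(rates, rng.randint(0, last_start), width)
        for _ in range(n_draws)
    ]
    n_as_low = sum(1 for m in null if m <= observed)
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (n_as_low + 1) / (n_draws + 1)
```

A small p-value from `empirical_p_lower` would indicate that the interval of interest recombines less than most intervals of the same length on the chromosome.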
Phylogeny at N was reconstructed from the N2, N1b and N1a homozygous individuals from the 1000 Genomes populations with the package phyclust (https://cran.r-project.org/web/packages/phyclust/index.html). We used the Hamming distance as the pairwise evolutionary distance between chromosomes.
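As a brief illustration of the distance used, the Hamming distance between two chromosomes is the number of sites at which their alleles differ. The sketch below is independent of phyclust; it assumes each chromosome is represented as a string of allele codes of equal length, and the helper names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length haplotypes differ."""
    if len(a) != len(b):
        raise ValueError("haplotypes must have the same number of sites")
    return sum(x != y for x, y in zip(a, b))

def distance_matrix(haplotypes):
    """Symmetric matrix of pairwise Hamming distances."""
    n = len(haplotypes)
    return [[hamming(haplotypes[i], haplotypes[j]) for j in range(n)]
            for i in range(n)]
```

A matrix of this kind is the usual input to distance-based tree reconstruction.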
Haplotype-genotypes at M, N, O and P were inferred by invClust as mentioned above and then treated as bi-allelic variants coded as (0,1,2) if the individuals had 0, 1 or 2 copies of M2, N2, O2 and P2. snpStats (https://www.bioconductor.org/packages/release/bioc/html/snpStats.html) was used to compute a range of statistics: linkage disequilibrium (r²) between the haplotypes and the SNPs in the region, fixation index (FST), minor allele frequencies, Hardy-Weinberg Equilibrium and transmission disequilibrium tests (TDT) for autistic children.
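Once a haplotype is coded as (0,1,2), it can be handled like any bi-allelic marker. The sketch below shows one such statistic, a Pearson chi-square check of Hardy-Weinberg proportions. It is an illustration only and is not claimed to reproduce how snpStats computes its Hardy-Weinberg statistic; the function name is hypothetical.

```python
def hwe_chi_square(genotypes):
    """Allele frequency and Hardy-Weinberg chi-square (1 d.f.) for
    genotypes coded as 0, 1 or 2 copies of the counted haplotype."""
    n = len(genotypes)
    observed = [genotypes.count(g) for g in (0, 1, 2)]
    p = (2 * observed[2] + observed[1]) / (2 * n)  # counted-allele frequency
    q = 1 - p
    # Genotype counts expected under Hardy-Weinberg proportions.
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return p, chi2
```

A chi-square value near zero indicates genotype counts close to Hardy-Weinberg proportions; large values indicate departure.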
In our discovery sample, INMA, we fitted Gaussian regression models of the form IQ = hg + covariates, where hg denotes the haplotype-genotypes, coded under additive genetic models for M2, O2 and P2 and under recessive models for N2, N1a and N1b. Covariates were age at test administration, sex and the first two principal components, to account for population stratification. The three-haplotype structure at N was tested with a recessive model for each haplotype to study its specific effect. Association tests between haplotype-genotypes and IQ were fitted with the linear model function (lm) of R.
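The two genetic codings can be made concrete with a short sketch. It assumes each individual's genotype at N is available as a pair of haplotype labels; the function names are hypothetical. The last helper gives the difference in mean IQ between homozygotes and everyone else, which equals the regression slope of a recessive model only when no covariates are included, so it is a simplification of the covariate-adjusted models described above.

```python
def additive(genotype, allele):
    """Number of copies (0, 1 or 2) of `allele` in a pair of haplotypes."""
    return sum(1 for a in genotype if a == allele)

def recessive(genotype, allele):
    """1 if the individual is homozygous for `allele`, otherwise 0."""
    return 1 if additive(genotype, allele) == 2 else 0

def recessive_effect(iq, indicator):
    """Mean IQ of homozygotes minus mean IQ of all other individuals
    (the unadjusted slope of IQ on the recessive indicator)."""
    homo = [y for y, g in zip(iq, indicator) if g == 1]
    rest = [y for y, g in zip(iq, indicator) if g == 0]
    return sum(homo) / len(homo) - sum(rest) / len(rest)
```

For example, an individual carrying N1a and N2 has an additive code of 1 for N2 and a recessive code of 0 for all three haplotypes.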
Meta-analyses of the associations between IQ measurements and the recessive models of N2, N1a and N1b were performed with the rmeta package of R (https://cran.r-project.org/web/packages/rmeta/index.html). We fitted fixed-effects models across cohorts, with weights equal to the inverse of the variance estimates, to account for cohort differences and to give more weight to larger studies or those with less variability. No significant heterogeneity between cohorts was observed for the significant results, which supports our choice of model.
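Inverse-variance fixed-effects pooling can be written in a few lines. The sketch below is a generic illustration of the weighting scheme, not a reproduction of the rmeta code; it assumes each cohort contributes an effect estimate and its standard error, and the function name is hypothetical. It also returns Cochran's Q, a standard heterogeneity statistic.

```python
import math

def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance weighted pooled estimate, its standard error,
    and Cochran's Q heterogeneity statistic."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    # Q is compared with a chi-square on (number of cohorts - 1) d.f.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    return pooled, pooled_se, q
```

Cohorts with smaller standard errors receive larger weights, so larger or less variable studies pull the pooled estimate towards their own.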
To discover in-cis elements associated with haplotypes at N, we tested associations with gene expression in blood in the EGCUT data, using regression models between the logarithm of normalized expression and additive models for N2, N1a and N1b. Results were deemed significant for p-values < 4 × 10⁻³, to correct for the testing of 11 genes in the region. Validation on the un-demented individual samples was tested with similar regression models on probes in MAN2C1 and SNUPN. We then returned to recessive models of N2, N1a and N1b to test their association with the logarithm of normalized gene expression levels in different brain regions and, hence, to study the specific effect of each haplotype, as in the IQ associations.
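The multiple-testing threshold can be checked with a short calculation. The sketch assumes a Bonferroni correction at a family-wise level of 0.05; the text gives the number of genes tested but does not state the level, so 0.05 is an assumption made here.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

# Assumed family-wise level of 0.05 across the 11 genes in the region.
threshold = bonferroni_threshold(0.05, 11)
```

Under that assumption the per-test threshold is about 0.0045, so the rounded cut-off quoted in the text is marginally stricter.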