Widespread population variability of intron size in evolutionary old genes: implications for gene expression variability
Intronic CNVs and gene regulation

Barcelona Supercomputing Centre (BSC), Barcelona, 08034, Spain.
Institut de Biologia Evolutiva, Consejo Superior de Investigaciones Científicas–Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona, 08003, Spain.
Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, 08010, Spain.
Institute of Cellular Medicine, Newcastle University, Newcastle upon Tyne, NE2 4HH, United Kingdom.
¶ These authors contributed equally to this work.
* Corresponding authors


Introduction
Most eukaryotic protein-coding genes contain introns, which are removed from the messenger RNA during splicing. Although the functions of introns remain elusive, a number of cases support the idea that the potential energetic cost they impose on the cell might be compensated by acquired functionalities [1][2][3]. For example, introns make it possible to express multiple transcripts from a single gene by alternative splicing, and they facilitate the formation of new genes by exon shuffling [3,4].
Human introns are longer than those of other vertebrates and invertebrates [5,6] and, unlike exon lengths, their lengths are highly variable. This variability in intron length leads to substantial differences in size among human genes, so that the time taken to transcribe a gene ranges from seconds to over 24 hours [7]. Introns can influence several steps of gene transcription [8,9], and a considerable amount of intronic sequence has been shown to be important in regulating gene expression [10].
Introns contribute to the control of gene expression through the regulatory regions and non-coding functional RNA genes they contain, or directly through their length [3,11,12]. Indeed, intron size is highly conserved in genes associated with developmental patterning [13], suggesting that genes requiring precisely coordinated transcription timing rely on a consistent transcript length.
Highly expressed genes are enriched in essential housekeeping functions [14] and tend to have shorter introns [15]. It has been suggested that selection could act to reduce the cost of transcription by keeping introns short in highly expressed genes [15]. Genes transcribed early in development [16][17][18] and genes involved in rapid biological responses [19] also conserve intron-poor structures. Shorter introns would allow these genes to be transcribed faster, so they may be particularly sensitive to changes in transcription time. Interestingly, Keane and Seoighe [20] recently found that the intron lengths of coexpressed genes, or of genes participating in the same protein complexes, tend to coevolve (their intron sizes are significantly correlated across species), possibly because these genes require precise temporal regulation of their expression.
bioRxiv preprint first posted online Aug. 1, 2017; doi: http://dx.doi.org/10.1101/171165. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
Despite their potential importance in regulating transcription levels, transcription timing and splicing, little attention has been paid to the role of introns in studies of human population variability. Given that direct associations between intronic mutations and certain diseases have been reported [24][25][26][27], we need to characterise the normal genetic variability in introns so that we can better distinguish normal from pathogenic variation.
It is increasingly apparent that copy number variants (CNVs) are one of the most important sources of genetic variability. CNVs are defined as imbalanced structural variants that result in the gain or loss of >50 bp of genomic sequence [28] and that appear in more than one individual of the population. CNVs can be classified into gains (regions found duplicated relative to the copy number expected from the reference genome, which is 2 for autosomes), losses (homozygously or heterozygously deleted regions) or gain/loss CNVs (regions found duplicated in some individuals, or alleles, and deleted in others). Microarray and next-generation sequencing approaches have shown that CNVs are more important and frequent than originally thought. CNVs may have neutral, advantageous or pathological consequences [29]. CNVs were initially thought to account for about 1% of the human genome [30], but current estimates range from 4.8 to 9.5% [31]. Here, we have studied the effect of intronic variants using high-resolution CNV maps. Most of these CNVs were detected using whole genome sequencing (WGS) data, which allows the exact genomic boundaries of the variants, and thus their overlap with exons and/or introns, to be determined. Despite significant advances in CNV detection, the discovery and genotyping of these variants remain challenging [32]. To gain consistency, we analyzed in parallel 5 recent CNV maps obtained by different groups with different experimental and analysis systems [31,[33][34][35][36]. This allowed us to compare the effect of CNVs that fully or partially overlap genes of different evolutionary ages, studying the effect of intronic variants in depth. We show how intronic variation results in widespread gene length variability in human populations, and we explore the potential impact of this variability on splicing and gene expression.
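The three CNV classes defined above can be expressed as a small decision rule. The sketch below is a minimal illustration, assuming one integer copy number per individual for a region and the autosomal reference of 2. The function name and the simplification of counting individuals rather than alleles are ours; this is not the authors' pipeline.

```python
from typing import Iterable

REFERENCE_CN = 2      # expected autosomal copy number
MIN_CNV_LENGTH = 50   # bp; CNVs are gains or losses of >50 bp


def classify_cnv(length_bp: int, copy_numbers: Iterable[int]) -> str:
    """Return 'gain', 'loss', 'gain/loss' or 'not a CNV' for one region.

    Hypothetical simplification: the 'more than one individual' criterion is
    checked by counting individuals with a non-reference copy number.
    """
    if length_bp <= MIN_CNV_LENGTH:
        return "not a CNV"
    cns = list(copy_numbers)
    gained = sum(cn > REFERENCE_CN for cn in cns)
    lost = sum(cn < REFERENCE_CN for cn in cns)
    if gained + lost < 2:
        return "not a CNV"
    if gained and lost:
        return "gain/loss"  # duplicated in some individuals, deleted in others
    return "gain" if gained else "loss"


print(classify_cnv(1000, [1, 3, 2]))  # gain/loss
```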

Most genic CNVs fall within introns
CNVs can affect genes in different ways depending on their degree of overlap. Some CNVs cover entire genes (hereafter whole gene CNVs), others overlap with part of the coding sequence but not the whole gene (exonic CNVs), and others are found within intronic regions (intronic CNVs, Fig 1A). By definition, intronic CNVs do not overlap with exons from any annotated transcript isoform or with exons from overlapping genes.
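The overlap categories can be sketched as a simple interval check. This is a hypothetical illustration, assuming half-open (start, end) coordinates on one chromosome and an `exons` list that pools exons from every annotated isoform of the gene and of any overlapping gene, as the definition of intronic CNVs requires.

```python
def overlaps(a, b):
    """True if two half-open (start, end) intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def classify_genic_cnv(cnv, gene, exons):
    """Classify a CNV relative to one gene (whole gene, exonic or intronic).

    cnv, gene: (start, end) on the same chromosome.
    exons: exons pooled from all isoforms of the gene and of overlapping genes.
    """
    if not overlaps(cnv, gene):
        return "intergenic"
    if cnv[0] <= gene[0] and cnv[1] >= gene[1]:
        return "whole gene"
    if any(overlaps(cnv, exon) for exon in exons):
        return "exonic"
    return "intronic"  # inside the gene but touching no annotated exon


gene = (1000, 5000)
exons = [(1000, 1200), (3000, 3100), (4800, 5000)]
print(classify_genic_cnv((1500, 2500), gene, exons))  # intronic
```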
To analyze the impact of CNVs on protein-coding genes in healthy humans, we used five recently published, high-resolution CNV maps [31,[33][34][35][36]. Each map was derived from a different number of individuals, from different populations, using different techniques and algorithms for CNV detection (S1 Fig and S1 Table). The total number of autosomal protein-coding genes overlapping common CNVs varies depending on the filtered map, ranging from 1,694 (Handsaker's map [34]) to 5,610 (Sudmant-Nature's map [36]), with a total of 7,267 genes (out of 19,430 autosomal protein-coding genes) affected by CNVs when aggregating all 5 maps. Remarkably, only 402 (5.5%) of all genes affected by CNVs are shared by the 5 maps (S2 Fig). However, this overlap is non-random (P < 2.2e-16).
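The contrast between the aggregated count and the genes shared by all maps is a union versus an intersection of per-map gene sets. A minimal sketch with toy gene sets (placeholders, not data from the five maps); the test of non-random overlap is not reproduced here.

```python
def summarize_overlap(gene_sets):
    """Genes affected in any map, genes shared by all maps, and shared percentage."""
    union = set().union(*gene_sets.values())
    shared = set.intersection(*gene_sets.values())
    return len(union), len(shared), 100 * len(shared) / len(union)


# Toy gene sets standing in for the CNV maps.
maps = {"map_A": {"g1", "g2", "g3"}, "map_B": {"g2", "g3"}, "map_C": {"g3", "g4"}}
print(summarize_overlap(maps))  # (4, 1, 25.0)
```

With the counts reported above, 402 shared genes out of 7,267 affected genes gives 402 / 7,267 ≈ 5.5%.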
Most of the CNVs overlapping genes fall within intronic regions (∼63% of all CNVs) without any overlap with exons. More surprisingly, over 94% of the purely intronic CNVs detected are losses or gain/loss CNVs. This is in stark contrast with whole gene CNVs, which tend to be exclusively gains (55% of cases) or gain/loss CNVs (25% of cases) (Fig 1B-F). There is a significant enrichment of purely intronic losses (P < 0.001; permutation testing) in 4 out of 5 maps, with 6 to 15% more deletions falling in introns than expected by chance, depending on the CNV map (no significant difference from the expected values was found with Sudmant-Science's map, P = 0.6683). In contrast, there were 13-70% fewer CNV deletions overlapping exons of protein-coding genes than expected by chance, depending on the map (P < 0.05 in all maps).
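The enrichment of purely intronic losses was assessed by permutation testing. The sketch below shows the general idea under simplifying assumptions of our own: a single chromosome of length `chrom_len`, deletions relocated uniformly at random while keeping their length, and introns given as exon-free intervals. It is not the authors' background model, which also controlled for factors such as intron size and replication timing (see Methods).

```python
import random


def n_purely_intronic(deletions, introns):
    """Count deletions lying entirely inside an intron (half-open intervals)."""
    return sum(
        any(start >= i_start and end <= i_end for i_start, i_end in introns)
        for start, end in deletions
    )


def intronic_enrichment(deletions, introns, chrom_len, n_perm=1000, seed=0):
    """Observed count, mean expected count and empirical P for enrichment."""
    rng = random.Random(seed)
    observed = n_purely_intronic(deletions, introns)
    null_counts = []
    for _ in range(n_perm):
        # Relocate every deletion uniformly along the chromosome, keeping its length.
        relocated = []
        for start, end in deletions:
            length = end - start
            new_start = rng.randrange(0, chrom_len - length + 1)
            relocated.append((new_start, new_start + length))
        null_counts.append(n_purely_intronic(relocated, introns))
    expected = sum(null_counts) / n_perm
    # Fraction of random values at least as large as the observed one.
    p_value = sum(count >= observed for count in null_counts) / n_perm
    return observed, expected, p_value


# Toy example: five 10-bp deletions inside one 1-kb intron on a 100-kb chromosome.
introns = [(1000, 2000)]
deletions = [(1100 + 100 * k, 1110 + 100 * k) for k in range(5)]
observed, expected, p = intronic_enrichment(deletions, introns, chrom_len=100_000)
```

An empirical P of 0 only means that no permutation reached the observed value, so it is reported as an upper bound (for example, P < 0.001 with 1,000 permutations).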
Given the potential regulatory role of introns and the high frequency of purely intronic deletions in the population, we focused on the impact of CNVs on introns. For all subsequent analyses, we restricted our set of CNVs to loss and gain/loss CNVs, as together they represent sequences that are lost in some individuals. Notably, three maps (Sudmant-Nature's [36], Zarrei's [31] and Abyzov's [33]) together contain 86% of all intronic deletions in our datasets. The methods used to generate the other two maps (Handsaker's [34] and Sudmant-Science's [35]) tend to detect fewer losses and larger CNV regions, which results in maps with fewer purely intronic deletions (S1 Fig). For these reasons, we focused our analyses of intronic deletions on the three maps with the most intronic deletions (Sudmant-Nature's [36], Zarrei's [31] and Abyzov's [33]). We checked that the enrichment of intronic losses remained significant when controlling for differences in intron size (S3 Fig).
Finally, we tested whether intronic losses were distributed equally between essential and non-essential genes. We separated the protein-coding genes into two groups: those reported to be essential by CRISPR-based genomic targeting [37,38] or gene-trap insertional mutagenesis [39], and those not found to be essential. Strikingly, the proportion of intronic deletions was higher than expected by chance in both essential and non-essential genes (S2 Table). The fact that intronic deletions can appear in essential genes suggests that they might be an unexpected source of genetic variation that could influence the regulation of functionally relevant genes in human populations.

Intronic losses accumulate in evolutionarily old genes, while losses in coding regions are more frequent in young genes
Intrigued by the overrepresentation of intronic deletions in human protein-coding genes, we next investigated their quantitative and functional impact in more detail. We have previously reported that the evolutionarily younger a gene is, the more likely it is to carry whole gene CNVs in the population [40]. Here we confirm that most ancient genes are depleted of CNVs affecting their coding regions, while primate-specific genes are enriched in CNVs (S4 Fig). This pattern was also observed when CNV gains were excluded (Fig 2A). Random background models revealed that ancient genes were significantly depleted of coding region losses (both exonic and whole gene) (P < 0.05), whereas young genes were enriched in them (P < 0.05) (Fig 2A).
Surprisingly, we observed the opposite trend for purely intronic deletions: the proportion of ancient genes with intronic deletions was higher than that of young genes, and also higher than expected by chance (Fig 2B). This finding was confirmed by additional analyses considering only genes with introns and adjusting for the different size distributions of introns (S5 Fig). The introns of essential genes tend to be shorter [21,22], and essential genes also tend to be ancient [41]. We therefore compared the intron sizes of genes within the same age groups and found that the introns of essential genes are shorter than those of non-essential genes of the same age (S6A and S6B Fig).
Even though introns are shorter in essential genes, we found that a significant proportion of these genes carry intronic deletions in the Sudmant-Nature's map [36] (S6C Fig), suggesting that intronic size variation in the population might be more important than we originally thought.

Intronic deletions result in population variation of gene lengths
The percentage of each intron that can be lost through CNV losses is highly variable, ranging from 0.03% to 96.8% and representing a loss of 0.01% to 77.5% of the total gene size (Fig 3A-C). Examples of genes with a notable change in size after a single intronic deletion include the neuronal glutamate transporter SLC1A1, which loses 37% of its gene size; TCTN3 (tectonic family member 3), which loses 45%; and LINGO2 (Leucine Rich Repeat And Ig Domain Containing 2, alias LERN3 or LRRN6C), which loses 34% (Fig 3D).
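Because each deletion lies entirely within one intron, the two percentages above are linked by the ratio of intron length to gene length: the share of the gene that is lost equals the share of the intron lost multiplied by intron length / gene length. A minimal arithmetic sketch with hypothetical lengths (not taken from any of the genes named above):

```python
def percent_lost(deletion_len, intron_len, gene_len):
    """Percent of the intron and of the whole gene removed by one intronic deletion."""
    if not 0 < deletion_len <= intron_len <= gene_len:
        raise ValueError("expected 0 < deletion <= intron <= gene length")
    intron_pct = 100 * deletion_len / intron_len
    gene_pct = 100 * deletion_len / gene_len
    return intron_pct, gene_pct


# Hypothetical lengths: a 3.7-kb deletion in a 5-kb intron of a 10-kb gene.
print(percent_lost(3_700, 5_000, 10_000))  # (74.0, 37.0)
```

The gene-level share can never exceed the intron-level share, which is consistent with the maximum gene loss (77.5%) being lower than the maximum intron loss (96.8%).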
Remarkably, these genes are highly conserved at the protein level and are among the 20% of genes most intolerant to functional variation according to the RVIS (Residual Variation Intolerance Score) gene ranking, which is based on the amount of genetic variation in each gene at the exome level [42]. This result shows that genes with highly conserved coding sequences, which are generally depleted of deletions [36], can nevertheless lose substantial intronic regions, which might affect their regulation without affecting their protein structure.
Intronic deletions can affect regulatory regions such as enhancers and CTCF binding sites, which are enriched (P < 1e-04) and depleted (P < 1e-04) in introns, respectively. Indeed, deletions very frequently occur in introns containing these regulatory features (P < 2.2e-16). However, the direct overlap of the deletions with both enhancers and CTCF binding sites is significantly lower than expected by chance (P < 1e-04, S3 Table). This suggests that intronic losses can occur close to regulatory features within introns, but that deleting part of a regulatory feature might often be deleterious.
Besides altering intron size or disrupting regulatory regions, intronic deletions could also affect splicing, which is required to prevent introns from being translated. In genes with long introns, the recognition of introns and exons by the splicing machinery relies on their differential GC content (Amit et al. 2012, Gelfman et al. 2013), as the lower GC content of introns facilitates their recognition. Presumably, this recognition mechanism has contributed to the expansion of introns in higher eukaryotes (Hollander et al. 2016). We analyzed the GC content of introns with deletions and found that the deleted sequences had a significantly higher GC content than the introns in which they are located (P = 1.8e-28). Moreover, the loss of these fragments significantly decreased the overall GC content of the remaining introns (P = 2.23e-16). Our results suggest that deleting GC-rich regions within introns could lower the overall GC content of the intron, increasing the difference in GC content between introns and their flanking exons, which could facilitate exon definition during splicing (S8 and S9 Fig).
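The GC comparison described above can be sketched as follows. This is an illustration assuming plain nucleotide strings, with ambiguous bases (such as N) left out of the denominator; the statistical tests behind the reported P-values are not reproduced.

```python
def gc_fraction(seq: str) -> float:
    """GC fraction over unambiguous bases (A, C, G, T); NaN if there are none."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return float("nan")
    return (seq.count("G") + seq.count("C")) / acgt


def gc_after_deletion(intron: str, start: int, end: int):
    """GC of the deleted segment, of the full intron, and of the intron without [start, end)."""
    deleted = intron[start:end]
    remaining = intron[:start] + intron[end:]
    return gc_fraction(deleted), gc_fraction(intron), gc_fraction(remaining)


# Toy intron with a GC-rich stretch in the middle.
intron = "AT" * 10 + "GC" * 5 + "AT" * 10
print(gc_after_deletion(intron, 20, 30))  # (1.0, 0.2, 0.0)
```

In this toy case, removing the GC-rich segment lowers the GC content of the remaining intron, which is the direction of change reported above.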
We have shown that intronic deletions are widespread across introns of varying sizes and can produce important changes in the sequence composition and regulatory architecture of many protein-coding genes, which might affect the transcription and splicing of those genes through different mechanisms. Intronic deletions constitute a previously unrecognized source of variation in gene and transcript length across individuals, and they can subtly affect ancient genes with important functions that do not tolerate more drastic alterations.

Effect of intronic deletions on gene expression
Multiallelic CNVs affecting whole genes have been shown to correlate with gene expression: generally, the higher the copy number of a gene, the higher its expression level [34,36]. Our data suggest that intronic size variation could also affect the expression of the affected genes.
We therefore examined the possible effect of intronic hemizygous deletions on gene expression variation at the population level, comparing it with the effect of hemizygous deletions in coding regions (whole gene and exonic) and in non-coding intergenic regions. We used available RNA-seq data from Geuvadis [43], derived from lymphoblastoid cell lines of 445 individuals for whom matching CNV data are available (Sudmant-Nature's map [36]). To look for differences in gene expression, we selected variants with at least 2 hemizygous individuals (copy number = 1) and at least 2 wild-type individuals (copy number = 2), and compared the expression levels between these two groups (Fig 4A and S10 Fig). We first studied the effect of intronic deletions and observed significant differences in gene expression in 52 of the 1,474 genes with intronic deletions (3.5%) in lymphoblastoid cell lines. This percentage is higher than expected by chance (P = 1e-4) (Fig 4), where the expected value is the number of differentially expressed genes (DEGs) obtained when randomizing the individuals carrying the deletion. Of the DEGs, 62% were downregulated and 38% upregulated, suggesting that intronic deletions might either enhance or repress gene expression.
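The selection criterion and the multiple-testing step can be sketched in a few lines. This is a hypothetical illustration: `select_testable` assumes a mapping from sample ID to copy number, and `benjamini_hochberg` mirrors the step-up adjustment of R's `p.adjust(method = "BH")` that the Methods mention. The per-gene Student's t-test itself is omitted; in Python it could be run with `scipy.stats.ttest_ind`, whose default `equal_var=True` corresponds to Student's test.

```python
def select_testable(copy_numbers, min_per_group=2):
    """Split samples into hemizygous (CN = 1) and wild-type (CN = 2) groups.

    Returns None unless both groups have at least min_per_group samples.
    Samples with any other copy number are ignored.
    """
    hemizygous = [s for s, cn in copy_numbers.items() if cn == 1]
    wild_type = [s for s, cn in copy_numbers.items() if cn == 2]
    if len(hemizygous) >= min_per_group and len(wild_type) >= min_per_group:
        return hemizygous, wild_type
    return None


def benjamini_hochberg(p_values):
    """Step-up Benjamini-Hochberg adjustment (same rule as R's p.adjust "BH")."""
    m = len(p_values)
    # Visit p-values from largest to smallest, keeping a running minimum.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


groups = select_testable({"s1": 1, "s2": 1, "s3": 2, "s4": 2, "s5": 3})
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```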
We investigated whether deletions in introns of differentially expressed genes tended to overlap with regulatory features, but did not observe any significant enrichment (P = 1). Even though first introns are known to be particularly important for gene regulation [3,44], DEGs were not significantly enriched for deletions in their first intron (P = 0.86). These results suggest that mechanisms independent of intronic regulatory regions might be responsible for these changes in gene expression. It is also possible that a combination of different mechanisms is needed to explain the observed effects. In addition, we cannot rule out that the lack of association between intronic regulatory features and gene expression changes is due to the small number of DEGs in this cell type and/or the lack of sufficiently detailed epigenomic annotations.
We next compared the impact of non-coding intronic deletions on gene expression with that of non-coding intergenic deletions. We focused on intergenic regions that show long-range interactions with promoters of protein-coding genes, which is generally assumed to reflect a regulatory function of these intergenic regions [45]. We used published promoter-capture Hi-C data for B-lymphocytes [46] to link deletions in intergenic regions with the genes they interact with.
Significant changes in gene expression were seen in 11 of 872 (1.26%) genes with a deletion in an intergenic contacting region. Unlike the effect of intronic deletions within the same gene, this percentage of DEGs did not differ significantly from that expected by chance (P = 0.08). Our data therefore suggest that variation within intronic regions may have a greater impact on gene regulation than variation in intergenic regions.
The effect on gene expression appears to be greater when coding regions are affected than for purely intronic sequence losses: 15 of 51 genes (29.4%, P < 1e-4) with whole gene deletions showed significant downregulation, and 30 of 239 genes with partial exonic deletions (12.6%; 28 down- and 2 up-regulated; P < 1e-4) were differentially expressed.
However, given the higher frequency of intronic deletions in the population, the absolute number of DEGs with intronic deletions (52 genes) was similar to the total number of DEGs with coding deletions (45). Moreover, while coding losses are mostly associated with gene downregulation, intronic losses are frequently associated with gene upregulation. This shows the potential global relevance of intronic deletions for gene expression, especially considering that their frequency in ancient genes (27.9% in genes older than Sarcopterygii) is almost double that of coding deletions (14.6%, Fig 2). In summary, these data suggest that intronic variants could have an important regulatory impact on ancient genes.
The CNV maps used here were built from different datasets with different CNV calling algorithms, resulting in CNV sets that differ greatly in number, size and type. Each of these studies has its own limitations, and probably none of them reflects all the variability in the genome. Therefore, instead of merging them into one map, we preferred to analyse the maps in parallel. This allowed us to assess the consistency of the results and, at the same time, helped us to better understand the peculiarities of each CNV set. We saw very consistent trends when we analysed the enrichment of intronic deletions and the differential impact of CNVs depending on the evolutionary age of genes, which supports the robustness and generality of our results. At the same time, our data suggest that using different maps in parallel can be a useful way to cross-validate biological findings.
Structural variants in germline DNA constitute an important source of genetic variability that serves as the substrate for evolution. Dating the evolutionary age of genes therefore allows the study of structural variants that were fixed millions of years ago. We have previously shown that genes of different ages are found in different proportions within current human CNV regions [51].
Unlike ancient gene loci, whole young gene loci are highly variable in copy number and tend to be located in late-replicating genomic regions, which are more error-prone and have less precise DNA repair mechanisms than early-replicating regions [51]. Fixation of duplications or losses of whole genes in these regions can lead to the birth of new genes or to their disappearance (Fig 5).
Here, we observed that gains and losses affecting only part of the coding sequence are also enriched in young genes (Fig 5). Such CNVs can disrupt the protein sequence, but they can also eliminate, duplicate or relocate exons or parts of exons, providing the organism with a mechanism to modify young genes.
On the other hand, evolutionarily ancient genes, which are generally depleted of CNVs overlapping their coding regions, are especially enriched in intronic losses (Fig 5). This suggests that, although the protein sequence usually remains unaffected, changes in the intronic sequence can modulate gene expression and promote variability in the population. In lymphoblastoid cells, we found more differentially expressed genes associated with intronic losses than expected by chance. We expect this association to be even stronger overall, because for many genes the effect of intronic losses may only be observable in other cell types or tissues.
Interestingly, the genes that show changes in expression differ depending on whether they are affected by coding (exonic or whole gene) or purely intronic losses. Differences in expression in younger genes are mainly associated with changes in whole gene dosage or partial disruption of their coding sequence. In contrast, ancient genes, which are generally less tolerant of any kind of mutation in their coding sequence, are enriched in intronic deletions that could be modulating their expression (Fig 5). The availability of CNV and population-based gene expression data from several tissues will allow a more accurate evaluation of the impact of coding and non-coding deletions across the whole organism.
CNVs can directly disrupt a regulatory feature or alter the distance between regulatory elements, for example between a promoter and an enhancer. We found that enhancers are significantly enriched in introns, in agreement with previous findings in plants [11,44]. In general, genes with complex regulation patterns require more regulatory DNA [52], and introns tend to be longer in tissue-specific and transcription factor genes than in housekeeping genes [21,53]. Since many enhancers are tissue-specific [54], intronic CNVs might frequently have effects in particular cell types. The loss of intronic sequence might therefore affect the expression of such genes in a tissue-specific manner.
Our results also suggest that non-coding intronic deletions might have a wider impact on population gene expression variability than deletions in non-coding intergenic regions that interact with promoters, given that intronic deletions correlate with gene expression changes more often than expected by chance, while promoter-interacting intergenic regions do not. However, intergenic deletions were associated with genes using promoter-capture Hi-C maps derived from a few pooled genomes [46], and personal genome interactomes, as well as data from more tissues and conditions, may be needed to evaluate the effect of intergenic deletions on gene expression more precisely. Furthermore, with the necessary experimental and analytical advances, it will be extremely exciting to see how individual copy number variants change the personal landscape of interactions among promoters and other genomic elements.
We speculate that intronic CNVs might have a previously unsuspected role in shaping gene expression variability in populations, with potentially important consequences for human evolution and adaptation. Now that we have uncovered the relevance of gene length variation caused by frequent intronic deletions in the healthy population, the next open question is whether any of these common non-coding variants are associated with disease. In fact, these population-based CNV maps could help identify disease-relevant and disease-irrelevant intronic regions. It is now well known that most SNPs associated in genome-wide association studies (GWAS) are located in intronic and intergenic regions, and the pathogenicity of non-coding CNVs, mostly in upstream promoters, is starting to emerge [55]. Thus, future case-control studies including WGS should also pay attention to the potentially important role of purely intronic variation. While exons cover around 2.8% of the genome, introns cover 35.3% (based on the gene set used in this study). WGS studies are starting to focus on distal intergenic enhancers, but intronic regions are commonly ignored in the analyses. A recent analysis of the literature revealed a substantial number of pathogenic variants located "deep" within introns (more than 100 bp from exon-intron boundaries), which suggests that sequence analysis of full introns may help to identify causal mutations in many undiagnosed clinical cases [27]. With the results presented here, we emphasize the importance of sequencing and analysing variants located in introns, as they can potentially be as consequential as regulatory elements found in intergenic regions.
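Genome coverage figures such as 2.8% for exons and 35.3% for introns depend on merging overlapping annotations first, since exons from different isoforms and genes overlap. A minimal sketch, assuming intervals on a single chromosome, exons that all lie inside annotated genes, and intronic territory defined as genic territory minus exonic territory (our simplification); dividing each result by the genome size gives the percentages.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_bp(intervals):
    """Number of bases covered by at least one interval."""
    return sum(end - start for start, end in merge_intervals(intervals))


def exon_intron_bp(genes, exons):
    """Exonic bp and intronic bp (genic territory not covered by any exon)."""
    exonic = covered_bp(exons)
    return exonic, covered_bp(genes) - exonic


genes = [(0, 100), (50, 200), (500, 600)]
exons = [(0, 20), (10, 30), (90, 110), (180, 200), (500, 550)]
print(exon_intron_bp(genes, exons))  # (120, 180)
```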
Given how common intronic deletions are in the healthy population, it will also be interesting to explore how frequent purely intronic somatic deletions are in cancer and to evaluate their potential contribution to the reprogramming of gene regulation in cancer cells. For example, are there somatic deletions of intronic sequences that shorten oncogenes, favoring their higher expression? The role of somatic cancer variants in distal regulatory regions is just starting to be explored [56][57][58][59][60]. As we have shown that introns are significantly enriched in regulatory regions in the human genome, understanding the functional effect of somatic intronic deletions in cancer could be an attractive new field of research with high potential for discovery. It has been previously proposed that higher-order chromatin architecture influences the landscape of chromosomal alterations in cancer [61]. We hypothesize that higher-order genome organisation in healthy cells constrains where variability can be high or low, allowing high variability anywhere in young genes but only in introns for ancient genes. It will therefore be interesting to better understand how these constraints change by comparing data from healthy cells with the frequent aneuploidy in tumors, especially in radical restructuring events caused by chromothripsis [62].
In summary, our data show that intronic CNVs constitute the most abundant form of CNV in protein-coding genes. This intronic length variation suggests that the actual size of many genes is not yet fixed in human populations. We show that intronic length variation is particularly frequent in evolutionarily old genes, with a significant proportion of them showing associated gene expression changes. This suggests that intronic CNVs might be actively contributing to the evolution of gene regulation in many genes with highly conserved protein sequences. Taken together, our results suggest that copy number variation shapes gene evolution in different ways depending on gene age, duplicating or deleting young genes and fine-tuning the regulation of old genes.
In addition, we generated background models correcting for DNA replication timing. For this, we downloaded DNA replication timing data for 15 cell lines from ENCODE [65,66] and assigned the median value across all cell lines to each 1 kb window of the genome. We then classified the genome into 5 intervals of DNA replication timing and relocated each CNV within its own replication timing interval.
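The replication-timing correction can be sketched as follows. The text states only that the genome was split into 5 timing intervals; the equal-count (quantile) split used here, the representation of windows as lists of per-cell-line values, and relocating a CNV by choosing a new start window are all assumptions of this sketch, and the function names are hypothetical.

```python
import random
import statistics


def timing_bins(window_values, n_bins=5):
    """Assign each 1-kb window to one of n_bins replication-timing intervals.

    window_values[i] holds the timing values of window i across cell lines.
    Windows are ranked by their median and split into equal-count bins
    (an assumption; the source only says the genome was split into 5 intervals).
    """
    medians = [statistics.median(values) for values in window_values]
    ranked = sorted(range(len(medians)), key=medians.__getitem__)
    bins = [0] * len(medians)
    for rank, window in enumerate(ranked):
        bins[window] = rank * n_bins // len(medians)
    return bins


def relocate_within_bin(start_window, bins, rng):
    """Pick a random new start window from the same timing interval."""
    candidates = [w for w, b in enumerate(bins) if b == bins[start_window]]
    return rng.choice(candidates)


# Toy example: 10 windows, 3 "cell lines" each, timing increasing along the genome.
values = [[i, i + 1, i + 2] for i in range(10)]
bins = timing_bins(values)  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
new_window = relocate_within_bin(4, bins, random.Random(1))
```

A CNV starting in window 4 can only be moved to window 4 or 5 here, because those are the only windows in its timing interval.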
We compared the locations of the CNVs in our datasets with their distributions in the random models to calculate enrichments or depletions depending on intron size, gene age and gene essentiality.

Regulatory features
We downloaded a genome-wide set of regions likely to be involved in gene regulation from the Ensembl Regulatory Build [67]. We checked whether introns are enriched in these regulatory features (promoters, enhancers, promoter flanking regions or insulators) by comparison with a random background model generated by relocating all regulatory features in the genome 10,000 times. P-values are the fraction of random values greater or smaller than the observed values.
To assess the significance of the overlaps between intronic deletions and regulatory features, we relocated all intronic deletions within their introns 10,000 times and checked for differences in overlap with regulatory features.
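The within-intron relocation test can be sketched like this, assuming half-open intervals on one chromosome and features given as (start, end) pairs. Because the question is whether overlap is lower than expected, the sketch reports the fraction of permutations with an overlap count at most the observed one; counting ties in that fraction is our conservative choice, and the function names are hypothetical.

```python
import random


def overlaps_any(interval, features):
    """True if a half-open interval overlaps any (start, end) feature."""
    start, end = interval
    return any(start < f_end and f_start < end for f_start, f_end in features)


def within_intron_test(deletions, features, n_perm=10_000, seed=0):
    """deletions: list of ((del_start, del_end), (intron_start, intron_end)) pairs."""
    rng = random.Random(seed)
    observed = sum(overlaps_any(d, features) for d, _ in deletions)
    null_counts = []
    for _ in range(n_perm):
        count = 0
        for (start, end), (i_start, i_end) in deletions:
            length = end - start
            # Uniform new start such that the deletion stays inside its own intron.
            new_start = rng.randint(i_start, i_end - length)
            count += overlaps_any((new_start, new_start + length), features)
        null_counts.append(count)
    expected = sum(null_counts) / n_perm
    # Depletion: fraction of permutations with at most the observed overlap.
    p_depletion = sum(c <= observed for c in null_counts) / n_perm
    return observed, expected, p_depletion


# Toy example: ten 50-bp deletions in a 1-kb intron, none touching a feature at 400-600.
intron = (0, 1000)
features = [(400, 600)]
starts = (0, 50, 100, 150, 200, 250, 300, 350, 700, 800)
deletions = [((s, s + 50), intron) for s in starts]
observed, expected, p = within_intron_test(deletions, features, n_perm=2000)
```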

Gene expression analysis
We used available RNA-seq data from Geuvadis [43], derived from lymphoblastoid cell lines of 445 individuals who were sequenced by the 1000 Genomes Project and for whom intronic deletion data are available in the largest CNV map [36]. We focused our analyses on the 763 genes that have only one intronic deletion in the population, with at least two affected individuals in the Geuvadis dataset. For each of these genes, we classified the PEER-normalized gene expression levels [68] into two groups depending on whether the individual carried the intronic deletion, and compared expression levels between these two groups (Fig 4A and S10 Fig) using Student's t-tests. We corrected for multiple testing with the p.adjust R function (Benjamini-Hochberg method). In addition, we randomized the individuals with the intronic deletions 10,000 times and calculated the expected percentages of significantly differentially expressed genes.
The five CNV maps differ in the algorithms used for CNV detection (S1 Fig and S1 Table). Due to these differences, each dataset provides a different set of CNVs (S1 Fig), which we analysed independently. We only considered variants present in at least 2 individuals in a dataset, filtering out variants mapped to sex chromosomes and private variants within each map.

Figure captions
Fig 1.

Fig 2. Evolutionary age of affected genes. Percentage of genes from each gene evolutionary age that contain deletions overlapping with exons, including partial and whole gene CNVs (A), or intronic deletions (B). The light blue line represents the expected value, calculated as the mean of the genes in the 10,000 random permutations. Red asterisks mark significantly enriched groups of genes, while black asterisks mark gene age groups with fewer deletions than expected (P < 0.05). (C) Among all genes overlapping deletions after aggregating the three maps, the proportion of genes that have all or part of their exons affected by deletions and the percentage of genes with intronic deletions only. Bar width is proportional to the percentage of genes from each evolutionary age affected by deletions of any kind, which ranges from 18.5% (Mammalia) to 49.8% (HomoPanGorilla). The equivalent figure for each separate map is shown in S7 Fig.

Fig 3. Changes in intron and gene size. (A) Proportion of the reference intron that has been

Fig 4. Differential expression. (A) Number of genes with whole gene, exonic or intronic deletions

Fig 5. Impact of CNVs on genes and their evolution. Evolutionarily ancient and young genes