3D genome evolution and reorganization in the Drosophila melanogaster species group

Topologically associating domains, or TADs, are functional units that organize chromosomes into 3D structures of interacting chromatin. TADs play an important role in regulating gene expression by constraining enhancer-promoter contacts and there is evidence that deletion of TAD boundaries leads to aberrant expression of neighboring genes. While the mechanisms of TAD formation have been well-studied, current knowledge on the patterns of TAD evolution across species is limited. Due to the integral role TADs play in gene regulation, their structure and organization is expected to be conserved during evolution. However, more recent research suggests that TAD structures diverge relatively rapidly. We use Hi-C chromosome conformation capture to measure evolutionary conservation of whole TADs and TAD boundary elements between D. melanogaster and D. triauraria, two early-branching species from the melanogaster species group which diverged ∼15 million years ago. We find that the majority of TADs have been reorganized since the common ancestor of D. melanogaster and D. triauraria, via a combination of chromosomal rearrangements and gain/loss of TAD boundaries. TAD reorganization between these two species is associated with a localized effect on gene expression, near the site of disruption. By separating TADs into subtypes based on their chromatin state, we find that different subtypes are evolving under different evolutionary forces. TADs enriched for broadly expressed, transcriptionally active genes are evolving rapidly, potentially due to positive selection, whereas TADs enriched for developmentally-regulated genes remain conserved, presumably due to their importance in restricting gene-regulatory element interactions. These results provide novel insight into the evolutionary dynamics of TADs and help to reconcile contradictory reports related to the evolutionary conservation of TADs and whether changes in TAD structure affect gene expression.


Introduction
The recent development of Hi-C sequencing techniques has allowed for inference of threedimensional chromosome conformation through identification of inter-and intra-chromosomal interactions at high-resolution across the entire genome. Visualization of gene contacts and contact frequencies led to the discovery of organizational features called topologically associating domains, or TADs, which bring genes in close proximity with their regulatory elements [1]. TADs are regions of highly-interacting chromatin that contain genes with similar expression patterns and epigenetic states, and their location is conserved throughout development and across tissue types in both mammals and Drosophila [2][3][4]. Domains are demarcated by boundaries which are regions of decompacted chromatin bound by insulator proteins [5]. In vertebrates, the CCCTC-binding factor (CTCF), along with the structural maintenance of chromosomes (SMC) cohesin complex, play a major role in specifying TAD boundaries [2,[6][7][8], whereas in Drosophila, CTCF and SMC binding show little enrichment at TAD boundaries [9]. Instead, other insulator proteins, including BEAF-32, Chromator, CP190, and M1BP are more frequently found at TAD boundaries [3,[10][11][12][13] and depletion of M1BP has been shown to disrupt 3D genome organization in the Drosophila Kc167 cell line [10].
Most research thus far investigating 3D genome structure has operated under the prevailing theory that TADs regulate gene expression by limiting potential gene-enhancer interactions to those within a given domain. This theory is supported by a variety of studies in mammals. For example, genome-wide enhancer-promoter contacts in mouse neurons occur mainly within TADs [14] and reporter gene-enhancer interactions have been shown to be correlated with TAD structure [15]. Furthermore, disruption of TAD boundaries has been associated with aberrant enhancer/promoter contacts, gene misregulation, developmental abnormalities and cancer [16][17][18][19][20][21]. Developmental genes in Drosophila have also been shown to engage in longdistance interactions with distal regulatory sequences. Ghavi-helm et. al. [22] performed 4Cseq with viewpoints centered on known developmental enhancers and found that each enhancer interacted with two promoters and three other enhancers on average. Seventy-three percent of interactions spanned distances larger than 50 kb and the majority of interactions occurred within the same TAD. These long-distance interactions represent a level of 3D connectivity comparable to humans [22].
The functional role of TADs with respect to the regulation of gene expression has important implications for 3D genome evolution. If TAD structure is critical for proper gene regulation, then the evolution of 3D genome organization should be highly constrained and related species should show strong conservation of TAD structures. Consistent with this prediction, a variety of studies in vertebrates have reported strong conservation of 3D genome organization using comparative Hi-C approaches [2,7,23,24]. A recent study in Drosophila reported that 3D genome architecture is conserved over 40 million years of evolution in spite of extensive chromosomal rearrangements [25]. These studies support a model where chromosomal rearrangements that preserve TADs (i.e. their breakpoints are located within TAD boundaries) are much more likely to be retained over evolutionary time compared to rearrangements that disrupt TADs (Fig 1, Model 1). Under this model, TADs are shuffled as whole units over evolutionary time due to selection to maintain the 3D interaction properties of the genes and regulatory sequences within them.
However, more recent research suggests that TADs may be frequently reorganized over evolutionary time. Notably, one recent study found that only 43% of TADs are shared between humans and chimpanzees [26]. Furthermore, there are a number of studies showing that gene expression profiles remain unperturbed upon TAD reorganization. For example, the extensive changes in chromosome topology caused by the rearrangements found on a Drosophila balancer chromosome are not associated with widespread differences in gene expression [27]. Also in Drosophila, studies involving deletion mutations [28] and experimentally-induced inversions [29] found that these mutations, which should disrupt TAD organization, had little effect on gene expression. Similar observations have been made in mice: fusion of TADs does not have major effects on gene expression [30]. These studies suggest that TADs may diverge relatively rapidly over evolutionary time, with little effect on gene expression (see Models 2 and 3, Fig 1).
One possibility that might explain these apparently contradictory results is that there are distinct functional subtypes of TADs, with some being more tolerant of reorganization than others. Consistent with such a possibility, a recent study identified a subset of ancient, highlyconserved TADs in both vertebrates and flies that are enriched for conserved noncoding elements and developmental genes [31]. Based on findings that TAD locations and/or contact frequencies are conserved between species at a level beyond that expected by chance, previous studies in mammals [2,7,23] and flies [25] have concluded that TADs in general are evolving under strong constraint. However, different subtypes of TADs may also be subject to different evolutionary forces, with some, such as those containing developmental genes, evolving under purifying selection, while others could be evolving neutrally, or even under positive selection.
Here, we have compared 3D genome organization between Drosophila melanogaster and Drosophila triauraria, which diverged �15 million years ago [32]. We chose this species pair because, based on the evolutionary rate of chromosomal rearrangements inferred by [33], they should have accumulated extensive chromosomal rearrangements since their divergence, on the order of several hundred synteny breakpoints. We have improved a previously published D. triauraria genome assembly [34] by performing additional nanopore sequencing and Hi-C scaffolding which yielded chromosome-length scaffolds. We identified 991 rearrangement breakpoints between the Drosophila melanogaster and Drosophila triauraria genome assemblies, which contain blocks of synteny that are at least twice as large as TADs (�120 Kb, see Results. Drosophila TADs: 23-63 Kb average length [10]). We then used two biological replicates of Hi-C sequencing data to identify high-confidence TADs and TAD boundaries in each species. Overall, we find that only 25% of TADs are orthologous between these two species. The majority of TADs have been reorganized by a combination of chromosomal rearrangements and TAD boundary gain/loss. Changes to TAD structures are associated with differential gene expression near the sites of disruption but not across entire TADs. We also find evidence that subtypes of TADs differ with respect to their evolutionary dynamics: conserved Each model shows TAD contact domains (triangles) and boundaries (circles) in two hypothetical species. Model 1: chromosomal rearrangements occur at TAD boundaries, resulting in domains shuffling as intact units, represented by the purple and lilac TADs. Model 2: TAD boundaries remain conserved but TADs themselves are disrupted. In this example, an inversion whose breakpoints (jagged lines) occur within two separate TADs results in TAD reorganization, seen here as the two chimeric purple/lilac TADs. Model 3: both TADs and their boundaries evolve rapidly. The gain and loss of boundary elements (black star and 'X', respectively), leads to further TAD reorganization, beyond that observed in Model 2. https://doi.org/10.1371/journal.pgen.1009229.g001

PLOS GENETICS
3D genome evolution in the Drosophila melanogaster species group TADs are enriched for Polycomb-repressed and developmentally-regulated chromatin, while transcriptionally active TADs containing broadly-expressed genes evolve rapidly, possibly due to positive selection. On the other hand, TADs located in pericentric heterochromatin may be evolving neutrally. We propose that evolutionary divergence in 3D genome organization results from shuffling of conserved boundary elements across chromosomes along with the formation of lineage-specific boundaries. Both of these processes break old TADs and create new TAD architectures. Our results also support the existence of functionally distinct TAD subtypes: some TADs may be evolutionarily flexible and able to be reorganized without perturbing gene expression, whereas there may also be a distinct set of developmentally-regulated TADs that remain highly conserved due to their importance in restricting long-distance generegulatory element interactions.

D. triauraria genome assembly
A recently published genome assembly for D. triauraria was made using relatively low-coverage (�18.8x) nanopore sequencing data [34]. In order to create an improved assembly, we performed additional long-read nanopore sequencing of genomic DNA extracted from �30 adult females from D. triauraria strain 14028-0651.00 (National Drosophila Species Stock Center at Cornell). We used three r9.4 flow cells to generate a total of 633,844 reads (10,287 bp mean length) and 6.5 Gb of sequencing data. We combined our data with the previously published nanopore data from D. triauraria [34] for a final dataset of 1,043,600 reads (total size: 10.5 Gb, coverage: 52x). We basecalled the raw signal data with Albacore (version 2.3.4, available from Oxford Nanopore Technologies) and assembled the basecalled reads with Canu [35], which produced an assembly with contig N50 of 1.3 Mb (1098 total contigs, 269 Mb total size). We then polished the assembly by using Nanopolish [36] with raw nanopore signal data and Pilon [37] with Illumina data, which corrected a total of 1,185,510 assembly errors. Next, we used Purge Haplotigs [38] to identify allelic contigs, where highly heterozygous haplotypes were assembled as separate contigs rather than collapsed. After removing secondary haplotigs and bacterial contigs, our final contig assembly consisted of a total of 294 contigs which sum to �200 Mb and have an N50 of 1.7 Mb, which is 2.4X larger than the previously published study (N50 = 720 Kb) [34] (S1 Table).
We next performed Hi-C scaffolding using the polished nanopore contigs and the software packages Juicer [39] and 3D-DNA [40] (Fig 2A and 2B). The scaffolding process produced chromosome-length scaffolds, reflected by the dramatic increase in N50 from 1.7 Mb to 34.5 Mb. The gene content of chromosome arms is conserved across Drosophila, and all Drosophila species contain five chromosome arms, which are designated as Muller Elements A-F. In order to assign the D. triauraria scaffolds to Muller elements, we performed a translated BLAST search of our scaffolds using D. melanogaster peptides as queries and keeping only the best hit for each query sequence. We found that each scaffold was highly enriched for D. melanogaster peptides from a specific Muller element (S1 Fig) and we successfully identified scaffolds corresponding to Muller elements A-F for downstream analysis. In order to predict complete gene models for our assembly, we generated RNA-seq data from D. triauraria ovaries, testes, and embryos. We combined the RNA-seq data with ab initio gene predictions from Augustus [41] and SNAP [42] and homology-based predictions from D. melanogaster peptides using MAKER [43].
D. triauraria was previously sequenced unintentionally because it was mislabeled as D. kikkawai [34], which means that the published D. triauraria nanopore data are from an unknown strain. The strain we sequenced is from the same stock center, making it likely that the contaminant and our strain, are in fact the same strains. To test this, we aligned the Illumina data from Miller et. al. [34], in conjunction with our uninformative Hi-C reads, to our nanopore assembly and called SNPs using FreeBayes [44]. We compared the genotypes from the Miller et. al. [34] strain to our strain at �93.7 million sites and found that 93.5 million sites (�99.8%) were homozygous reference in both datasets, while at 215,000 sites, both strains had the same heterozygous genotype. For another 37,000 sites, one strain was identified as homozygous and the other heterozygous. The strains were in complete disagreement (i.e. they were homozygous for different alleles) at only 3 sites. From this analysis, we concluded that the D. triauraria strain mislabeled D. kikkawai was in fact the same strain we sequenced.
Several nested polymorphic inversions have been found to be segregating at high frequencies in D. triauraria. Mavragani et. al. [45] identified a set of nested inversions on two different chromosomes in D. triauraria based on inspection of polytene chromosomes but did not assign these chromosomes to Muller elements. We manually inspected the Hi-C contact map for evidence of polymorphic inversions and identified two inversions on Muller B, two inversions on Muller C, and three inversions on Muller D (S2 Fig).

Genome synteny
We next sought to identify synteny blocks and assess the degree of chromosomal rearrangements between D. melanogaster and D. triauraria. We created an orthology map between the genome assemblies for these two species using Mercator [46] and identified a total of 991 synteny blocks with average size of �117 Kb in D. melanogaster and �140 Kb in D. triauraria. The larger size of the synteny blocks in D. triauraria is consistent with the larger genome size for this species. We visualized synteny by using the promer tool from the MUMmer pipeline [47] to produce a dotplot (Fig 2C), which shows that there have been extensive chromosomal rearrangements since the divergence of these two species, with the majority of rearrangements occurring within Muller elements, as has been found for other Drosophila species [48].
Errors in the genome assembly would affect our synteny analysis because misassembled regions would be interpreted as chromosomal rearrangements. To assess our confidence that the D. triauraria genomic regions that overlap synteny breakpoints are assembled correctly, Contact maps show frequencies of pairwise 3D contacts, inferred from Hi-C data. Darker colors represent higher contact frequencies. Contact frequencies were visualized using Juicebox [39]. C) MUMmer dotplot depicting chromosomal rearrangements between D. melanogaster and D. triauraria. The promer utility from MUMmer [47] was used to visualize synteny between D. melanogaster and D. triauraria. Each dot corresponds to a one-to-one alignment between the two genomes. Red dots represent + /+ strand alignments and blue dots represent +/− strand alignments. we aligned the raw nanopore reads back to the assembly and identified reads that spanned synteny breaks by at least 1 kb on either side. If the synteny breaks are due to misassembly, there should be few, if any, raw sequencing reads that span these breaks. Instead, we find that each synteny breakpoint is spanned by an average of 23 nanopore reads, which is similar to the coverage of randomly selected genomic regions (shuffled breakpoint average: 22.95). Only 0.4% of breakpoints (4 out of 963) are spanned by less than 5 reads (S3 Fig).

TAD boundary and domain annotation
In order to determine how the large number of chromosomal rearrangements present between these two species has affected 3D genome organization, we identified TAD boundaries as well as complete contact domains (i.e. TADs) in both species. We used HiCExplorer [10], which was developed using Drosophila Hi-C data, to identify TAD boundaries at 5 kb resolution for both D. melanogaster and D. triauraria. The total number of Hi-C read pairs for each dataset are reported in S2 Table. HiCExplorer calculates the TAD separation score for each bin in the genome and identifies TAD boundaries as those bins whose score shows significantly larger contact insulation compared to neighboring bins. We used a bin size of 5 kb and found that the TAD separation scores were highly correlated between replicate datasets for each species We also found that the majority of predicted boundaries were identified in each replicate independently (74% [D. melanogaster], 70% [D. triauraria]). We refer to the boundaries that were identified in both replicates as high confidence boundaries, and those identified in only one of the two replicates as low confidence boundaries. In total, we identified 701 and 843 high confidence TAD boundaries for D. melanogaster and D. triauraria, respectively, and 249 and 355 low confidence boundaries (S3 Table).
HiCExplorer [10] links TAD inter-boundary regions together into contact domains. Similar to our approach with boundary elements, we identified contact domains that were found independently in both replicate datasets as high confidence domains and those found only in one replicate as low confidence domains. In total, we identified 552 and 639 high confidence TAD domains for D. melanogaster and D. triauraria, respectively, and 593 and 811 low confidence domains (S3 Table).

Boundary motif enrichment
In Drosophila, TAD boundaries are highly enriched for motifs recognized by the insulator proteins M1BP and BEAF-32 [10]. To validate boundary calls made by HiCExplorer [10], we used Homer [49] software to search the identified boundaries for enriched sequence motifs.

Domain and boundary conservation
We assessed the evolutionary conservation of TAD boundaries between D. melanogaster and D. triauraria by lifting over the high confidence D. melanogaster boundary coordinates to the D. triauraria genome coordinates. We created a whole-genome alignment between the two genome assemblies using Cactus [50] and performed the coordinate liftovers using the halLiftover [51] utility. We considered boundaries to be orthologous when high confidence boundary regions lifted-over from D. melanogaster to D. triauraria overlapped either a high or low confidence boundary that was independently identified in D. triauraria. Out of a total of 701 boundaries identified in D. melanogaster, 654 were successfully lifted-over to a corresponding region in D. triauraria. Of the lifted-over boundaries, 473 (�72%) are orthologous between the two species and 181 (�28%) are melanogaster-specific (Table 1). Our results suggest that the majority of TAD boundaries have been conserved since the divergence of these two species �15 Mya.
We next sought to determine whether TADs themselves showed a similar degree of conservation in spite of the large number of chromosomal rearrangements between these species. TADs would remain conserved if chromosomal rearrangements occur in such a way that TADs are shuffled as intact units (see Model 1, Fig 1). Alternatively, it is possible that the sequence motifs that specify boundaries remain conserved while chromosomal rearrangements shuffle these sequence elements in ways that lead to widespread TAD reorganization (see Model 2, Fig 1). Finally, both chromosomal rearrangements and boundary gain and loss could contribute to TAD evolution (see Model 3, Fig 1). To differentiate between these possibilities, we identified orthologous contact domains between these two species. Similar to our approach with boundaries, we considered contact domains to be orthologous when high confidence domain regions from D. melanogaster lifted-over as a continuous block (allowing for internal rearrangements) to D. triauraria and overlapped either a high or low confidence TAD domain that was independently identified in D. triauraria. We required that the domains were reciprocally overlapping by at least 90% of their lengths.
Out of a total of 552 domains identified in D. melanogaster, 544 were successfully liftedover to a corresponding region in D. triauraria. Of the lifted-over domains, we found that 134 (25%) are orthologous between the two species, whereas 410 (75%) of the D. melanogaster TADs do not show a one-to-one relationship with a D. triauraria TAD (Table 1). Of the nonorthologous TADs, 41% (224/544) ( Table 1), are due to cases where TADs have been split by chromosomal rearrangements (i.e. a contiguous D. melanogaster domain lifts over to multiple, discontiguous regions in D. triauraria) (Fig 3B). Of the orthologous domains, 84 (�63%) also shared orthologous boundary regions.
The Drosophila X chromosome has previously been shown to accumulate chromosomal rearrangements at a faster rate compared to the autosomes [33]. We found a similar pattern for D. melanogaster and D. triauraria, where the median size of synteny blocks is significantly lower for the X chromosome compared to the autosomes (S6 Fig, Wilcoxon test p = 1.35e-12). We also found that the proportion of orthologous TADs on the X chromosome is reduced relative to the autosomes (S6 Fig, Fisher's Exact Test p = 0.014), consistent with increased structural divergence of the X chromosome leading to increased TAD reorganization. Table 1. Summary of results from D. melanogaster to D. triauraria liftover analysis. Percentages of "Unique liftedover to D. triauraria" represent number out of total boundaries or domains. Percentages of orthologous, non-orthologous, and subcategories of boundaries and domains represent number out of "Unique lifted-over to D. triauraria". Truncated and expanded domains do not meet the 90% reciprocal overlap criterion due to large insertion/deletion mutations that have created asymmetry in TAD size between the two species. Missing boundaries and domains are those that could not be lifted-over to D. triauraria.

Category
Boundaries Domains The large disparity between the fraction of orthologous boundaries versus orthologous TADs could simply be due to the fact that TADs present a much larger mutational target and will therefore be more likely to contain a rearrangement breakpoint compared to the boundary regions. To determine whether this was the case, we classified TADs and boundaries based on whether they are located in genomic regions that are co-linear between the two species versus regions that have been interrupted by a chromosomal rearrangement. We found that 42% of TADs and 84% of TAD boundaries are co-linear, which is very similar to the numbers expected by chance (TADs: 230 versus 238 expected, Boundaries: 586 versus 587 expected). However, co-linear genomic regions will not contain orthologous TADs if there are speciesspecific TAD boundaries in these regions. Boundaries could be gained or lost in co-linear regions via short indel or point mutations that create or remove insulator binding sites while still maintaining homology between the two species. Indeed, we find that slightly more than half (58%) of TADs within co-linear genomic regions are orthologous (defined by reciprocal overlap of at least 90% of their length) while the co-linear TADs that are non-orthologous have either been altered by lineage-specific TAD boundaries or contain enough insertions/deletions that they do not meet the 90% reciprocal overlap criterion for orthology ( Table 1).
Given that only 25% of domains are orthologous between the two species, we conclude that both chromosomal rearrangements and boundary gain/loss have reorganized the majority of TADs present in each of these species since their common ancestor, making Model 3 (Fig 1)  the most likely scenario for TAD evolution. For consistency, we repeated these analyses by performing the liftover in the opposite direction, from D. triauraria to D. melanogaster, and obtained similar results (S4 Table).

Gene expression
We hypothesized that TADs rearranged in D. triauraria compared to D. melanogaster might reorganize enhancer-promoter contacts and result in altered gene expression profiles. We performed RNA-seq on replicate datasets for each species and used the DESeq R package [52] to identify differentially-expressed genes between the two species. A total of 964 differentiallyexpressed genes were identified (S7 Fig). We then compared the expression of genes within orthologous and non-orthologous TAD domains between the two species and found that, while nonconserved TADs show a slightly higher percentage of differentially-expressed genes (10.5% versus 9.1%), this difference is not significant (Fig 4, Fisher's Exact Test p = 0.151). These findings suggest that TAD reorganization in Drosophila does not result in widespread changes in gene expression. To determine if changes in TAD structures exert a more localized effect near the site of disruption, we examined differentially-expressed genes within 10 kb of rearrangement breakpoints and lineage-specific TAD boundaries. The fraction of differentially-expressed genes within 10 kb of lineage-specific boundaries is significantly larger than expected by chance (observed: 24 There is evidence that the act of transcription itself plays a role in TAD boundary formation [13] which raises the possibility that the association between non-orthologous boundaries and differentially-expressed genes is due to changes in gene expression in cis, rather than changes in TAD organization. For example, mutations in promoters and/or transcription factor binding sites could cause downregulation of genes near a TAD boundary, which could then cause the boundary itself to weaken or disappear. Under this scenario, differentially-expressed genes near lineage-specific TAD boundaries should show increased expression in the species where the boundary is present and reduced expression in the species where the boundary is absent. We examined D. melanogaster lineage-specific TAD boundaries to determine whether there is A) Genes within orthologous TADs are expressed at significantly reduced levels compared to non-orthologous TADs, consistent with Polycomb-repression (Wilcoxon test p = 6.7e-05) B) Orthologous TADs contain slightly fewer differentially-expressed (DE) genes compared to non-orthologous TADs (9.1% versus 10.5%), however this difference is not statistically significant (Fisher's Exact Test p = 0.151). Differentially-expressed genes were identified using the DESeq2 R software package [52].
https://doi.org/10.1371/journal.pgen.1009229.g004 support for this prediction. We find that there is no difference between the percent of upregulated genes within 10 kb of lineage-specific versus orthologous TAD boundaries (54.5% versus 53.7%, Fisher's Exact Test p = 0.49). These results suggest that differences in transcription are not driving differences in boundary presence/absence.
Even near the endpoints of disrupted TADs, the vast majority of genes (�87%) are expressed at similar levels between the two species, which suggests that the effects of TAD reorganization on gene expression is relatively subtle. To investigate why this would be the case, we compared insulation scores for rearrangement breakpoints located within 5 kb of a TAD boundary to intra-TAD breakpoints located more than 5kb from TAD boundaries as well as the insulation scores of all intra-TAD 5 kb bins. We found that intra-TAD breakpoints tend to occur at regions with increased insulation, compared to all intra-TAD bins (Wilcoxon test p <2.2e-16), but significantly less insulation compared to TAD boundaries (Fig 6A, Wilcoxon  test p = 1.2e-12). We additionally examined lineage-specific boundaries to determine whether novel boundaries tend to evolve at genomic regions with pre-existing insulation activity. We found that the orthologs of lineage-specific boundaries show increased insulation relative to all intra-TAD 5 kb bins (Fig 6B, Wilcoxon test p < 2.2e-16), consistent with a tendency for boundaries to emerge from genomic regions with pre-existing insulating properties. Additionally, we found that compared to the orthologous region in the other species, lineage-specific boundaries have significantly increased insulation (Wilcoxon test p = 4e-8) and significantly  Table, paired Wilcoxon test p = 0.0059), supporting their classification as lineage-specific, and implying that lineage-specific boundaries evolve via the accumulation of insulator protein binding motifs. From these analyses, we conclude that TAD reorganization is associated with changes in gene expression for a subset of genes located near the site of disruption. The subtle effect of TAD reorganization on gene expression may be due, at least in part, to the fact that both rearrangements and lineage-specific boundaries tend to occur at locations that had insulating properties in the common ancestor of the two species.

Chromatin state
Given that TAD locations are correlated with the epigenetic state of chromatin, we next sought to determine whether the properties of TADs differ depending on their chromatin state. We first compared chromatin states between genes in orthologous and non-orthologous TADs. We quantified the number of genes in each of the five chromatin states described by Filion et al. [53] within orthologous and non-orthologous TAD regions (S6 Table). Orthologous TADs show significant enrichment of the BLACK (transcriptionally silent) and BLUE (Polycomb-repressed) chromatin states and significant depletion of the GREEN (constitutive heterochromatin) and YELLOW (constitutively active) chromatin states, compared to nonorthologous TADs (Fig 7, Fisher' Polycomb-repressed chromatin is bound by Polycomb-group (PcG) proteins which regulate the epigenetic silencing of developmental genes. The BLACK chromatin state also contains developmentally-regulated genes [53]. It is associated with genes showing high tissue-specificity and contains a high density of conserved non-coding elements [53]. Consistent with We analyzed HiCExplorer insulation scores calculated for 5 kb bins across the D. melanogaster and D. triauraria genomes. A) We found that intra-TAD breakpoints tend to occur at genomic regions that have increased insulation relative to other intra-TAD bins: Wilcoxon test p < 2.2e-16, but significantly less insulation compared to TAD boundaries: Wilcoxon test p = 1.2e-12. B) We also examined lineage-specific boundaries and found that the genomic regions orthologous to lineage-specific boundaries show increased insulation relative to all intra-TAD 5 kb bins: Wilcoxon test p < 2.2e-16. However, lineage-specific boundaries have significantly increased insulation compared to the orthologous region in the other species, supporting their classification as lineage-specific: Wilcoxon test p = 4e-8. Note that more negative scores indicate more insulation. Full plot for B in S9 orthologous TADs being enriched for epigenetically silenced developmental genes, we found that genes in orthologous TADs are expressed at significantly lower levels than those in nonorthologous TADs (Fig 4A, Wilcoxon test p = 6.7e-05). We also found that orthologous TADs are enriched for homeobox domain-containing genes (FlyMine protein domain enrichment test, Benjamini-Hochberg corrected p = 0.02) [54] and, in comparison to the genes within non-orthologous TADs, are also highly enriched for genes predicted to be regulated by Polycomb-group proteins (Fisher's Exact Test p = 2.6e-5) [55]. The chromatin state tracks in Fig  3A and 3B also support our findings. The majority of genes in the conserved TADs in Fig 3A are BLACK and BLUE, while the genes within the split TAD in Fig 3B are predominantly YEL-LOW. These results largely mirror the chromatin states of the ancient and highly-conserved contact domains identified by Harmston et. al. [31], which contain clusters of conserved noncoding elements and developmental genes.
We next examined the chromatin states of chromosomal rearrangement breakpoints that disrupt TADs. If chromosomal rearrangements evolve neutrally, the chromatin states of polymorphic rearrangement breakpoints should show the same relative abundance as the chromatin states of breakpoints that differ between species, since both will be determined primarily by mutation rate [56]. Divergence between the chromatin state locations of polymorphic versus fixed breakpoints is likely to be due to the effects of natural selection. For example, chromatin states where gene order is under strong purifying selection should show a paucity of interspecies breakpoints whereas interspecies breakpoints should be elevated in chromatin states that tend to contain beneficial rearrangements.
We compared the chromatin states of polymorphic rearrangement breakpoints identified from long-read sequencing of 14 D. melanogaster strains [57] to the chromatin states of intra-TAD rearrangement breakpoints between D. melanogaster and D. triauraria. The interspecies rearrangement breakpoints are significantly depleted from the BLACK, BLUE, and RED chromatin states, relative to polymorphic breakpoints, whereas there is a large excess of interspecies breakpoints in the YELLOW chromatin state (active euchromatin) (Fig 8A). We also found  we compared the frequency of polymorphic versus fixed intra-TAD rearrangement breakpoints across chromatin states. We found that interspecies rearrangement breakpoints that disrupt TADs are significantly depleted from the BLACK, BLUE, and RED chromatin states, and significantly enriched in the YELLOW chromatin state. These results suggest that rearrangement breakpoints from BLACK, BLUE, and RED states are under purifying selection, while some of the rearrangements in YELLOW states may have been fixed due to positive selection. Asterices � indicate significant differences as calculated by Fisher's Exact Test (p-values: BLACK = 5.118e-17; BLUE = 9.714e-5; GREEN = 0.136; YELLOW = 2.352e-29; RED = 0.005). B) Y-axis represents fraction of differentially-expressed (DE) genes assigned to each chromatin state. DE genes from the YELLOW chromatin state are more likely to be located in disrupted (i.e. nonorthologous) TADs compared to conserved (i.e. orthologous) TADs, whereas the opposite is true for genes from the BLUE chromatin state. Asterices � indicate significant differences as calculated by Fisher's Exact Test that, specifically for the YELLOW chromatin state, a higher fraction of genes in non-orthologous TADs are differentially expressed, compared to orthologous TADs (Fig 8B, Fisher's Exact Test p = 0.0059). These results show that interspecies rearrangement breakpoints that disrupt TADs are highly enriched in the YELLOW chromatin state and these disrupted TADs are associated with increased divergence in gene expression.
Genes in the YELLOW chromatin state tend to be broadly expressed across many tissues. To determine whether it is reasonable that this category of genes would be associated with adaptive changes in gene expression, we examined the chromatin states of genes whose divergence in expression level across seven species of Drosophila was previously identified as being due to positive selection [58]. We found that genes from the YELLOW chromatin state are highly enriched in this gene set (observed: 57.5%, expected: 40.1%, Fisher's Exact Test p < 2.2e-16).
An inconsistency in our findings is that intra-TAD rearrangement breakpoints are depleted from the RED chromatin state, similar to the BLUE and BLACK states, but orthologous TADs are not enriched for RED genes (Fig 7), as they are for BLUE and BLACK genes. One explanation for this pattern is that TADs containing RED genes are less likely to be disrupted by chromosomal rearrangements but more likely to be split by lineage-specific boundaries. Consistent with this pattern, we find that lineage-specific boundaries are significantly closer to RED genes than expected by chance (S8 Fig, permutation p < 0.001).
These findings are consistent with a subset of interspecies rearrangements resulting in adaptive changes in gene expression associated with TAD reorganization. On the other hand, the depletion of breakpoints in the BLACK and BLUE chromatin states suggests that such rearrangements may be under purifying selection. The GREEN chromatin state, which is associated with constitutive heterochromatin, shows no difference in frequency between polymorphic and interspecies breakpoints, consistent with neutral evolution.

Discussion
In this study, we sought to examine the evolutionary conservation of 3D genome organization in Drosophila. We selected D. melanogaster and D. triauraria for this comparison because they are separated by �15 million years of evolution [32]. We predicted that this level of divergence would be long enough that large-scale chromosomal rearrangements would have occurred between the two species but short enough that conservation at the nucleotide level would allow for an accurate whole-genome alignment. We used a combination of nanopore and Illumina Hi-C sequencing data to improve a recently published D. triauraria genome assembly produced from relatively low-coverage (depth 18.8x) nanopore sequencing data [34]. We have previously shown that Hi-C data can be used to scaffold Drosophila nanopore contigs with high accuracy, and even correct contig misassemblies [59]. We used our Hi-C data to scaffold the D. triauraria nanopore contigs and our improved D. triauraria assembly resulted in chromosome-length scaffolds highly enriched for genes corresponding to a single Muller element (S1 Fig), further supporting the efficacy of this approach. We were able to align �87% of our D. triauraria assembly to the D. melanogaster reference assembly and we found extensive chromosomal rearrangements (Fig 2C), consistent with our initial prediction that D. triauraria and D. melanogaster represent an ideal species pair for use in a comparative study of 3D genome organization.
Previous research has yielded conflicting results regarding the evolutionary conservation of TAD domains. In theory, TADs should be under strong purifying selection due to their role in preventing aberrant gene-enhancer interactions. Therefore, we expected that entire TAD contact domains, including boundary regions, would be conserved (i.e. Model 1, Fig 1). However, we found that the majority of TADs between D. triauraria and D. melanogaster are non-orthologous due to a combination of boundary elements being shuffled by chromosomal rearrangements and gain/loss of lineage-specific boundaries resulting in reorganization of TAD architecture (see Model 3, Fig 1). Our approach is conservative and likely underestimates the true extent of TAD divergence. Previous studies have identified inconsistencies in TAD-calling software packages [60] and have raised the possibility that TAD conservation results may depend on the direction of the liftover comparison [26]. For example, some studies report conservation estimates by first calling TADs in the species for which they have more data and then identifying the orthologous domains in the species for which they have less data [2,6,25,26]. When reversing the analysis the conservation rate can be reduced by up to 25% [26]. However, in this study, we used biological replicates to demonstrate that the identification of TAD boundaries and TAD units is highly reproducible. We also performed our analysis of TAD conservation in both directions (i.e. from D. melanogaster to D. triauraria and vice versa) and obtained similar results regardless of the direction of comparison. Furthermore, our estimates of conservation, if biased at all, should be biased towards inferring higher levels of conservation. We only considered TADs for our liftover step if they were independently identified in both biological replicates, which should enrich for stronger TADs. After liftover, we considered the TAD to be orthologous if it overlapped either a strong (i.e. high-confidence) TAD or a weak TAD (i.e. low-confidence TAD identified in only a single replicate). We also did not require orthologous TADs to have orthologous boundaries. Instead, they were only required to have a reciprocal overlap of at least 90% of their lengths. We would expect these relatively low-stringency criteria to potentially result in an over-estimate of TAD conservation, yet we still only find �25% of TADs to be orthologous between species.
Broadly, our results are similar to several recent studies suggesting that TADs may actually diverge relatively rapidly and that TAD reorganization is not necessarily associated with widespread divergence in gene expression [26][27][28][29][30]. However, although non-orthologous TADs are not enriched for differentially-expressed genes (Fig 4B), we do find evidence for a localized effect of TAD reorganization on the expression of genes near the point of disruption, for both chromosomal rearrangements and lineage-specific TAD boundaries (Fig 5). This is similar to the effect reported by Ghavi-Helm et. al. [27] where disrupted TADs in the highly-rearranged balancer chromosomes of D. melanogaster showed more differences in gene expression near rearrangement breakpoints, rather than widespread across the entire TAD. Even near the sites of TAD disruption, the vast majority of genes are expressed similarly, both in the balancer chromosome study and in our comparison between D. triauraria and D. melanogaster. Interestingly, we find that both intra-TAD rearrangements and lineage-specific boundaries tend to occur at genomic regions that had insulator-like properties in the ancestor of these two species (Fig 6). These results raise the possibility that physical contacts spanning such regions were limited in the ancestral TAD configuration, which would explain the relatively subtle effect on gene expression that accompanied reorganization of these TADs.
Our results also provide insight into the evolution of TAD boundaries. They suggest that the formation of novel boundaries may involve the accumulation of insulator binding motifs and that lineage-specific boundaries play an important role in TAD reorganization and gene expression, even in the absence of chromosomal rearrangements. The boundaries that we identify as lineage-specific have stronger insulating properties and tend to have more insulator protein binding motifs, compared to their orthologous sequence in the other species (Fig 6B,  S5 Table). In addition, the non-boundary orthologs of lineage-specific boundaries show significantly less insulation than the actual boundaries identified in the same species (Fig 6A). Furthermore, our lineage-specific boundaries are reproducible: in order to be lineage-specific, we required boundaries to be independently identified in each replicate for the species of interest and absent from the orthologous sequence in both replicates of the other species. Nevertheless, it remains possible that a portion of the boundaries we identify as lineage-specific are due to inconsistencies in identification between the two species and are actually cases where there is a difference in boundary strength rather than boundary presence/absence. At the very least, we can conservatively conclude that differences in boundary strength (and potentially presence/ absence) are associated with differences in the abundance of BEAF-32 and M1BP motifs as well as local changes in gene expression.
Several previous studies have concluded that 3D genome organization is conserved and therefore evolving under purifying selection [2,7,23,25]. These studies have based their conclusion of evolutionary constraint on a statistical correlation of contact frequencies between species, without identifying TADs [7] or on a statistical association between synteny breakpoints and TAD boundaries [2,[23][24][25]. Renschler et. al. [25] report that 3D genome architecture is conserved across the Drosophila species D. melanogaster, D. virilis, and D. busckii based on a significant association between synteny breakpoints and TAD boundaries, yet find that only 10% of identified TADs were conserved across all three species.
By separating TADs based on chromatin state, we have gained additional insight into their evolution. We find that a subset of TADs are likely evolving under purifying selection, specifically those enriched for developmentally-regulated genes in the BLUE and BLACK chromatin states defined by [53]. Such genes are known to be involved in long-distance physical interactions with enhancers: 4C-seq experiments in Drosophila have shown that developmental enhancers have a high degree of 3D connectivity and form chromatin loops on the order of tens to hundreds of kilobases in size [22]. Reorganization of developmental TAD subtypes should perturb these long-distance contacts and potentially result in the misregulation of developmental genes, which is likely to be deleterious. Accordingly, we find that conserved TADs are enriched for both BLUE and BLACK chromatin states (Fig 7), which are known to mark developmentally-regulated genes, and synteny breakpoints are depleted from these same states (Fig 8A), consistent with selection acting to preserve the organization of these TADs.
On the other hand, we find that TADs enriched for genes with broad expression patterns (i.e. YELLOW chromatin state) are evolving rapidly. Compared to genes in the BLUE and BLACK chromatin states, these genes have a relatively simple regulatory architecture: they are found in genomic regions that are devoid of conserved non-coding elements, and tend to be highly expressed across a variety of tissues [53]. In contrast to the BLUE and BLACK chromatin states, we find that TADs that have been reorganized since the common ancestor of D. triauraria and D. melanogaster are enriched for genes in the YELLOW chromatin state (Fig 7). We also observe an excess of rearrangement breakpoints in the YELLOW chromatin state for our interspecies comparison versus polymorphic breakpoints from D. melanogaster (Fig 8A). These results suggest that natural selection may actually be favoring reorganization of this subtype of TAD. Unlike the BLACK and BLUE chromatin states, we find that reorganized TADs enriched for YELLOW genes contain a significant excess of differentially-expressed genes, raising the possibility that reorganization of these TADs is associated with adaptive changes in gene expression (Fig 8B). Genes previously identified as experiencing adaptive changes in gene expression in Drosophila [58] are highly enriched for the YELLOW chromatin state, suggesting that adaptive evolution of this gene category may be relatively common.
Finally, we also find evidence for a TAD subtype that evolves neutrally. Conserved TADs show a significant depletion of genes from the GREEN chromatin state (Fig 7), which is associated with constitutive heterochromatin, suggesting a lack of evolutionary constraint. However, unlike the YELLOW chromatin state, we find no difference in the abundance of polymorphic versus interspecies rearrangement breakpoints in the GREEN state (Fig 8A), suggesting that rearrangements that disrupt these TADs are accumulating neutrally. Furthermore, reorganization of these TADs is not associated with divergence in gene expression (Fig 8B). Although there are transcribed genes in the GREEN chromatin state, in general it is gene-poor and repeat-rich [53]. The apparently neutral reorganization of these TADs suggests their function could be related more to the packaging of heterochromatin rather than regulation of gene expression.
The evolutionary dynamics of TADs enriched for genes from the RED chromatin state are less clear. This chromatin state is characterized by active euchromatin, but unlike the YEL-LOW state, genes in the RED state are more likely to have tissue-specific patterns of expression and have a higher density of conserved-noncoding elements [53]. There is a significant depletion of interspecies rearrangements in this chromatin state (Fig 8A), consistent with purifying selection, however, conserved TADs are not enriched for genes from the RED state (Fig 7), possibly because lineage-specific boundaries (which also disrupt TADs) are more likely to form near RED genes. Additionally, in terms of differentially-expressed genes, the RED state is more similar to the YELLOW state, where nonconserved TADs contain a higher fraction of differentially-expressed genes, although the difference is not statistically significant in this case (Fig 8B). It is possible that some TADs containing RED genes are evolving under purifying selection while others are evolving under positive selection.
These results help to reconcile previously contradictory findings regarding TAD evolution. For example, previous comparative studies of TAD organization have either concluded that TADs are highly conserved [2,7,[23][24][25] or rapidly evolving (e.g. [26]). Our results show that both conclusions can be true: certain TAD subtypes, such as those containing developmental genes, are likely to be evolving under purifying selection, whereas other TADs, such as those enriched for the YELLOW chromatin state, are evolving rapidly. Additionally, some studies have found that TAD disruption results in aberrant enhancer-promoter contacts and gene misregulation [16,61] while others have found little to no association between TAD reorganization and differential gene expression [27,30]. Our results suggest that there are different subtypes of TADs with variable tolerances for disruption. Disruption of some types of TADs, such as those containing developmental genes surrounded by clusters of conserved non-coding regulatory elements (CNEs), may cause widespread alterations of gene expression profiles and these TADs are therefore highly conserved. Other types of TADs, such as those containing fewer CNEs and long-distance enhancer-promoter contacts, could potentially be altered without widespread effects on gene expression. These TADs may be more likely to diverge quickly between species. If this is true, previous studies reporting contradictory effects of TAD rearrangement on gene expression may simply be due to the differences in the subtypes of TAD being tested. Future work involving experimental disruption of an unbiased sample of TADs would allow for testing of this prediction.

D. triauraria genome sequencing
Using the Qiagen DNAeasy Blood and Tissue Kit, we extracted DNA from �30 D. triauraria mated adult females strain 14028-0651.00 (National Drosophila Species Stock Center at Cornell). We used the Oxford Nanopore Technologies (ONT) SQK-LSK 108 library preparation kit to construct three PCR-free libraries according to the ONT 1D Genomic DNA by Ligation protocol. Each library was sequenced on a MinION r9.4 flow cell. Raw signal data were basecalled using the ONT Albacore v2.3.4 software package with default parameters.
Hi-C chromosome conformation capture D. triauraria and D. melanogaster strains were maintained in population cages on molasses agar with yeast paste. Embryos (8-16 h) for each species were collected and dechorionated in 50% commercial bleach for 2.5 min. Nuclei were isolated from �1 g of embryos and fixed in 1.8% formaldehyde for 15 minutes according to the protocol in Sandmann et. al. [62]. Two replicate Hi-C libraries were constructed for each species using the in situ DNase Hi-C protocol described by Ramani et. al. [63]. Libraries were sequenced on an Illumina HiSeq 2500 machine.

RNA-seq
Approximately 0.02 g fresh embryos were collected using the same approach as for Hi-C libraries and homogenized in 300 μL 1x DNA/RNA Shield. Fifty pairs of testes were dissected from 3-5-day old mated males and 10 pairs of ovaries from 3-5-day old mated females. Dissections were performed in 1X PBS and then immediately transferred into 200 μL RNAlater solution. Two hundred μL of ice-cold 1X PBS was added to each sample and they were centrifuged at 5000g for 1 min at 4˚C. After removing the supernatant, 300 μL 1x DNA/RNA shield was added and samples were homogenized immediately using an electric pestle.
RNA was extracted using the Quick-RNA Plus Kit (R1057) from Zymo Research. Samples were incubated at 55˚C with 30 μL PK Digestion Buffer and 15 μL Proteinase K for at least 30 minutes. Column-based size selection was used to obtain >200 nt purified total RNA. MRNAseq libraries were constructed from the total RNA using the NEBNext Poly(A) mRNA Magnetic Isolation Module (E7049) and the NEBNext Ultra II Directional RNA Library Prep Kit for Illumina (E7760) using 1 μg total RNA with fragmentation for 7 min at 94˚C and first strand cDNA synthesis via incubation for 45 minutes at 42˚C. Library quality was assessed on a Bioanalyzer. For the embryo samples, two biological replicate libraries were prepared for D. triauraria strain 14028-0651.00 (National Drosophila Species Stock Center at Cornell) while D. melanogaster strains RAL-379 and RAL-732 from the Drosophila Genetic Reference Panel (DGRP) [64] were used as biological replicates for D. melanogaster.

D. triauraria genome assembly and annotation
The basecalled D. triauraria nanopore reads were combined with the nanopore reads from Miller et. al. [34] and assembled using Canu [35]. Purge Haplotigs [38] was used to identify and collapse heterozygous contigs where each haplotype was assembled separately. The assembly was then polished using the raw nanopore signal data with Nanopolish [36]. Finally, uninformative Hi-C Illumina sequences (i.e. those that do not contain a ligation junction) were used as single-end reads to further polish the assembly with Pilon [37]. To confirm that our data were from the same D. triauraria strain as the data generated by Miller et. al. [34], we aligned our uninformative D. triauraria Hi-C reads and the D. triauraria Illumina data from Miller et. al. (SRA project PRJNA473618) [34] as single-end reads using bowtie2 (version 2.2.9) [65] with default parameters. We then called SNPs using Freebayes (version 1.2.0) [44] with the parameters -no-indels -no-mnps -no-complex -0 -report-monomorphic -use-best-nalleles 4.
The Juicer [39] and 3D-DNA [40] pipelines were used to scaffold the D. triauraria nanopore reads with Hi-C sequencing data. The Juicebox software package [66] was used to visualize contact matrices, assign chromosome boundaries, and export a finalized reference sequence for downstream analysis.
To assign the chromosome-length D. triauraria scaffolds to their corresponding Muller element (i.e. Muller A-F), we performed a translated BLAST search of our scaffolds using FlyBase r6.21 [67] D. melanogaster peptides as queries (Fig 2A and 2B).
To pre-process the RNA-seq data for MAKER [43], we first aligned the D. triauraria RNAseq Illumina reads to the newly assembled D. triauraria reference genome using HISAT2 [68].
Second, the HISAT2 alignments were used to assemble mRNA transcripts with stringtie [69]. The stringtie transcripts were provided to MAKER along with D. melanogaster r6.26 peptides from FlyBase [67]. The MAKER control file is available via GitHub.

Genome synteny
We identified synteny blocks between D. melanogaster and D. triauraria using Mercator [46] software. We visualized synteny using the promer tools from the MUMmer [47] pipeline to produce a dotplot comparison of the D. melanogaster and D. triauraria genomes.

Identifying TAD boundaries and domains
We removed adapter sequences from Hi-C reads for each species using Trimmomatic [70] and used a custom perl script to split reads that contain a ligation junction. We used BWA software [71] to align the split forward and reverse Hi-C reads to each species' reference assembly (the D.triauraria assembly generated in this study and the D. melanogaster release 6 assembly from Flybase [67]). We used HiCExplorer version 2.2 [10] to create a normalized contact frequency matrix. To find TAD boundaries and domains for each species we ran the hicFindTads utility separately for each biological replicate. We used Bedtools [72] to identify overlapping boundaries and domains between replicates. Boundaries were required to overlap by at least one base pair in both replicates. Domains were required to have start and end coordinates within 5000 bp in both replicates. Boundaries and domains identified in both replicates were considered high confidence while those identified in one replicate are low confidence. We used a custom python script to calculate correlation coefficients between replicates for the TAD separation scores. Boundary and intra-TAD insulation scores were calculated for 5 kb bins using HiCExplorer.

Defining and identifying orthologous TAD boundaries between D. melanogaster and D. triauraria
We softmasked the D. melanogaster and D. triauraria genomes using Repeatmasker [73] and aligned them using Cactus [50] to generate a hal file. We input the high confidence boundaries for D. melanogaster to halLiftover [51] to identify the corresponding genomic coordinates in D. triauraria. HalLiftover reports contiguous liftover coordinates as separate features if they include short indels. We therefore merged 'lifted-over' boundary locations that were within 5000 bp of each other into a single feature. Lifted-over boundaries less than 500 bp in size were excluded from further analysis. We considered lifted-over boundaries from D. triauraria that were located less than 5 kb from either a high or low confidence boundary in D. melanogaster to be orthologous boundaries. Lifted-over boundaries from D. triauraria that were not identified as boundaries in D. melanogaster were considered non-orthologous. We implemented the same pipeline for the reverse comparison, from D. triauraria to D. melanogaster. We defined a lineage-specific boundary as a high-confidence boundary present in one species whose orthologous sequence in the other species did not overlap either a high-or low-confidence boundary.

Boundary motif enrichment
We used Homer [49] software to search for enriched motifs with the high-confidence boundary sequences for both D. triauraria and D. melanogaster. We split each genome assembly into 5 kb sequences for use as the background dataset. To compare motif occurrences between lineage-specific boundaries and their orthologs, we downloaded motif matrix profiles from JASPAR [74] for BEAF-32 (accession MA0529.2) and M1BP (accession MA1459.1) and used FIMO [75] to identify occurrences of each motif within lineage-specific boundaries as well as their orthologs. We summed the number of nonoverlapping motif occurrences within each boundary and compared the number of motifs in each lineage-specific boundary to the number of occurrences of the motif in the orthologous non-boundary sequence.

Defining and identifying orthologous domains between D. melanogaster and D. triauraria
To assess domain conservation between D. melanogaster and D. triauraria we used halLiftover [51]. HalLiftover will report lifted-over coordinates as separate features if there are species-specific indels, transposon insertions, or chromosomal rearrangements. In order to combine contiguously lifted-over segments that were separated by species-specific indels, TE insertions, or intra-TAD rearrangements, we merged lifted-over features separated by less than 20 kb. Lifted-over features less than 5000 bp were excluded from further analysis. After merging, the lifted-over domains in D. triauraria that reciprocally overlapped a D. melanogaster high or low confidence domain (>90%) were considered orthologous domains.
Lifted-over domains in D. triauraria that did not meet the reciprocal overlap criteria with a D. melanogaster domain were considered non-orthologous. To identify orthologous domains between the two species that also share boundaries, we required the D. triauraria lifted-over endpoints to lie within 5 kb of the D. melanogaster orthologous TAD boundaries. Non-orthologous domains were categorized into truncated/expanded, or split domains. Truncated and expanded domains are not split by chromosomal rearrangements or lineage-specific boundaries, rather, they do not meet the 90% reciprocal overlap criterion due to large insertion/ deletion mutations that have created asymmetry in TAD size between the two species. Split domains included those split by rearrangement and lineage-specific boundaries. We implemented the same pipeline for the reverse comparison, from D. triauraria to D. melanogaster.
To determine the number of co-linear TADs and boundaries expected by chance, we used Bedtools to shuffle their locations and determine how many of the shuffled features were located entirely within a synteny block. We then calculated the average number of co-linear TADs/ boundaries across 100 shuffles.

Gene expression
Stranded embryo RNA-seq data were aligned to their respective genomes using HISAT2 (version 2.1.0) [68] with parameters -dta -max-intronlen 50000 -rna-strandness RF. Per-gene raw read counts were generated using htseq-count (version 0.11.2) [76] with parameters -i Parent -f bam -r pos -s reverse -a 20 -nonunique none. Our MAKER [43] gene models were used for D. triauraria and the FlyBase r6.21 [67] gene models were used for D. melanogaster. One-to-one gene orthologs were identified using our Mercator [46] orthology map and differentiallyexpressed genes were identified using the DESeq2 R software package [52].

Chromatin state
We used the chromatin state annotations from Filion et. al. [53] to assign each D. melanogaster gene to one of five chromatin states (BLACK, BLUE, GREEN, RED, YELLOW). Genes were assigned to chromatin states based on the state that covered the largest proportion of the gene (including introns) and we counted the number of genes from orthologous versus non-orthologous TADs for each of the five chromatin states. To determine whether genes from the RED chromatin state tend to be near lineage-specific boundaries, we used Bedtools to determine the number of genes from the RED chromatin state that overlapped a lineage-specific boundary.
We then used Bedtools to shuffle the lineage-specific boundary locations and counted the number of genes from the RED chromatin state that overlapped a shuffled boundary. We performed 1000 shuffles in total.

Rearrangement breakpoints
We used Mercator to identify synteny breakpoints between D. melanogaster and D. triauraria. To compare interspecies breakpoint locations to intraspecies rearrangements, we used the locations of polymorphic D. melanogaster inversions that were 10 kb or larger, identified from long-read sequencing of 14 D. melanogaster strains [57].

D. triauraria polymorphic inversion breakpoints
We visually identified polymorphic inversion breakpoints using the Juicebox software package and estimated coordinate. Polymorphic inversions are evident through "bow-tie" like contact points accompanied by high contact frequencies along the diagonal at the breakpoint. To confirm that polymorphic inversion breakpoints did not disrupt TAD boundaries we intersected the coordinates with high confidence D. triauraria boundaries and calculated the number of breakpoints that intersected boundaries and the median distance of breakpoints from boundaries. Additionally, we ran 1000 permutations of shuffled inversion breakpoints to compare expected number of breakpoint/boundary intersections and median distance from TAD boundary to the observed values. The size of D. melanogaster synteny blocks is significantly reduced on the X chromosome (Muller A) relative to the autosomes, indicating that the X has accumulated more chromosomal rearrangements (Wilcoxon test p = 6.7e-05). (B) The proportion of orthologous lifted-over TAD domains is also significantly reduced on the X chromosome (Fisher's Exact Test p = 0.0135). (TIF) S7 Fig. MA plot highlighting differentially-expressed genes between D. triauraria and D. melanogaster. Differentially-expressed genes (adjusted p-value <= 0.05) between D. melanogaster and D. triauraria identified by DEseq2 [52] are shown in red. Each point represents an orthologous gene pair between the two species. The plot was created using the DEseq2 shrunken log2 fold changes which removes noise from low count genes.  Table. Combined D. triauraria sequencing and assembly data. This study and Miller et al. [34]. The decrease in assembly size after scaffolding is mainly due to the removal of contigs by Purge haplotigs [38].