A shared ancient enhancer element differentially regulates the bric-a-brac tandem gene duplicates in the developing Drosophila leg

Gene duplications and transcriptional enhancer emergence/modifications are thought having greatly contributed to phenotypic innovations during animal evolution. Nevertheless, little is known about how enhancers evolve after gene duplication and how regulatory information is rewired between duplicated genes. The Drosophila melanogaster bric-a-brac (bab) complex, comprising the tandem paralogous genes bab1 and bab2, provides a paradigm to address these issues. We previously characterized an intergenic enhancer (named LAE) regulating bab2 expression in the developing legs. We show here that bab2 regulators binding directly the LAE also govern bab1 expression in tarsal cells. LAE excision by CRISPR/Cas9-mediated genome editing reveals that this enhancer appears involved but not strictly required for bab1 and bab2 co-expression in leg tissues. Instead, the LAE enhancer is critical for paralog-specific bab2 expression along the proximo-distal leg axis. Chromatin features and phenotypic rescue experiments indicate that LAE functions partly redundantly with leg-specific regulatory information overlapping the bab1 transcription unit. Phylogenomics analyses indicate that (i) the bab complex originates from duplication of an ancestral singleton gene early on within the Cyclorrhapha dipteran sublineage, and (ii) LAE sequences have been evolutionarily-fixed early on within the Brachycera suborder thus predating the gene duplication event. This work provides new insights on enhancers, particularly about their emergence, maintenance and functional diversification during evolution.


Introduction
Gene duplications have largely contributed to create genetic novelties during evolution [1,2]. Intra-species gene duplicates are referred to as "paralogs", which eventually diverged functionally during evolution in a phylogenetic manner. Gene family expansion has facilitated phenotypic innovation through (i) acquisition of new molecular functions or (ii) the subdivision of the parental gene function between the duplicate copies [3][4][5]. Phenotypic novelties are thought having originated from both modifications of protein sequences and evolutionary emergence or modifications of genomic Cis-Regulatory Elements (CREs) or modules, most often dubbed as "enhancer" regions, which regulate gene transcription in a stage-, tissue-and/ or cell-type-specific manner [6][7][8][9][10]. While many shared CRE/enhancers have been described in Drosophila for several gene complexes [11][12][13][14], how they emerge and are differentially evolving remain largely elusive.
The~150-kilobase (kb) long Drosophila melanogaster bric-a-brac (bab) locus, located on the third chromosome (3L arm), comprises two tandemly-duplicated genes (Fig 1A), bab1 and bab2, which encode paralogous transcription factors sharing two conserved domains: (i) a Bric-a-brac/Tramtrack/Broad-complex (BTB) domain involved in protein-protein interactions, and (ii) a specific DNA-binding domain (referred to as BabCD, for Bab Conserved Domain), in their amino(N)-and carboxyl(C)-terminal moieties, respectively [15]. Bab1-2 proteins are co-expressed in many tissues [15,16]. In the larval epidermis, they co-regulate directly yellow expression in a sexually-dimorphic manner, thus controlling adult male versus female body pigmentation traits [17][18][19][20]. bab1-2 co-expression in the developing epidermis is partially governed by two CREs which drive reporter gene expression (i) in a monomorphic pattern in the abdominal segments A2-A5 of both sexes (termed AE, for "Anterior Element"), and (ii) in a female-specific pattern in the A5-A7 segments (DE, for "Dimorphic Element") ( Fig 1A) [18,21]. In addition to controlling male-specific abdominal pigmentation traits, bab1-2 are required, singly, jointly or in a partially-redundant manner, for embryonic cardiac Funding: Research in the HMB-MB laboratory was supported by grants from the Association pour la Recherche sur le Cancer (ARC PJA 20141201932) to MB, the Agence Nationale de la Recherche (ANR-16 CE12-0021-01) to MB, and institutional basic support from of the Centre National de Recherche Scientifique (CNRS) to HMB and Toulouse III University to HMB. Research in the G. C. laboratory was supported by a grant from the European Research Council (Advanced Grant 3DEpi) and by the CNRS (for GC and FB). MHB obtained a PhD fellowship from the French « Ministère de L'Enseignement Supérieur et de la Recherche », LHMV from the Mexican CONACYT and AB from a CNRS-Conseil Régional Midi-Pyrénées co-financing and then from the ARC. V.L. was supported by a doctoral fellowship from the Laboratory of Excellence EpiGenMed and the ARC. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. (E) Medial confocal view of a mosaic L3 leg disc expressing LAE-GFP ZH2A and harboring rotund mutant clones. Merged Bab1 (cyan) immunostaining, GFP (green) and RFP (red) fluorescence, as well as each marker in isolation in (E'), (E") and (E"'), respectively, are shown. Mutant clones are detected as black areas, owing to the loss of RFP. The respective ts1-5 fields are indicated in (E). White arrows indicate rotund-/-clones still expressing bab1 and yellow ones those that do not express bab1. (F) Distal confocal view of a mosaic L3 leg disc expressing LAE-RFP ZH2A and harboring bowl mutant clones (GFP-). Merged Bab1 (cyan) immunostaining, RFP (red) and GFP (green) fluorescence, as well as a higher magnification of the boxed area for each marker in isolation in (F'), (F") and (F"'), respectively, are shown. Mutant clones are detected as black areas, owing to the loss of GFP. White arrows indicate pretarsal bowl-/-clones ectopically expressing both bab1 and LAE-RFP ZH2A (bab2). https://doi.org/10.1371/journal.pgen.1010083.g001
Adult T1-3 legs, on the pro-, meso-and meta-thoraces, respectively, are derived from distinct mono-layered epithelial cell sheets, organized as sac-like structures, called leg imaginal discs (hereafter simply referred to as leg discs) [30][31][32]. Upon completion of the third-instar larval stage (L3), each leg disc is already patterned along the proximo-distal (P-D) axis through regionalized expression of the Distal-less (Dll), Dachshund (Dac) and Homothorax (Hth) transcriptional regulators in the distal (center of the disc), medial and proximal (peripheral) regions, respectively [30]. The five tarsal (ts1-5) and the single pretarsal (distalmost) segments are patterned through genetic cascades mobilizing transcription factors, notably the distal selector protein Dll and the tarsal Rotund protein as well as nuclear effectors of Notch and Epidermal Growth Factor Receptor (EGFR) signaling, i.e., Bowl and C15, respectively [30,31].
Whereas both bab genes are required for dimorphic abdominal pigmentation traits and somatic gonad specification [17,26], only bab2 is critical for tarsal segmentation [15]. While bab1 loss-of-function legs are apparently wild-type, a protein null allele (bab AR07 ) removing bab2 (in addition to bab1) gene activity causes shortened legs owing to ts2-5 tarsal fusions as well as P-D homeotic transformations as seen by the appearance of a few up to several ectopic sex comb teeth in ts4, ts3 and ts2 segments, respectively, in males [15]. While the two bab genes are co-expressed within ts1-4 cells, bab2 is expressed more proximally than bab1 in ts1, and in a graded manner along the P-D leg axis in ts5 [15]. We previously showed that bab2 expression in distal leg (and antennal) tissues is governed by a 567-basepair (bp) long CRE/ enhancer (termed LAE for "Leg and Antennal Enhancer") which is located in between the bab1-2 transcription units ( Fig 1A) [21,29]. However, LAE enhancer contribution to bab1 versus bab2 regulation in the developing distal legs remains to be investigated.
Here, we show that bab1 expression in the developing distal leg depends on the Rotund, Bowl and C15 proteins, three transcription factors known to regulate directly bab2 expression, by binding to dedicated LAE sequences [21,29]. LAE excision by CRISPR/Cas9-mediated genome editing indicates that this enhancer is required but not sufficient for both bab1 and bab2 regulation and, more unexpectedly, is required also for their differential expression along the P-D leg axis. Phylogenomics analyses indicate that LAE sequences have been fixed early on during dipteran evolution, well before emergence of the bab complex in the Cyclorrhapha sublineage. This work illuminates how a transcriptional enhancer from tandem gene duplicates underwent evolutionary changes to diversify their respective tissue-specific gene expression pattern.

Results
The tandem bab1-2 gene paralogs are co-regulated in the developing distal leg In addition to the distal selector homeodomain (HD) protein Distal-less, we and others have previously shown that the C15 HD protein (homeoprotein) as well as Rotund and Bowl Zinc-Finger (ZF) transcription factors (TFs) bind dedicated sequences within LAE to ensure precise bab2 expression in four concentric tarsal rings within the leg discs ( Fig 1B) [21,29]. bab1-2 are co-expressed in ts2-4 tarsal segments, while bab2 is specifically expressed in ts5 and more proximally than bab1 in ts1, both in a graded manner along the P-D leg axis (Figs 1C and S1A) [15]. Given bab1-2 co-expression in ts1-4, we first asked whether C15, rotund and bowl activities are also controlling bab1 expression in the developing distal leg. To this end, we compared Bab1 expression with that of X-linked reporter genes faithfully reproducing the bab2 expression pattern there [21,29], in homozygous mutant leg discs for a null C15 allele or in genetically-mosaic leg discs harboring rotund or bowl loss-of-function mutant cells (Fig 1D-1F).
C15 is specifically activated in the distalmost (center) part of the leg disc giving rise to the pretarsal (pt) segment (see Fig 1B) [33,34]. We have previously shown that the C15 homeoprotein down-regulates directly bab2 to restrict its initially broad distal expression to the tarsal segments [29]. Bab1 expression analysis in a homozygous C15 mutant leg disc revealed that both bab1 and LAE-RFP ZH2A (bab2) are similarly de-repressed in the pretarsus (Fig 1C and  1D).
In contrast to C15, rotund expression is restricted to the developing tarsal segments [35] and the transiently-expressed Rotund ZF protein contributes directly to bab2 up-regulation in proximal (ts1-2) but has no functional implication in distal (ts3-5) tarsal cells [21]. Immunostaining of genetically-mosaic leg discs at the L3 stage revealed that bab1 is cell-autonomously down-regulated in large rotund mutant clones in ts1-2, but not in ts3-4 segments (Fig 1E), as it is the case for LAE-GFP ZH2A reflecting bab2 expression. Lastly, we examined whether the Bowl ZF protein, a repressive TF active in pretarsal but not in most tarsal cells, is down-regulating bab1 expression there [36], like bab2 [29]. Both bab1 and LAE-RFP ZH2A (bab2) appeared cellautonomously de-repressed in bowl loss-of-function pretarsal clones ( Fig 1F).
In addition to loss-of-function, we also conducted gain-of-function experiments for bowl and rotund. Given Bowl TF instability when overexpressed, bowl gain-of-function has been achieved by down-regulating lines which (i) encodes a related but antagonistic ZF protein destabilizing nuclear Bowl and (ii) is specifically expressed in the tarsal territory [36]. As previously shown for LAE-GFP ZH2A (and bab2) expression, nuclear Bowl stabilization in the developing tarsal region appears sufficient to down-regulate cell-autonomously bab1 (S1C Fig). Prolonged expression of the Rotund protein in the entire distal part of the developing leg disc, i.e., tarsal in addition to pretarsal primordia, induces ectopic bab1 expression in the presumptive pretarsal territory, as previously shown for bab2 albeit with some differences in proximalmost GFP+ cells (S1B Fig, differentially-expressing cells are indicated with arrows), thus suggesting differential sensitivity of the two gene duplicates to Rotund TF levels (see discussion).
Taken together, these data indicate that the C15, Bowl and Rotund transcription factors, previously shown to interact physically with specific LAE sequences and thus to regulate directly bab2 expression in the developing distal leg, are also controlling bab1 expression there. These results suggest that the limb-specific intergenic LAE enhancer activity regulates directly both bab genes.

LAE activity regulates both bab1 and bab2 paralogs along the proximodistal leg axis
To test the role of LAE in regulating both bab1 and bab2, we deleted precisely the LAE sequence through CRISPR/Cas9-mediated genome editing (see Materials and Methods) (Fig  2A). Two independent 3L chromosomal deletion events (termed ΔLAE-M1 and -M2; see S2A Fig for deleted DNA sequences) were selected for phenotypic analysis. Both deletion mutants are homozygous viable and give rise to fertile adults with identical fully-penetrant distal leg phenotypes, namely ectopic sex-comb teeth on ts2 (normally only found on ts1) tarsal segment in the male prothoracic (T1) legs (Fig 2B), which are typical of bab2 hypomorphic alleles [15].

PLOS GENETICS
Enhancer rewiring within the Drosophila bric-a-brac gene complex The ΔLAE-M1 allele was selected for detailed phenotypic analyses and is below referred to as bab ΔLAE .
First, we quantified bab1 and bab2 mRNAs prepared from dissected wild-type and homozygous bab ΔLAE mutant leg discs. As shown in S2B Fig, both mRNAs were detected in mutant discs, although bab1 levels were two times lower than wild-type. Second, Bab1-2 expression patterns were analyzed in homozygous bab ΔLAE leg discs. To identify leg cells that should normally express bab2, we used the LAE-GFP ZH2A reporter. In homozygous bab ΔLAE mutant leg discs, bab2-specific expression in proximalmost ts1 and ts5 cells (see Fig 1C) is no longer observed (Fig 2C and 2D). Furthermore, shared expression of both gene duplicates in distalmost ts1 cells is no longer detectable in bab ΔLAE mutant discs. Nevertheless, maintenance of bab1-2 co-expression in ts2-4 mutant cells indicates that additional cis-regulatory region(s) acting redundantly with the LAE enhancer must be present within the bab locus on the third chromosome. To exclude possible "transvection" effects of the X-linked LAE-GFP ZH2A construct across different chromosomes [37], we also examined Bab1-2 expression patterns in homozygous bab ΔLAE leg discs in the absence of the LAE-GFP ZH2A reporter. As shown in Fig  2E, in the homozygous bab ΔLAE mutant both bab genes are only (co-)expressed in ts2-4 cells and bab2 remains no longer specifically expressed in ts1 an ts5 cells, ruling out a trans-chromosomal effect of the LAE-GFP ZH2A transgene.
Taken together, our data indicate that intergenic LAE enhancer activity regulates both bab gene duplicates, being (i) required for bab1-2 co-expression in distal ts1, (ii) dispensable for their co-expression in ts2-4, suggesting the presence of redundant cis-regulatory information and (iii) critically required for bab2-specific tarsal expression both proximally and distally (in ts1 and ts5, respectively). Thus, the LAE enhancer governs both shared and paralog-specific expression of the bab1-2 gene duplicates.

Chromatin features predict limb-specific cis-regulatory elements within bab1
Since LAE appeared dispensable for bab1 and bab2 co-expression in ts2-4 cells, our data suggested the existence of other redundant cis-regulatory elements. We sought to identify cis-regulatory information acting redundantly with LAE by taking advantage of available genomewide chromatin features and High-throughput chromosome conformation Capture (Hi-C) experiments performed from L3 leg or eye-antennal discs (Fig 3). bab1 and bab2 are indeed co-expressed in distal antennal cells within the composite eye-antennal imaginal disc [15]. A topologically-associating domain covering the entire bab locus was detected in Hi-C data from eye-antennal discs (Fig 3A) [38], revealing particularly strong interactions between bab1-2 promoter regions.
We then used published genome-wide data from Chromatin Immuno-Precipitation (ChIP-Seq) and Formaldehyde-Assisted Isolation of Regulatory Elements (FAIRE-Seq), as well as Assay for Transposase-Accessible Chromatin (ATAC-Seq) experiments [38][39][40][41], looking for active enhancer marks (H3K4me1 and H3K27Ac) and nucleosome-depleted chromatin regions (thus accessible to transcription factors), respectively. In the eye-antennal disc active enhancer signatures are mainly associated with a~15-kb-long genomic region encompassing the bab1 promoter, first exon and part of its first intron (Fig 3B). Note that LAE is also accessible to transcription factors and carries H3K4me1 marks, consistently with its enhancer activity characterized in distal antennal cells [21].
To investigate further the role of this putative enhancer region (hereafter referred to as ESR) within bab1, we analyzed previously-published ChIP-Seq data from L3 leg discs [42] for binding sites for Dll which is critically required to cell-autonomously activate bab1 and bab2 [21,43,44]. In addition to expected binding over LAE [21] and the bab2 promoter, strong Dll binding is also detected throughout ESR, including over the bab1 promoter ( Fig 3B).
Taken together, we concluded that the bab1 transcription unit is predicted to include uncharacterized limb-specific regulatory information (i.e.., ESR) acting redundantly with the LAE enhancer.

LAE functions together with cis-regulatory elements located within bab1
To validate the existence of regulatory information within the bab1 locus, we performed phenotypic rescue experiments with Bacterial Artificial Chromosome (BAC) constructs covering each about 100 kb of genomic DNA. We have previously shown that a X-linked BAC construct, BAC26B15 ZH2A , encompassing bab2 and the downstream intergenic sequence including LAE (see Fig 4A), is able to rescue (i) Bab2 expression in the tarsal primordium and (ii), distal leg phenotypes detected in homozygous animals for the protein null allele bab AR07 ,

PLOS GENETICS
Enhancer rewiring within the Drosophila bric-a-brac gene complex affecting both Bab1 and Bab2 [15]. Conversely, a mutant BAC26B15 construct (BAC26-B15ΔLAE ZH2A ) inserted at the same genomic landing site (i.e., ZH2A on the X chromosome) and specifically lacking LAE sequence is unable to rescue Bab2 tarsal expression and leg phenotypes of bab AR07 mutants (Fig 4B-4D) [21]. These data indicated that (i) in absence of

PLOS GENETICS
Enhancer rewiring within the Drosophila bric-a-brac gene complex redundant cis-regulatory information, LAE is essential for bab2 expression in the developing tarsus and (ii) the enhancer information redundant with LAE is located outside the genomic region covered by BAC26B15, which only includes the bab1 first exon and thus lacks adjacent intronic ESR sequences.
To validate the putative regulatory information within the bab1 transcription unit, we have tested the capacity of another BAC, BAC69B22, which overlaps entirely bab1 but lacks LAE (see Fig 4A), to restore Bab1 expression in homozygous bab AR07 leg discs. As shown in Fig 4E  and 4F, the X-linked BAC69B22 ZH2A construct could partially restore bab1 expression in ts2-4 cells, indicating that it contains cis-regulatory information redundant with LAE activity in these tarsal segments. To test the capacity of BAC69B22 sequences to also regulate bab2 expression in ts2-4 cells, we placed BAC69B22 ZH2A across BAC26B15ΔLAE ZH2A , to allow pairingdependent trans-interactions (i.e., transvection; both constructs being inserted at the same ZH2A landing site) between the two X chromosomes in females. This configuration partially restored Bab2 expression in ts2-4 cells from bab AR07 mutant L3 leg discs, albeit in salt and pepper patterns (Fig 4G), diagnostic of transvection effects [37].
Taken together with our previous chromatin data, these genetic results are consistent with the existence within the 15 kb bab1 ESR of uncharacterized cis-regulatory information capable to drive some bab1 and bab2 expression in distal leg tissues and acting redundantly with the LAE enhancer. The large size and complexity of this region, together with data mining from the literature, suggested that this region includes interspersed regulatory elements whose functional implication in the developing leg and antenna deserves to be studied separately.

The bab complex arose from a gene duplication event in the Cyclorrhapha lineage
Both specific and common LAE enhancer activities toward bab1 and bab2, as well as LAE apparent redundancy with regulatory information from the bab1 locus provided us with a unique model to address the issue of evolutionary conservation of cis-regulatory landscapes governing expression of tandem paralogous genes.
To analyze the phylogenetic relationships between these different Bab1/2-related proteins, their primary sequences were aligned and their degree of structural relatedness examined through a maximum likelihood analysis. As expected from an ancient duplication, cyclorrhaphan Bab1 and Bab2 paralogs cluster separately, while singleton BTB-BabCD proteins are more related to cyclorrhaphan Bab1 than Bab2 (Fig 5B). Branch length comparison indicates that cyclorrhaphan bab2 paralogs have diverged more rapidly than their bab1 twins and thus that the Bab2 clade artificially cluster separately through long-branch attraction.
Interestingly, contrary to most nematocerans, two or even three bab gene paralogs are present in the fungus gnat Coboldia fuscipes (Psychodomorpha) and the gall midge Mayetiola destructor (Bibionomorpha), respectively. Significantly, M. destructor and C. fuscipes bab paralogs (i) cluster separately in our phylogenetic analysis ( Fig 5B) and (ii) two are arrayed in the same genomic context in both species (S3 Fig), indicating that they have likely been generated through independent gene duplication processes in the Bibionomorpha and Psychodomorpha lineages, respectively.

PLOS GENETICS
Enhancer rewiring within the Drosophila bric-a-brac gene complex

PLOS GENETICS
Enhancer rewiring within the Drosophila bric-a-brac gene complex Taken together, and updating a previous work [17], our phylogenomics analysis (summarized in Fig 6B and 6C) indicates that the bab tandem genes originated from a duplication event within the Cyclorrhapha dipteran lineage.

LAE sequences emerged early on in the Brachycera, thus predating bab gene duplication
Having traced back the bab gene duplication raised the question of the evolutionary origin of the LAE enhancer, which regulates both bab1 and bab2 expression [21] (this work). We have previously shown that LAE includes three subsequences highly-conserved among twelve reference Drosophilidae genomes [48], termed CR1-3 (for Conserved Regions 1 to 3; see S4A Fig and S1 Data), of which only two, CR1 and 2, are critical for tissue-specificity [21,29]. The 68 bp CR1 includes contiguous binding sites for Dll and C15 homeoproteins, while the 41 bp CR2 comprises contiguous binding sites for Dll as well as the ZF protein Bowl (S4B and S4C  Fig, respectively) [21,29].
To trace back the LAE evolutionary origin, we then systematically searched for homologous CR1-3 sequences (>50% identity) in dipteran genomes for which we identified one or two bab genes. Importantly, conserved LAE sequences have not been yet reported outside drosophilids. Small genomic regions with partial or extensive homologies to the CR1 (encompassing the C15 and Dll binding sites) and CR2 (particularly the Dll and Bowl binding sites) could be detected in all examined Brachycera families but not in any nematoceran (Figs 6B, S4B, and  S4C). Contrary to closely-associated CR1-2 homologous sequences, no CR3-related sequence could be identified nearby, in any non-Drosophilidae species. Significantly, homologous LAE sequences are situated (i) in between the tandemly-duplicated paralogs in cyclorrhaphan species for which the entire bab locus sequence was available to us, suggesting an evolutionarilyconserved enhancer role, or (ii) 20 kb upstream of the bab singleton gene in the Asiloidea P. coquilletti (see Fig 6C).
Taken together, as summarized in Fig 6B and 6C, these data suggest that a LAE-like enhancer with CR1-and CR2-related elements emerged early on in the Brachycera suborder, 180-200 million years ago, and has been since fixed within or upstream the bab locus in the Cyclorrhapha and Asiloidea superfamilies, respectively.

Unlike LAE, other bab CREs have not been conserved beyond the Cyclorrhapha
The broad LAE sequence conservation led us to also trace back the evolutionary origins of the cardiac CE, abdominal anterior AE and sexually-dimorphic DE cis-regulatory elements (see Fig 6A). While CE only regulates bab2, the AE and DE elements are predicted to govern both bab1 and bab2 expression in abdominal cells. Significantly, CE-and DE-related sequences could be only detected within schizophorans (excepted in Calyptratae) (Figs 6B, S5B, and S5C, respectively), whereas AE-related sequences could be readily identified within bab loci from drosophilids (S1 Data) but not from Aschiza, Empidoidea and Nematocera.
In conclusion, as summarized in Fig 6B and 6C, contrary to the LAE enhancer which among the Diptera emerged early on in the Brachycera suborder, other so-far identified bab cis-regulatory sequences have not been conserved beyond the Cyclorrhapha. Thus, and unlike the brachyceran LAE (CR1-2) sequences, these data indicate that other shared enhancer sequences (i.e., DE and AE) have been evolutionarily-fixed after the bab1-2 paralog emergence.
Homology searches revealed that bab1 promoter sequences have been strongly conserved in the three extant Cyclorrhapha families and even partially in some Asiloidea (e.g., P. coquellitti), for which a singleton bab gene is present (Figs 6B and S6B). In striking contrast to bab1, sequence conservation of the bab2 promoter could only be detected among some Acalyptratae drosophilids (Figs 6B and S6C). In agreement with a fast-evolutionary drift for bab2 promoter sequences, the duplicated Inr is even only detected in Drosophila group species.
Taken together, these evolutionary data (summarized in Fig 6B) indicate that, likewise for the LAE enhancer, bab1 promoter sequences have been under strong selective pressure among the Brachycera, both in the Cyclorrhapha and Asiloidea, while paralogous bab2 promoter sequences diverged rapidly among cyclorrhaphans. As discussed below, this evolutionary divergence may explain apparent differential activity of the LAE on each bab promoter.

Discussion
In this work, we have addressed the issue of the emergence and functional diversification of enhancers from two tandem gene duplicates. Using the Drosophila bab locus as a model, we showed that the paralogous genes bab1 and bab2 originate from an ancient tandem duplication in the Cyclorrhapha lineage. The early-fixed brachyceran LAE sequence has been coopted lately to regulate both bab1 and bab2 expression in a cyclorrhaphan. Furthermore, this unique enhancer is also responsible for paralog-specific bab2 expression along the P-D leg axis. Finally, LAE governs only some aspects of bab1-2 expression in the developing limbs because redundant cis-regulatory information, which remains to be characterized, is present within the D. melanogaster bab1 gene. This work raises some hypotheses about (i) how a single enhancer can drive specificity among tandem gene duplicates, and (ii) how enhancers evolutionary adapt with distinct cognate gene promoters.

A long-lasting enhancer sequence predating resident gene duplication
Our comprehensive phylogenomics analyses from highly diverse Diptera families indicate that the bab complex has been generated through tandem duplication from an ancestral singleton gene within the Cyclorrhapha (i.e., higher flies), about 100-140 years ago. This result contrasts with published data reporting that the duplication process having yielded the tandem bab genes occurred much earlier in the Diptera lineage leading to both the Brachycera (true flies; i.e., with short antenna) and Nematocera (long horned "flies", including mosquitos) suborders [17]. In fact, tandem duplication events implicating the bab locus did occur in the Bibionomorpha, as reported [17], and even in the Psychodomorpha with three bab gene copies (Figs 4, 5, and S3), but our phylogenetic analysis supports independent events. Thus, within the emerging dipteran lineages, the ancestral bab singleton gene had a high propensity to duplicate locally.
Gene duplication is a major source to generate phenotypic innovations during evolution, through diverging expression and molecular functions, and eventually from single gene copy translocation to another chromosomal site. Emergence of tissue-specific enhancers not shared between the two gene duplicates, as well as of "shadow" enhancers, have been proposed to be evolutionary sources of morphological novelties [6,55]. In this study, we have shown a strong evolutionary conservation of LAE subsequences among brachycerans, notably its CR2 element containing Dll and Bowl binding sites (S4C Fig). This conservation suggests a long-lasting enhancer function in distal limb-specific regulation of ancestral singleton bab genes, which has recently been co-opted in drosophilids to allow differential bab gene expression.

A shared enhancer differentially regulating two tandem gene paralogs
Here, we have shown that a single enhancer, LAE, regulates two tandem gene paralogs at the same stage and in the same expression pattern. How can this work? It has been proposed that enhancers and their cognate promoters are physically associated within phase-separated nuclear foci composed of high concentrations of TFs and proteins from the basal RNA polymerase II initiation machinery inducing strong transcriptional responses [56,57]. Our Hi-C data from eye-antennal discs show a strong interaction between bab1 and bab2 promoter regions (Fig 3), suggesting that both bab promoters could be in close proximity within such phase separated droplets, thus taking advantage of shared transcriptional regulators and allowing concerted gene regulation. In contrast, no strong chromosome contacts could be detected between LAE and any of the two bab promoter regions, indicating that this enhancer is not stably associated to the bab2 or bab1 promoter in the eye-antennal disc (where only the antennal distal part expresses both genes). It would be interesting to gain Hi-C data from leg discs, in which the bab1-2 genes are much more broadly expressed.
In addition to being required for bab1-2 co-expression in proximal tarsal segments, we showed here that the LAE enhancer is also responsible for paralog-specific bab2 expression along the proximo-distal leg axis. While it has been proposed that expression pattern modifications occur through enhancer emergence, our present work indicates that differential expression of two tandem gene paralogs can depend on a shared pre-existing enhancer (i.e., LAE). How this may work? Relative to its bab1 paralog, bab2 tarsal expression extends more proximally within the Dac-expressing ts1 cells [43] and more distally in the ts5 segment expressing nuclear Bowl protein. Furthermore, both Dac and Bowl proteins have been proposed to act as bab2 (and presumably bab1) repressors [29,36,58]. CRISPR/Cas9-mediated LAE excision allowed us to establish that this enhancer is critically required for paralog-specific bab2 expression proximally and distally, in ts1 and ts5 cells, respectively. In this context, we and others have previously proposed that transiently-expressed Rotund activating TF may antagonize Bowl (and eventually Dac) repressive activity to precisely delimit bab2 expression among ts1 cells [21,58]. Given that bab1-2 are distinctly expressed despite being both regulated by Bowl and Rotund, we propose that paralog-specific LAE activity depends on privileged interactions with bab2 promoter sequences. Thus, we speculate that the bab2 promoter responds to Rotund transcriptional activity differently from its bab1 counterpart. Consistent with this view, ectopic Rotund expression reveals differential regulatory impacts on the two bab gene promoters (S1B Fig). We envision that this could occur through specific interactions between LAE-bound TFs (e.g., Rotund) and dedicated proteins within the PolII pre-initiation complex stably-associated to the bab2 core promoter.

Differential enhancer-promoter interplay through evolutionary changes?
Despite that sequence homologies between both promoters (consistent with an ancient duplication event mobilizing the ancestral singleton bab promoter) are still detectable, it is significant that the bab2 promoter evolves much faster than its bab1 counterpart. While the bab1 promoter sequence has been strongly conserved among cyclorrhaphans, with sequence homologies with brachyceran singleton bab promoters, the bab2 promoter sequence has only been fixed recently among Drosophilidae, notably through the Initiator (Inr) sequence duplication, indicating very fast evolutionary drift after the gene duplication process which yielded the bab1/2 paralogs. We envision that this evolutionary ability has largely contributed to allow novel expression patterns for bab2, presumably through differential enhancer-promoter pairwise interplay.
The UAS-dsRNA stock used to obtain interfering RNA against lines (#40939) was obtained from the Bloomington stock Center.

CRISPR/Cas9-mediated chromosomal deletion
Guide RNAs (gRNAs) were designed with CHOPCHOP at the Harvard University website (https://chopchop.cbu.uib.no/). Four gRNA couples were selected that cover two distinct upstream and downstream LAE positions: TGCGTGGAGCCTTCTTCGCCAGG or TGGAGCCTTCTTCGCCAGGCCGG; and TATACTGTTGAGATCCCATGCGG or TTAGGCGCACATAAGGAGGCAGG (the PAM protospacer adjacent motif sequences are underlined), respectively. Targeting tandem chimeric RNAs were produced from annealed oligonucleotides inserted into the pCFD4 plasmid, as described in (http://www.crisprflydesign. org/). Each pCFD4-LAE-KO construct was injected into 50 Vasa-Cas9 embryos (of note the vasa promoter sequence is weakly expressed in somatic cells). F0 fertile adults and their F1 progeny, with possible somatic LAE-deletion events and candidate mutant chromosomes (balanced with TM6B, Tb), respectively, were tested by polymerase chain reactions (PCR) with the following oligonucleotides: AGTTTTTCATCCCCCTTCCA and GTATTTCTTTGCCTTGCC ATCG (predicted wild-type amplified DNA: 2167 base pairs).

Transcription factor binding prediction
DNA binding predictions were done using the motif-based sequence analysis tool TomTom from the MEME suite (https://meme-suite.org/meme/tools/tomtom) and the Fly Factor Survey database (http://mccb.umassmed.edu/ffs/).