Neuronal cell fate specification by the molecular convergence of different spatio-temporal cues on a common initiator terminal selector gene

The extensive genetic regulatory flows underlying specification of different neuronal subtypes are not well understood at the molecular level. The Nplp1 neuropeptide neurons in the developing Drosophila nerve cord belong to two sub-classes, Tv1 and dAp neurons, generated by two distinct progenitors. Nplp1 neurons are specified by spatial cues (the Hox homeotic network and the GATA factor grn) and temporal cues (the hb -> Kr -> Pdm -> cas -> grh temporal cascade). These spatio-temporal cues combine into two distinct codes, one for Tv1 and one for dAp neurons, that activate a common terminal selector feedforward cascade of col -> ap/eya -> dimm -> Nplp1. Here, we molecularly decode the specification of Nplp1 neurons, and find that the cis-regulatory organization of col functions as an integratory node for the different spatio-temporal combinatorial codes. These findings may provide a logical framework for addressing spatio-temporal control of neuronal sub-type specification in other systems.


Author summary
The nervous system contains a myriad of different cell types. These are specified by elaborate transcription factor cascades, starting with early factors that provide spatial and temporal information, to late factors that dictate final cell identity. The molecular nature of such cascades is poorly understood in any system. We focus on two related neuropeptide neurons in the Drosophila central nervous system, for which an extensive genetic pathway has been identified. We identify the enhancers for the different genes in the cascade, and conduct an extensive molecular analysis of these. Our findings reveal that different spatial and temporal cues converge on different enhancers of a key initiator terminal selector gene, which then triggers a feedforward cascade of sequential enhancer activation, ultimately landing on the enhancer of the neuropeptide gene. These findings may point to general mechanisms underlying specification of unique neuronal cell fate in many systems.

Introduction
The nervous system contains a myriad of different neuronal sub-types, and understanding cell fate specification remains a major challenge. Studies in a number of systems have revealed that neuronal subtype specification relies upon complex cascades of regulatory information, involving spatial and temporal selector genes [1], onwards to terminal selector genes [2,3], often acting in combinatorial codes [4][5][6]. With respect to spatial information, the Hox homeotic selector genes, expressed in distinct but partly overlapping domains along the antero-posterior axis of the central nervous system, have been extensively studied for their role in cell fate specification [reviewed in [7,8]]. With regard to temporal information, seminal studies in the Drosophila embryonic central nervous system (CNS) have identified a temporal cascade, where the sequential expression of the transcription factors Hunchback (Hb), Kruppel (Kr), Pdm2 and Nubbin (collectively referred to as Pdm), Castor (Cas) and Grainy head (Grh) plays out in most, if not all, neuroblasts (NBs) [reviewed in [9]]. The temporal factors dictate the identity of neurons and glia being specified at different stages of NB lineage progression. Although the cascade is not conserved in its entirety, research in mammals has pointed to similar temporal progressions, and has begun identifying some of the factors involved [reviewed in [10]]. In addition, studies have revealed that the Hox spatial information can converge with temporal cues to specify neuronal subtypes [11]. While these functional genetic studies have provided insight into the genetic mechanisms underlying neuronal subtype specification, it is largely unclear how the broader spatio-temporal cues are molecularly integrated to cause discrete terminal selector gene expression, and how terminal selectors feed forward to final cell identity.
The Drosophila ventral nerve cord (VNC; defined here as thoracic segments T1-T3 and abdominal A1-A10) contains ~10,000 cells at the end of embryogenesis, which are generated by a defined set of ~800 neuroblasts (NBs) [12][13][14][15][16]. The Apterous neurons constitute a small sub-group of interneurons, identifiable by the selective expression of the Apterous (Ap) LIM-homeodomain factor, as well as the Eyes absent (Eya) transcriptional co-factor and nuclear phosphatase (Fig 1A) [17,18]. A subset of Ap neurons expresses the Nplp1 neuropeptide, and can be sub-divided into the lateral thoracic Tv1 neurons, part of the thoracic Ap cluster of four cells, and the dorsal medial row of dAp neurons (Fig 1A) [6,19]. In line with the distinct location of the Tv1 and dAp neurons, studies have revealed that they are generated by distinct NBs: NB5-6T and NB4-3, respectively [20,21]. A number of studies have addressed the genetic mechanisms underlying the specification of the Tv1 and dAp neurons, and the regulation of the Nplp1 neuropeptide. These have revealed that two distinct spatio-temporal combinatorial transcription factor codes, one acting in NB5-6T and the other in NB4-3, converge on a common initiator terminal selector gene, collier (col; Flybase knot), encoding a COE/EBF transcription factor (Fig 1B) [20][21][22]. Col in turn is necessary and sufficient to trigger a feedforward loop (FFL) consisting of Ap, Eya and the Dimmed (Dimm) bHLH transcription factor, which ultimately activates the Nplp1 gene [6]. Strikingly, the combinatorial coding selectivity of the spatio-temporal cues combined with the information-coding capacity of the FFL results in the selective activation of Nplp1 in only 28 out of the ~10,000 cells within the VNC.
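The genetic logic summarized above can be sketched as a small Boolean model. This is an illustrative simplification rather than the analysis performed in this study: the function names, the strict all-or-none AND gates and the `knockout` parameter are assumptions introduced here, whereas the factor names and the order col -> ap/eya -> dimm -> Nplp1 are taken from the text.

```python
# Hypothetical Boolean sketch of the genetic logic described in the text.
# Real expression is graded; the all-or-none gates are an assumption.

TV1_CODE = frozenset({"Antp", "hth", "exd", "lbe", "cas"})  # NB5-6T inputs
DAP_CODE = frozenset({"Kr", "pdm", "grn"})                  # NB4-3 inputs


def col_active(factors):
    """col is switched on when either complete upstream code is present."""
    factors = set(factors)
    return TV1_CODE <= factors or DAP_CODE <= factors


def run_ffl(factors, knockout=()):
    """Propagate col -> ap/eya -> dimm -> Nplp1, forcing knocked-out genes off."""
    off = set(knockout)
    state = {}
    state["col"] = "col" not in off and col_active(factors)
    state["ap"] = "ap" not in off and state["col"]
    state["eya"] = "eya" not in off and state["col"]
    state["dimm"] = ("dimm" not in off and state["col"]
                     and state["ap"] and state["eya"])
    state["Nplp1"] = ("Nplp1" not in off and state["col"] and state["ap"]
                      and state["eya"] and state["dimm"])
    return state


if __name__ == "__main__":
    print(run_ffl(TV1_CODE))                      # every gene on
    print(run_ffl(DAP_CODE, knockout={"dimm"}))   # ap and eya on, dimm and Nplp1 off
```

In this toy model either complete code is sufficient to switch on the whole cascade, while removing any single input from a code, or any gene within the FFL, blocks Nplp1.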
While these genetic studies have helped resolve the regulatory logic of this cell specification event, they have not addressed the molecular mechanisms by which the two different spatio-temporal combinatorial codes intersect upon the col initiator terminal selector, to trigger a common terminal FFL, or the molecular nature of the FFL.
To address this issue, we have identified enhancers for Tv and dAp neuron expression for the genes in the common Tv1/dAp FFL: col, ap, eya, dimm and Nplp1. We generated transgenic reporters for these enhancers, both wildtype and mutant for specific transcription factor binding sites, to test their regulation in mutant and misexpression backgrounds. We also used CRISPR/Cas9 technology to delete these enhancers in their normal genomic location to test their necessity for gene regulation. Strikingly, we find that the distinct upstream spatio-temporal combinatorial codes, which trigger col expression in Tv1 versus dAp neurons, converge onto different enhancer elements in the col gene. Hence, the col Tv1 neuron enhancer is triggered by Antp, hth, exd, lbe and cas, while the dAp enhancer is triggered by Kr, pdm and grn. In contrast to this subset-specific enhancer set-up for col activation, the subsequent, col-driven Nplp1 FFL feeds onto common enhancers in each downstream gene. These findings reveal that distinct spatio-temporal cues, acting in different neural progenitors, can trigger the same FFL by converging on discrete enhancer elements in an initiator terminal selector, to thereby dictate the same ultimate neuronal subtype cell fate.

Identification of enhancers for the terminal selectors
The Ap neurons constitute a set of interneurons in the Drosophila VNC, out of which the thoracic lateral Tv1 neurons and the dorso-medial dAp neurons express the Nplp1 neuropeptide (Fig 1A) [6,17,19]. Tv1 neurons are generated by NB5-6T, while dAp neurons arise from NB4-3 [6,21]. Activation of Nplp1 in Tv1 and dAp neurons is controlled by a shared coherent FFL, consisting of col, ap, eya and dimm, where col is both necessary and sufficient to trigger the FFL [6,21]. In contrast, this common FFL is triggered by two different upstream spatio-temporal combinatorial codes, acting in the two different NBs. In NB5-6T this includes the temporal gene castor (cas), the Hox homeotic gene Antennapedia (Antp), the two Hox co-factor genes homothorax (hth) and extradenticle (exd), as well as the homeobox gene ladybird early (lbe). In NB4-3, this includes the temporal genes Kruppel (Kr) and pdm2/nub (pdm), as well as the GATA gene grain (grn) (Fig 1B) [20][21][22].
To identify the cell-specific cis-regulatory modules (CRMs) that act as enhancers for the five genes in the dAp/Tv1 FFL, we analyzed expression of a number of transgenic lines generated in previous studies [17,23,24], as well as an eya-CRM-Gal4 transgene (provided by T. Lian and D.W. Allan; S1 Fig). This resulted in identification of fragments capable of driving reporter gene expression in the Tv1 and dAp neurons. To facilitate mutagenesis of CRMs, we attempted to identify smaller genomic fragments that retained appropriate activity. This resulted in the identification of smaller (1-2 kilobases) CRMs for all genes with the exception of col, where larger fragments were required for proper expression (Fig 1C-1S, S1 Fig, S1 Data). Strikingly, we found that while one enhancer region was sufficient to recapitulate Tv1 and dAp expression of ap, eya, dimm and Nplp1, for col we identified two distinct enhancers, one each for expression in dAp or Tv1 neurons (Fig 1C-1E and 1T).
Fig 1 (legend, partial). Showing the feedforward regulatory cascade critical for terminal specification of both the Tv1/Nplp1 and dAp/Nplp1 neurons. In Tv1 specification col is activated by spatial input via Antp, lbe, hth/exd together with temporal input from cas. In dAp specification col is activated via the temporal factors Kr, pdm1/2 together with spatial input from grn. Once col is activated in both cell subtypes, the same Nplp1 terminal selector cascade is triggered in both dAp and Tv1 to specify the Nplp1 cell fate. (C) Enhancer-reporter constructs used to study cell type specific expression of the factors critical during Nplp1 cell fate specification in both dAp and Tv1 cells. col expression is under the control of two different enhancers. Expression of col in the NB5-6T is controlled by the col-Tv-CRM while col expression in the dAp cells is controlled by the col-dAp-CRM. For ap enhancer studies we used two different enhancer fragments. To test the enhancer in mutant background we used the apS2-lacZ and for mutation of transcription factor binding sites the shorter apSJ2-lacZ was used. The Nplp1-CRM contains the promoter for Nplp1, and in order to avoid ectopic expression of GFP from the Nplp1 promoter, the DNA sequence of the Nplp1-CRM was placed in reverse orientation in front of the GFP reporter. (D) Staining of the col-Tv-CRM reporter construct in T1-T3 for β-gal and lbe(K)-GFP (NB5-6T) (false colored) shows that the col-Tv-CRM drives reporter expression broadly in the thoracic region but also specifically in the NB5-6T. (E-I) Staining for β-gal or GFP as a readout of enhancer activity and Eya as a location marker on whole VNCs shows overlap between the reporter expression and the Ap (both Ap cluster and dAp) cells. (J) Zoom in on the NB5-6T shows a robust overlap between the col-Tv-CRM and endogenous Col expression at stage 14. (K-N) Detailed analysis of the enhancer expression in the Ap cluster at stage AFT shows a precise overlap between the apS, eya, dimm and Nplp1 enhancer-reporter constructs and endogenous gene expression (compare to A).

CRISPR/Cas9 deletion of enhancers affects cell type specific expression of terminal selectors
Enhancer studies have revealed that some genes may be controlled by several enhancers with partially redundant function, such as 'shadow enhancers', which act to ensure high-fidelity in gene expression [25]. These shadow enhancers have been identified in a growing number of genes, in particular early developmental regulators [26,27]. We wanted to address the importance of the identified enhancers within the context of their normal genomic location. To this end, we used CRISPR/Cas9 technology, with two spaced gRNAs, to delete each of the identified enhancers in the FFL (Fig 2A; Materials and Methods; S2 Data) [28].
Focusing on col first, we analyzed the col-dAp-CRM (generating the col ΔdAp-CRM deletion mutant) and observed that deletion of this enhancer resulted in significant loss of Col, Eya and Nplp1 expression, but as anticipated only in dAp and not in Tv1 neurons (Figs 2D, 2I, 2J, S2C and S2F-S2H). In contrast, we found that deletion of the col-Tv-CRM (col ΔTv-CRM ) did not result in any effect upon Eya or Nplp1, in either dAp or Tv1 neurons (Fig 2B, 2C, 2I and 2J). We furthermore did not observe any effects on Col expression itself, either within the Ap cluster at AFT or globally (S2A, S2B, S2D, S2E and S2G Fig). As anticipated, we also did not observe any effects on Eya, Nplp1 or Col expression in dAp neurons (Figs 2B, 2C, 2I, 2J, S2E and S2H). Given the specificity of this element when placed in a promoter-lacZ transgenic construct (Fig 1D and 1J), we found this lack of effect surprising. This prompted us to analyze the NB5-6T neuroblast at St14, right after the onset of endogenous Col expression in this lineage. Measuring Col expression levels in the col ΔTv-CRM mutants we did indeed observe a minor but significant reduction in expression (Fig 2K-2M).
Next, we analyzed eya, ap, dimm and Nplp1 enhancer deletions (eya ΔCRM , ap ΔapS-CRM , dimm ΔCRM , Nplp1 ΔCRM ), and observed that all exhibited strong effects. Specifically, as anticipated, all enhancer deletions resulted in significant reduction or loss of expression of the targeted gene, in both Tv1 and dAp neurons (Figs 2E, 2H and S2I-S2X). Moreover, in line with the previous genetic analysis that identified an eya/ap->dimm->Nplp1 FFL, deletion of the eya, ap or dimm enhancers all significantly reduced Nplp1 expression (Figs 2E-2J, S2I-S2N and S2Q-S2U). Also in line with this FFL, deletion of ap or eya enhancers did not affect one another's expression (Figs 2F and S2J-S2P). Within the Ap cluster, while deletion of the eya enhancer affected Nplp1 expression in Tv1, we did observe Eya expression in two cells in the cluster (S2J Fig). However, analysis at AFT, using Col as a specific Tv1 marker at this stage, revealed that Eya expression was lost from the Tv1 neuron, hence explaining the strong effect on Nplp1 in eya ΔCRM mutants (S3A and S3B Fig). In line with the previous genetic analysis, dimm enhancer deletion did not affect either Eya or Ap expression (Figs 2H-2J, S2R and S2U). Finally, deletion of the Nplp1-CRM did not affect expression of Eya, Ap or Dimm (Figs 2H-2J, S2S and S2V-S2X).
We conclude that activation of col in dAp neurons strongly depends upon the col-dAp-CRM element, while in contrast, expression of col in Tv1 neurons may operate via several enhancers, some of which presumably must reside outside of the col-Tv-CRM. In contrast, for the postmitotically expressed terminal selectors ap, eya and dimm, as well as the Nplp1 neuropeptide gene, their expression in Ap neurons appears to be critically dependent upon one discrete enhancer element.

Molecular analysis of the FFL enhancers
Having identified necessary and sufficient enhancers for the genes in the FFL, we proceeded to address the putative molecular connections between the upstream spatio-temporal cues and col, as well as between the FFL genes. This was approached by testing the enhancer transgenes in the pertinent mutant backgrounds, as well as mutating relevant candidate binding sites within each enhancer.
Focusing on the col Tv enhancer first, we introduced the col-Tv-CRM-lacZ transgene into the Antp, cas, hth and lbe mutant backgrounds. This resulted in significant reduction of expression in all four cases, when compared to the enhancer transgene in a wild type background (control) stained on the same slide, with the strongest effect in cas mutants, which displayed a near-complete loss of expression (Fig 3A-3F). Next, we mutated conserved DNA-binding sequences for Antp, Cas, Hth and Exd within the col-Tv-CRM-lacZ, and integrated these into the same genomic location as the wild type transgenic construct (Fig 3M; Materials and Methods; S3 Data and S4 Data). We assayed β-gal expression in NB5-6T at St14, and found that all four mutated enhancer transgenes displayed reduced expression, when compared to the enhancer transgene in a wild type background (control) stained on the same slide (Fig 3G-3M).
Next we turned to the col-dAp-CRM-GFP enhancer and introduced it into the Kr, pdm and grn mutant backgrounds. We observed significant reduction in GFP expression in all three mutants, when compared to the enhancer transgene in a wild type background (control) stained on the same slide (Fig 4A-4E). The loss of GFP expression in the dAp cells was accompanied by the loss of Eya expression. Next, we mutated all possible binding sites, conserved and non-conserved, for Kr, Pdm (POU-HD) and Grn (GATA) (Fig 4L; Materials and Methods; S3 Data and S4 Data). We integrated these mutant transgenes into the same genomic location as the wild type transgenic construct and assayed GFP expression in dAp neurons at stage AFT. We found that the enhancers mutated for Kr or Pdm displayed a reduced number of dAp cells expressing GFP, when compared to the enhancer transgene in a wild type background (control) (Fig 4F-4H and 4J). The enhancer transgene mutated for Grn sites did not show a numerical loss of GFP expressing dAp cells, but did show a significantly reduced level of expression in these cells (Fig 4I and 4K).
For analyzing the ap enhancer, we focused on the smaller apS2J-CRM-lacZ transgene, and placed this in the mutant background for Antp, lbe and col. Focusing on the Tv1 neurons, we observed significant reduction in β-gal expression in all three mutants (Antp, lbe and col), when compared to the enhancer transgene in a wild type background (control) stained on the same slide (Fig 5A-5E). As anticipated from the selective role of Antp, acting in the thorax, and lbe, in NB5-6T, the dAp neurons were only reduced in the thorax in Antp, and unaffected in lbe (Fig 5A-5D and 5F). In contrast, col mutants affected β-gal expression in both Tv1 and dAp neurons (Fig 5D-5F). Next, we mutated all conserved binding sites for Q50-Homeodomain proteins (TAAT; affecting both Antp and Lbe), Col and Exd (Fig 5M; Materials and Methods; S3 Data and S4 Data). We integrated these mutant transgenes into the same genomic location as the wild type transgenic construct and assayed β-gal expression in Tv1 and dAp neurons at stage 16. We found that all three mutated enhancers displayed a reduced number of Tv1 and dAp cells expressing β-gal, when compared to the enhancer transgene in a wild type background (control) (Fig 5G-5L).
Fig 2 (legend, partial). Quantification of Eya and Nplp1 positive dAp cells in control and CRM mutants. col ΔdAp-CRM, eya ΔCRM, and ap ΔapS-CRM show significant decrease in Eya positive dAp cells, while col ΔTv-CRM, dimm ΔCRM and Nplp1 ΔCRM mutants show no effect. In contrast, all CRM mutants show significantly reduced numbers of Nplp1 positive dAp cells (*** p 0.0001, n = 10 embryos, Student's t-test, +/-SEM). (K-L) Staining for Col and Dpn in the NB5-6 at St14, in control and col ΔTv-CRM mutants. (M) Quantification of Col expression levels in the NB shows that Col levels are significantly reduced by 17% in col ΔTv-CRM mutants (* p = 0.013, n = 36 NBs, Student's t-test, +/-SEM). Genotypes: https://doi.org/10.1371/journal.pgen.1006729.g002
We did not analyze the involvement of hth on the ap-CRM (or eya-CRM) because previous studies revealed that hth mutants could be fully rescued by re-expression of col [22]. exd mutants must be analyzed both as maternal and zygotic mutants, and we did not attempt to introduce the ap- and eya-CRM transgenes into such backgrounds.
Similar to the ap enhancer analysis, we placed the eya-CRM-GFP enhancer in the mutant background for Antp, lbe and col. Focusing on the Tv1 neurons, we observed significant reduction in GFP expression in all three mutants, when compared to the enhancer transgene in a wild type background (control) stained on the same slide (S4A- . We integrated these mutant transgenes into the same genomic location as the wild type transgenic construct and assayed GFP expression in Tv1 and dAp neurons at stage AFT. We found that the enhancers mutated for Hox and Col sites displayed reduced expression in both Tv1 and dAp cells, when compared to the enhancer transgene in a wild type background (control) (S4G-S4I, S4K and S4L Fig). In contrast, the mutation of Exd sites had no effect in Tv1 neurons, and surprisingly showed up-regulation in dAp neurons (S4J-S4L Fig).
For analyzing the dimm enhancer, we placed the dimm-CRM-GFP transgene in the mutant backgrounds for Antp, ap, col and eya. We observed significant reduction of GFP expression in both Tv1 and dAp neurons in all four mutants, when compared to the enhancer transgene in a wild type background (control) stained on the same slide (S5A-S5G Fig). Next, we mutated all conserved and non-conserved binding sites for Q50-Homeodomain proteins (TAAT; affecting both Ap and Antp), Col and Exd (S5O Fig; Materials and Methods; S3 Data and S4 Data). We integrated these mutant transgenes into the same genomic location as the wild type transgenic construct and assayed GFP expression in Tv1 and dAp neurons at stage AFT. We found that enhancer mutants for Hox and Exd displayed reduced GFP expression in both Tv1 and dAp cells, when compared to the enhancer transgene in a wild type background (control) (S5H-S5N Fig). In contrast, the Col mutant enhancer showed slightly elevated expression in Tv1 neurons while expression was reduced in dAp neurons (S5J and S5L-S5N Fig).
For analyzing the Nplp1 enhancer, we placed the Nplp1-CRM-GFP transgene in the mutant backgrounds for col, ap, eya and dimm. We observed significant reduction of GFP expression in all four mutants, when compared to the enhancer transgene in a wild type background (control) stained on the same slide (Fig 6A-6G). Next, we mutated all possible conserved and non-conserved binding sites for Q50-Homeodomain proteins (TAAT; affecting Ap), Col and Dimm (E-boxes) (Fig 6P; Materials and Methods; S3 Data and S4 Data). We integrated these mutant transgenes into the same genomic location as the wild type transgenic construct and assayed GFP expression in Tv1 and dAp neurons at stage AFT. We observed that the enhancers mutated for Hox or Dimm sites displayed reduced GFP expression in both Tv1 and dAp cells, when compared to the enhancer transgene in a wild type background (control) (Fig 6H, 6I and 6K-6O). In contrast, the enhancer mutated for Col sites did not display any effect on GFP expression (Fig 6J, 6L and 6M).
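The binding-site mutagenesis described in this section relies on locating candidate sites within each enhancer. The following is a minimal sketch of such a scan, assuming simple core motifs: TAAT for Q50-homeodomain proteins as stated above, a GATA core for Grn, and the general E-box consensus CANNTG for bHLH factors such as Dimm. The function names, the example sequence and the substitution scheme (overwriting each site with a run of C) are hypothetical; the sites actually mutated are those listed in S3 Data and S4 Data.

```python
import re

# N matches any base; only the symbols needed for the motifs below are listed.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq, motif):
    """Sorted 0-based start positions of the motif on either strand."""
    seq = seq.upper()
    starts = set()
    for m in (motif, revcomp(motif)):
        # A lookahead lets overlapping occurrences be reported as well.
        pattern = "(?=" + "".join(IUPAC[base] for base in m) + ")"
        starts.update(match.start() for match in re.finditer(pattern, seq))
    return sorted(starts)


def mutate_sites(seq, motif, fill="C"):
    """Overwrite every motif occurrence with a run of `fill` (hypothetical scheme)."""
    bases = list(seq.upper())
    for start in find_sites(seq, motif):
        bases[start:start + len(motif)] = fill * len(motif)
    return "".join(bases)


if __name__ == "__main__":
    enhancer = "GGTAATCCGATAGGCAGCTGCCATTAGG"  # made-up sequence for illustration
    for name, motif in (("Q50-HD", "TAAT"), ("GATA", "GATA"), ("E-box", "CANNTG")):
        print(name, find_sites(enhancer, motif))
    print(mutate_sites(enhancer, "TAAT"))
```

A core-motif scan of this kind over-predicts sites, which is one reason conservation was also taken into account when selecting sites for mutation.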

Misexpression of terminal selectors can activate the enhancers
Previous studies have revealed that combinatorial misexpression of the transcription factors in the Tv1/dAp cascade is able to broadly activate the genes in the FFL [6,18,21,29,30]. To determine if such combinatorial ectopic effects could act upon the identified enhancers taken out of genomic context, we misexpressed various combinatorial codes of TFs and studied the effects on the pertinent transgenes. Focusing on the apS2-CRM-lacZ and eya-CRM-GFP, we found broad activation of both transgenes when lbe and col were co-misexpressed (Fig 7A-7D). Similarly, combinatorial misexpression of ap, eya and col could ectopically activate the dimm-CRM-GFP transgene (Fig 7E and 7F). Finally, the Nplp1-CRM-GFP transgene was broadly activated by combinatorial expression of ap, dimm, eya and col (Fig 7G and 7H). In all cases, as anticipated, we observed up-regulation of the endogenous Eya, Dimm and Nplp1 proteins (Fig 7A-7H). These results demonstrate that ectopic activation of the dAp/Tv1 transcriptional program can robustly act upon the identified enhancers even outside their normal genomic context.

Discussion
Combining the findings presented in this study, we have been able to molecularly decode the Tv1/dAp genetic FFL cascades [6,20,21,30], bolstering evidence for a complex molecular FFL, based upon sequential transcription factor binding to the downstream genes (    temporal cascade [9]. Early temporal factors Kr and Pdm integrate with Grn in NB4-3, while the late temporal factor Cas integrates with Antp, Lbe, Hth and Exd in NB5-6T, to create two distinct combinatorial spatio-temporal codes. These two codes converge on two different enhancers in the col gene, triggering Col expression, and hence the Nplp1 FFL. The FFL, in this case a so-called coherent FFL, where regulators act positively at one or several steps of a cascade, was first identified in E. coli and yeast regulatory networks [31], but has also been identified in C. elegans and Drosophila [6,32,33]. Coherent FFLs can act as regulatory timing devices, exemplified by the action of col in NB5-6T: the initial expression of col in Ap cluster cells triggers a generic Ap/Eya interneuron fate in all four cells, while its downregulation in Tv2-4 and maintenance in Tv1 helps propagate the FFL leading to Nplp1 expression [6,20,21,30].
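The timing property of a coherent FFL mentioned above can be illustrated with a toy discrete-time simulation. This is a generic sketch of an AND-gated coherent FFL, not a model fitted to the data in this study: X stands in for the initiator (here col), Y for an intermediate regulator and Z for the final target, and the function name, rate constants and threshold are arbitrary assumptions.

```python
def simulate_ffl(x_input, y_threshold=0.5, rate=0.2, decay=0.2):
    """Coherent FFL with AND logic: X -> Y, and Z is on only if X AND Y.

    x_input is a sequence of 0/1 values for X at each time step. Y rises
    while X is on and decays otherwise; Z is on when X is present and Y
    has accumulated to at least y_threshold.
    """
    y = 0.0
    z_trace = []
    for x in x_input:
        y += rate * x - decay * y
        z_trace.append(bool(x) and y >= y_threshold)
    return z_trace


if __name__ == "__main__":
    sustained = [1, 1, 1, 1, 1, 1]   # X maintained, as for col in Tv1
    transient = [1, 1, 0, 0, 0, 0]   # X downregulated early, as for col in Tv2-4
    print(simulate_ffl(sustained))   # Z switches on only after a delay
    print(simulate_ffl(transient))   # Z never switches on
```

Under these assumptions only a maintained input reaches the final target, which parallels the different outcomes of maintained versus downregulated col within the Ap cluster.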

Spatio-temporal convergence on collier, an initiator terminal selector gene
We find that the two different spatio-temporal programs converge on col, but on different enhancer elements. However, neither enhancer element gave complete null effects when deleted. Specifically, the 6.3 kb col-Tv-CRM shows robust reporter expression, overlaps with endogenous col expression, responds to the upstream mutants, and is affected by TFBS mutations. However, when deleted (generating the col ΔTv-CRM mutant), it had weak effects upon endogenous col expression in NB5-6T, and no effect upon Eya and Nplp1 expression. Deletion of the col-dAp-CRM (generating the col ΔdAp-CRM mutant) gave more robust effects, with reduction of Col, Eya and Nplp1 in dAp cells, although the expression was not lost completely.
Early developmental genes, which often are dynamically expressed, may be controlled by multiple enhancer modules, to thereby ensure robust onset of gene expression. This has been reported previously in studies of early mesodermal and neuro-ectodermal development, in which several genes, such as twist, sog and snail, are controlled by multiple distal enhancer fragments, so-called "shadow enhancers", in order to ensure reliable onset of gene expression [25]. The shadow enhancer principle is also supported by recent findings on the Kr gene [27]. Moreover, extensive CRM transgenic analysis, scoring thousands of fragments in transgenic flies, has also supported the shadow enhancer idea, revealing that a number of early regulators, several of which encode transcription factors, indeed have shadow enhancers [26]. The dichotomy between the col transgenic reporter results and the partial impact on col expression upon deletion of its Tv1 and dAp enhancers gives reason to speculate that col may be under control of additional enhancers, some of which may be referred to as shadow enhancers.
The results on the eya, ap, dimm and Nplp1 enhancer mutants stand in stark contrast to the col CRM findings. For these four genes, the enhancer deletion resulted in robust, near-null effects on expression. It is tempting to speculate that our findings, combined with previous studies, point to a different logic for early regulators, with highly dynamic patterns, requiring several functionally overlapping enhancers for fidelity, and late regulators and terminal differentiation genes, which may operate with one enhancer that is inactive until the pertinent combinatorial TF codes have been established.

Action of collier
Analysis of the ap and eya enhancers indicates that Col directly interacts with these enhancers. Both of these enhancer-reporter transgenes are affected in col mutants, and can be activated by ectopic col. Moreover, mutation of one Col binding site in the ap enhancer and two sites in the eya enhancer, was enough to dramatically reduce enhancer activity. Direct action of Col on ap and eya is furthermore supported by recent data on Col genome-wide binding, using ChIP, which demonstrated direct binding of Col to these regions of ap and eya in the embryo [34]. The regulation of ap is an excellent example of the complexity of gene regulation, and studies have identified additional enhancers controlling ap expression in the wing, muscle and brain [35][36][37][38].
In contrast to regulation of ap and eya, a direct action of Col on dimm and Nplp1 is less clear. Analysis of the dimm and Nplp1 enhancers did not reveal perfectly conserved Col binding sites. Mutation of multiple non-perfect Col binding sites in the dimm enhancer did not affect reporter expression in the Ap cluster, but did reduce levels in the dorsal Ap cells. Mutation of non-perfect Col binding sites in the Nplp1 enhancer had no impact on enhancer activity, in either Tv1 or dAp neurons. These findings support a model where Col is crucial for directly activating ap and eya, which in turn directly activate dimm and Nplp1, with some involvement of Col on dimm (Fig 8). However, support for a direct role for Col on Nplp1 comes from RNAi studies in larvae or adult flies, showing that knockdown of col resulted in loss of Nplp1, while Ap, Eya and Dimm expression was unaffected [6,39].
It is tempting to speculate that Col regulates Nplp1 not via direct interaction with its enhancer, but rather as a chromatin state modulator, keeping the chromatin around the Nplp1 locus in an accessible state, in order for Dimm, Ap and Eya to be able to access the Nplp1 gene. Support for this notion comes from studies on the mammalian Col orthologue EBF, which is connected to the chromatin remodeling complex SWI/SNF during EBF-mediated gene regulation in lymphocytes [40]. Moreover, the central SWI/SNF component Brahma was recently identified in a genetic screen for Ap cluster neurons, and found to affect FMRFa neuropeptide expression in Tv4 without affecting Eya expression, indicating a late role in Ap cluster differentiation [41]. Alternatively, Col may activate Nplp1 via unidentified, low affinity sites, similar to the mechanism by which Ubx regulates some of its embryonic target genes [42].

Downstream of collier
Col activates the ap and eya genes. ap encodes a LIM-HD protein, a family of transcription factors well known to control multiple aspects of terminal neuronal subtype fate, including neurotransmitter identity, axon pathfinding and ion channel expression [17,[43][44][45]. Our results indicate that Ap in turn acts upon dimm, and subsequently with Dimm on Nplp1. eya encodes an evolutionarily well-conserved phosphatase and does not bind DNA directly, instead acting as a transcriptional co-factor [46]. Eya (and its orthologues) have been found to interact with several transcription factors in different systems [46], but whether it forms complexes with Col and Ap is not known.
The final transcription factor in the FFL is Dimm, a bHLH protein. Dimm is selectively expressed by the majority of neuropeptide neurons in Drosophila, and is important for expression of many neuropeptides [6,19,29,47,48]. Intriguingly, Dimm is also both necessary and sufficient to establish the dense-core secretory machinery found in neuropeptide neurons [29,[48][49][50][51][52]. Based upon these findings Dimm has been viewed as a cell type selector gene [1] or a "scaling factor" [53], acting to up-regulate the secretory machinery. Here, we find evidence that Dimm acts directly on the Nplp1 enhancer, and this raises the possibility that Dimm is both a selector gene for the dense-core secretory machinery, and can act in some neuropeptide neurons to directly regulate specific neuropeptide gene expression.

Transgenic enhancer flies
In brief, all wild-type enhancers were either PCR amplified from OregonR DNA (Expand High Fidelity Plus PCR system; Roche Diagnostics, Indianapolis, IN, USA) or, in the case of the col-Tv-CRM enhancer and all mutant enhancer versions, synthesized de novo at GenScript Inc. (Piscataway, NJ, USA). PCR amplified DNA fragments were cloned into the pCR2.1-TOPO TA vector according to the manufacturer's protocol (Invitrogen Life Technologies, Carlsbad, CA, USA) for further cloning steps into the placZ.attB or pEGFP.attB landing site vectors [63] (provided by K. Basler and J. Bischof). Furthermore, TOPO clones containing the wild type enhancer sequences were sent to GATC Biotech AG (Cologne, Germany) for Sanger sequencing. All synthesized enhancer constructs were delivered in a pUC57 vector and subsequently cloned into either the placZ.attB or pEGFP.attB landing site vectors, and integrated into the fly genome via site-directed phiC31-mediated integration [64] at BestGene Inc (Chino Hills, CA, USA) or GenetiVision (Houston, TX, USA).

CRISPR/Cas9
The online tool (http://tools.flycrispr.molbio.wisc.edu/targetFinder/) was used to design two protospacers with zero predicted off-targets for each CRM, flanking the 5' and 3' regions of the identified enhancer constructs. Sequences for all protospacers can be found in the supplemental information (CRISPR). Primer design and vector assembly were done according to the protocol found at http://www.crisprflydesign.org/wp-content/uploads/2014/06/Cloningwith-pCFD4.pdf. PCR was performed using the Expand High Fidelity Plus PCR system (Roche Diagnostics, Indianapolis, IN, USA) according to the provided protocol with an annealing temperature of 61°C. In order to delete the CRMs identified in this study, the tandem gRNA vector (pCFD4-U6:1_U6:3) (Addgene # 49411; gift from Simon Bullock) was used to express two gRNAs simultaneously, which flank the 5' and 3' regions of the CRMs. The empty vector served as a template during PCR amplification to introduce the protospacers into the gRNA core sequence and U6-1 and U6-3 promoter regions. PCR products containing the protospacers were cloned into the tandem gRNA vector by ligation-independent cloning using Gibson Assembly according to the manufacturer's protocol (New England Biolabs Inc., Ipswich, MA, USA). All gRNA vector constructs were Sanger sequenced using the M13 forward and M13 reverse primers to confirm the correct insert (GATC Biotech AG, Cologne, Germany). Stable transgenic gRNA flies were generated at BestGene, and tandem gRNA constructs containing attB sites were integrated via phiC31-mediated integration on the second or third chromosome, at cytolocations 28E7 and 68A4. Fly stocks mutant for CRMs were created by crossing males of the transgenic tandem gRNA flies to virgins of vas-Cas9 (BL#51323). Stable stocks mutant for CRMs were tested by PCR using primers flanking the deleted region. PCR fragments spanning the deleted region were sequenced to confirm deletion (Supplemental Information CRISPR).
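The PCR test for CRM deletions rests on simple arithmetic: primers flanking the targeted region give a shorter product when the sequence between the two Cas9 cut sites has been removed. The sketch below shows that calculation. The function name and all coordinates are hypothetical, and it assumes a clean rejoining of the two cut ends without additional insertions or deletions, which sequencing of the PCR product would have to confirm.

```python
def expected_amplicons(fwd_start, rev_end, cut_5prime, cut_3prime):
    """Return (wild_type_bp, deleted_bp, mutant_bp) for a two-cut deletion.

    All arguments are 0-based coordinates on the same reference strand.
    The amplicon spans [fwd_start, rev_end) and the deleted interval is
    [cut_5prime, cut_3prime).
    """
    if not fwd_start <= cut_5prime < cut_3prime <= rev_end:
        raise ValueError("cut sites must lie, in order, between the primers")
    wild_type = rev_end - fwd_start
    deleted = cut_3prime - cut_5prime
    return wild_type, deleted, wild_type - deleted


if __name__ == "__main__":
    # Hypothetical example: primers 3,000 bp apart, cut sites 2,000 bp apart.
    wt, deleted, mutant = expected_amplicons(100, 3100, 600, 2600)
    print(f"wild type: {wt} bp, deleted: {deleted} bp, mutant: {mutant} bp")
```

For large enhancers the wild-type product may be too long to amplify efficiently, in which case the presence of the short product is the informative result.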

Confocal imaging and data acquisition
Zeiss LSM 700 confocal microscopes were used for fluorescent images; confocal stacks were merged using LSM software or Adobe Photoshop. Statistical calculations were performed in GraphPad Prism software (v4.03). Cell counts and reporter (GFP or β-gal) measurements were done with ImageJ FIJI, and the numbers were transferred to GraphPad Prism. To assess statistical significance, Student's t-test was used or, in the case of invariant cell numbers, contingency tables together with the Chi-square test. Images and graphs were compiled in Adobe Illustrator.
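For readers who want to follow the quantification, the statistics named above can be sketched with the Python standard library. This is not the GraphPad Prism procedure used in the study: the function names are hypothetical, the t-test shown is the pooled-variance two-sample form, the chi-square test is for a 2x2 table without continuity correction, and the p-value of the t-test is left out because it requires the t distribution, which the standard library does not provide.

```python
import math
from statistics import mean, stdev


def sem(values):
    """Standard error of the mean."""
    return stdev(values) / math.sqrt(len(values))


def percent_reduction(control, mutant):
    """Percent decrease of the mutant mean relative to the control mean."""
    return 100.0 * (1.0 - mean(mutant) / mean(control))


def student_t(a, b):
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1.0 / na + 1.0 / nb))
    return t, na + nb - 2


def chi_square_2x2(table):
    """Chi-square statistic and p-value (1 degree of freedom) for a 2x2 table.

    All row and column totals must be non-zero.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    control = [100.0, 96.0, 104.0, 99.0]   # made-up intensity values
    mutant = [84.0, 80.0, 86.0, 82.0]
    print(percent_reduction(control, mutant), sem(control), student_t(control, mutant))
    print(chi_square_2x2([[10, 0], [0, 10]]))
```

The example values are invented for illustration and do not correspond to any measurement reported here.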