Unconventional function of an Achaete-Scute homolog as a terminal selector of nociceptive neuron identity

Proneural genes are among the most early-acting genes in nervous system development, instructing blast cells to commit to a neuronal fate. Drosophila Atonal and Achaete-Scute complex (AS-C) genes, as well as their vertebrate orthologs, are basic helix-loop-helix (bHLH) transcription factors with such proneural activity. We show here that a C. elegans AS-C homolog, hlh-4, functions in a fundamentally different manner. In the embryonic, larval, and adult nervous systems, hlh-4 is expressed exclusively in a single nociceptive neuron class, ADL, and its expression in ADL is maintained via transcriptional autoregulation throughout the life of the animal. However, in hlh-4 null mutants, the ADL neuron is generated and still appears neuronal in overall morphology and expression of panneuronal and pansensory features. Rather than acting as a proneural gene, we find that hlh-4 is required for the ADL neuron to function properly, to adopt its correct morphology, to express its unusually large repertoire of olfactory receptor–encoding genes, and to express other known features of terminal ADL identity, including neurotransmitter phenotype, neuropeptides, ion channels, and electrical synapse proteins. hlh-4 is sufficient to induce ADL identity features upon ectopic expression in other neuron types. The expression of ADL terminal identity features is directly controlled by HLH-4 via a phylogenetically conserved E-box motif, which, through bioinformatic analysis, we find to constitute a predictive feature of ADL-expressed terminal identity markers. The lineage that produces the ADL neuron was previously shown to require the conventional, transient proneural activity of another AS-C homolog, hlh-14, demonstrating sequential activities of distinct AS-C-type bHLH genes in neuronal specification. 
Taken together, we have defined here an unconventional function of an AS-C-type bHLH gene as a terminal selector of neuronal identity and we speculate that such function could be reflective of an ancestral function of an “ur-” bHLH gene.


Introduction
Nervous system development proceeds through sequential steps, starting with the early commitment to a neuronal fate, followed by the progressive restriction of fates, and finally reaching a terminal, differentiated end state. Proneural genes of the basic helix-loop-helix (bHLH) family play a key role in the initial stages of this process [1]. Mutant analysis in Drosophila revealed that loss of members of the Achaete-Scute complex (AS-C), as well as the related Atonal gene, resulted in the loss of the ability to generate neuroblasts in the peripheral nervous system [2][3][4][5]. Vertebrate orthologs of proneural AS-C and Atonal genes (the Mash and Math genes) also provide critical proneural function in vertebrate nervous system development [1,6,7,8]. Thus, the proneural function of AS-C-type and Atonal bHLH genes is broadly conserved throughout evolution.
The C. elegans genome encodes a canonical complement of homologs of proneural bHLH genes, including seven AS-C-like genes (hlh-4, hlh-3, hlh-14, hlh-19/hnd-1, hlh-12, hlh-6, hlh-16) and one Atonal ortholog (lin-32) [9]. The function of many of these C. elegans bHLH genes in the nervous system has not been as extensively studied as that of their fly and vertebrate orthologs, but it is nevertheless clear that a number of these bHLH genes also provide proneural activities [10][11][12]. As in flies and vertebrates, C. elegans proneural bHLH genes operate in a lineage-specific manner. For example, the C. elegans AS-C ortholog hlh-14 and the C. elegans Atonal ortholog, lin-32, provide proneural activity in several distinct sensory neuron lineages of the peripheral and central nervous system (CNS) of the worm [10][11][12]. In both cases, the proneural activity of hlh-14 and lin-32 is exemplified by a transformation of neuroblasts into cells with a hypodermal identity in the respective mutant backgrounds.
One question that has been studied extensively over the years is whether AS-C/Atonal-type bHLH genes have functions in the nervous system that go beyond their proneural activity. In both vertebrates and flies, nonproneural functions of AS-C and Atonal-like genes have indeed been described in the context of later neuronal differentiation events (reviewed in [1,6,13]). Similarly, C. elegans lin-32/Ato has functions beyond its proneural activity in male ray lineages in which lin-32 also allocates fates in subsequently developing ray sublineages [14]. However, in all these cases, the respective bHLH gene is either transiently expressed; acts through downstream, intermediary regulatory factors; or only affects selected aspects of the differentiated state of the respective neuron.
In this study, we describe a novel, nonproneural, and noncanonical function of an AS-C-type bHLH gene. We find that the AS-C homolog hlh-4 displays a spatial and temporal specificity of expression that is unprecedented for any bHLH gene. hlh-4 is exclusively and continuously expressed in a single postmitotic nociceptive sensory neuron class in which it initiates and maintains the terminal identity of this neuron via direct binding to scores of terminal effector genes that are expressed in a neuron class-specific manner and that define the differentiated state of this neuron. Among its many functions in ADL, hlh-4 directly regulates the expression of the unusually large repertoire of olfactory receptor proteins in ADL. We hypothesize that the direct control of "neuron function genes" may have been an ancestral function of bHLH genes.

Strains
Strains were maintained by standard methods [15]. All strains used are listed in S3 Table.

Expression constructs and transgenic strain generation
Green fluorescent protein (GFP) reporters for rescue and ectopic expression were generated using RF-cloning [16]. For making G-protein coupled receptor (GPCR) transgenic reporters (listed in S3 Table), a PCR fusion approach was used [17]. Genomic fragments were fused to the GFP coding sequence, which was followed by the unc-54 3′ untranslated region. All transgenic lines created in this study were generated by injection at 50 ng/μL with unc-122::gfp into wild-type animals or with the pha-1 rescuing plasmid (pBX) as a coinjection marker (50 ng/μL) into pha-1 mutant animals. For each construct, two independent lines were scored.
The eat-4 reporter constructs were generated by PCR and subcloning into the pPD95.75 vector. eat-4prom6-1 contains 4,450 bp of the region upstream of the ATG and eat-4prom2 contains 1,150 bp of the genomic region just upstream of the ATG. The E-box and homeodomain motif are found at positions -693 and -726 relative to the ATG start codon, respectively. The specific sequences deleted are, for the E-box, AACAGGTGTT, and for the homeodomain site, ATTAGATAAT. The deletions were generated by mutagenesis with the QuikChange Site-Directed Mutagenesis kit (Stratagene). The plasmids were injected into OH13645 [otIs518; him-5(e1490)] at 50 ng/μL, using unc-122::gfp (50 ng/μL) as a coinjection marker.

Microscopy
Worms were anesthetized using 50 mM sodium azide (NaN3) and mounted on 5% agarose on glass slides. Images were acquired using an automated fluorescence microscope (Zeiss, AXIO Imager Z.2) or LCS-8 laser point scanning confocal. Representative images are shown following maximum intensity projection of Z-stacks. Image reconstruction was performed using Fiji software [21].

Neuron identification
ADL neurons were identified by labeling subsets of sensory neurons with DiD or DiO (Thermo Fisher Scientific). For dye filling, worms were washed with M9 and incubated at room temperature with DiD in M9, either at 1:500 for 1 hour (adults) or at 1:250 for 2 hours (L1 stage animals). After incubation, worms were washed three times with M9 and plated on agar plates coated with food (OP50 bacteria) for 1-3 hours before imaging.

Embryonic expression pattern analysis
The expression of bHLH fosmid reporters was manually lineaged using the SIMI BioCell program, as previously described [22]. Briefly, gravid adults of hlh-4fosmid::yfp (otIs683) and hlh-3fosmid::gfp (otIs648) were dissected and single two-cell embryos were mounted and visualized on a Zeiss Imager Z1 compound microscope using the 4D microscopy software Steuerprg (Caenotec). Nomarski stacks were taken every 30 seconds and embryos were illuminated with LED fluorescence light (470 nm) at predetermined time points during development.

Avoidance assay
The avoidance assay was performed as previously described [23,24]. L4 stage animals were picked onto OP50-seeded plates one day before the assay. We used 100 nM or 500 nM ascr#3 or 1 M glycerol diluted in M13 buffer. In the assay, a drop of M13 buffer was first placed in front of the animal's head. If the animal did not respond to M13 buffer, we then applied a drop of ascr#3 or glycerol and scored avoidance of the stimulus. Long reversals were counted as avoidance [25]. The tests were done at least 5 times with 10 animals each.

DNA motif discovery
Motif discovery was carried out using information-theoretic analysis as implemented in the Finding Informative Regulatory Elements (FIRE) algorithm [26]. De novo motifs were discovered by running FIRE in discrete mode, with all the genes in the C. elegans genome labeled as either belonging to class 1: the neuron-specific expression class (e.g., 117 ADL-expressed genes) or class 2: the complementary set of all other remaining genes. The starting k-mer seed length was set to k = 6 and the sequence search space was confined to 2-kb upstream regions. The discovered CACCTG motif had a robustness score of 10/10 with a significance z-score of 18.3.
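The information-theoretic idea behind this kind of analysis can be sketched in a few lines. The following is not the FIRE implementation; it is a minimal illustration, under the assumption that a candidate k-mer is scored by the mutual information between its presence in an upstream region (on either strand) and a binary class label. The toy sequences, the gene names, and the helper names `has_motif`, `mutual_information`, and `score_kmer` are hypothetical.

```python
import math
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def has_motif(seq, kmer):
    """True if the k-mer occurs on either strand of seq."""
    return kmer in seq or revcomp(kmer) in seq

def mutual_information(presence, labels):
    """Mutual information (in bits) between two equal-length discrete vectors."""
    n = len(labels)
    joint = Counter(zip(presence, labels))
    count_x = Counter(presence)
    count_y = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((count_x[x] / n) * (count_y[y] / n)))
    return mi

# Hypothetical toy data: (upstream sequence, class label).
# Label 1 stands for class 1 (neuron-expressed), 0 for class 2 (all other genes).
upstream = {
    "geneA": ("TTGACACCTGAAT", 1),
    "geneB": ("GGCAGGTGTTTAC", 1),  # CAGGTG is the reverse complement of CACCTG
    "geneC": ("TTTTAAAACCCGG", 0),
    "geneD": ("ACGTACGTTTGAC", 0),
}

def score_kmer(kmer, data):
    """Score a k-mer by how informative its presence is about the class label."""
    presence = [has_motif(seq, kmer) for seq, _ in data.values()]
    labels = [label for _, label in data.values()]
    return mutual_information(presence, labels)

print(score_kmer("CACCTG", upstream))
```

In this toy set, a k-mer that is present in every class 1 sequence and absent from every class 2 sequence receives the maximal score for a balanced binary label, whereas a k-mer that is absent everywhere scores zero. A real analysis additionally needs seed enumeration, motif optimization, and significance testing, none of which is attempted here.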

Phylogenetic footprinting
We used TargetOrtho [27] to find whole-genome CACCTG motif matches in five nematode genomes, searching 2 kb upstream of each gene plus introns. ADL-expressed genes and all C. elegans genes, excluding noncoding RNAs, were compared using the Wilcoxon rank-sum test to assess alignment-independent species conservation scores, motif match position relative to the start codon, and motif match frequency per gene. Only genes with at least one CACCTG match were analyzed.
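As an illustration of the statistical comparison described here, the sketch below computes a Wilcoxon rank-sum statistic using the normal approximation. It is not the TargetOrtho code; the function name `rank_sum_test` and the per-gene motif counts are hypothetical, and the sketch assigns average ranks to ties without applying a tie correction to the variance.

```python
import math
from statistics import NormalDist

def rank_sum_test(sample_a, sample_b):
    """Wilcoxon rank-sum z statistic and two-sided p-value (normal approximation).

    Tied values receive the average of the ranks they span; no tie
    correction is applied to the variance.
    """
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at index i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        members_of_a = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_a += mid_rank * members_of_a
        i = j
    expected = n_a * (n_a + n_b + 1) / 2
    sd = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (rank_sum_a - expected) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical per-gene motif counts, for illustration only.
adl_counts = [4, 3, 5, 3, 4]
genome_counts = [2, 3, 1, 2, 3, 2]
z, p = rank_sum_test(adl_counts, genome_counts)
print(f"z = {z:.2f}, p = {p:.3g}")
```

A positive z indicates that the first sample tends to have larger values than the second. The same function can be applied to conservation scores or to motif positions relative to the start codon.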

hlh-4 is exclusively expressed in nociceptive ADL neurons
As a first step toward a systematic analysis of the neurogenic function of C. elegans bHLH genes, we undertook a nervous system-wide expression pattern analysis of all C. elegans AS-C-like genes. Using fosmid-based reporter transgenes, we found that many bHLH genes are expressed during embryonic development within and outside neuronal lineages, but we noticed that one AS-C-like bHLH gene, hlh-4, displays an unusual expression pattern, both in terms of spatial and temporal specificity (Fig 1). hlh-4 expression is not observed in any blast cells during embryonic or postembryonic development but rather is first observed in two pairs of postmitotic cells in the precomma stage embryo, shortly after their birth (Fig 1A). One pair is the ADL neurons and the other pair is the sisters of ADL, which die shortly after their birth by programmed cell death [28]. Expression of hlh-4 in ADL is observed for the remainder of embryogenesis, continues during larval and adult stages, and is never observed in any other cell throughout the entire organism (Fig 1A). The fosmid on which the yfp reporter construct is based is able to fully rescue the hlh-4 mutant phenotype that we describe below (rescue data are shown in Table 1). The ADL-specific fosmid-based reporter expression pattern is recapitulated by a 700-bp 5′ promoter fusion reporter (Fig 1C).
The only other bHLH reporter expressed in postmitotic neurons throughout embryonic, larval, and adult stages is the Daughterless homolog hlh-2/Da [29], a binding partner of many C. elegans AS-C-related bHLH genes [30]. Expression of HLH-2/DA protein in a specific subset of postmitotic neurons, including the nociceptive neurons ADL and ASH, has been previously reported using anti-HLH-2 antibody staining [29], but it was not reported whether expression persisted into later larval and/or adult stage. Using a fosmid-based reporter of hlh-2/Da expression, we found that ADL and ASH expression of hlh-2/Da, as well as expression in a few other head and tail neurons, is maintained throughout all larval stages into adulthood (Fig 1B). We conclude that hlh-4/AS-C and its heterodimerization partner hlh-2/Da are continuously coexpressed specifically in the nociceptive ADL neuron class.

Continuous hlh-4 and hlh-2 expression is ensured by transcriptional autoregulation
One well-documented mechanism by which transcription factors ensure their continuous expression throughout the life of a neuron is through transcriptional autoregulation (e.g., [36][37][38][39]). To assess whether continuous expression of hlh-4 throughout the life of the ADL neuron is also ensured by autoregulation, we used a 5′ promoter fusion of the hlh-4 locus, which recapitulated the continuous expression of hlh-4 in ADL (Fig 1C). We crossed this reporter into an hlh-4 mutant allele, tm604, a putative null allele generated by the C. elegans knockout consortium in Tokyo [40] in which the bHLH domain is largely deleted (Fig 1A). We found that hlh-4 reporter expression in the ADL neuron pair is initiated normally in hlh-4 mutant embryos, but expression fails to be maintained beyond the first larval stage (Fig 1C). As yet unknown factors may initiate hlh-4 expression in the embryo and, after its initiation, hlh-4 takes over to regulate its own expression.
We furthermore tested whether continuous expression of hlh-2/Da in ADL requires hlh-4 activity. Crossing the hlh-2 fosmid reporter into the hlh-4 mutant background, we indeed found this to be the case (Fig 1B). We conclude that the continuous expression of both hlh-4 and its putative cofactor hlh-2/Da is based on transcriptional autoregulation.

hlh-4 does not act as a proneural gene
In most if not all organisms examined, AS-C genes have proneural function, characterized by a loss of neuroblast identity in the absence of the AS-C gene and ensuing conversion into an ectodermal identity [1,3,6,13]. Previous work has demonstrated that in the lineage that produces ADL, as well as other sensory neurons, the transiently and early-expressed AS-C gene hlh-14 acts as a proneural gene, such that loss of hlh-14 results in a neuroblast to hypodermal fate conversion [12]. In striking contrast, we find that the later-expressed hlh-4 gene does not act as a proneural gene. Specifically, in hlh-4 null mutants, the ADL neuron pair is still generated and differentiates as a neuron, as assessed by (a) intact expression of a panneuronal reporter, rab-3, (b) intact filling of the ADL neuron with the dye DiI (which is taken up by the dendritic endings of several sensory neurons, including ADL [41]), and (c) presence and intact speckled appearance of the ADL neuronal nucleus by Nomarski optics (Fig 2A). Corroborating this notion, we find that the two genes that are expressed by all ciliated sensory neurons, osm-6 and ift-20 [42,43], are still normally expressed in the ADL neurons of hlh-4 mutants (Fig 2B). Even though we could not confirm the previously reported expression of hlh-3 in ADL (Fig 1A), we nevertheless generated hlh-3; hlh-4 double null mutants and found that in these animals the ADL neurons are also still generated normally, as assessed by intact DiI filling and characteristic neuronal nuclear speckles (Fig 2A).
The expression of the hlh-4 promoter fusion in hlh-4 mutants until the first larval stage permitted us to visualize the anatomy of the ADL neurons in the absence of hlh-4 gene function. While the cell body of ADL is normally positioned, we find that ADL axons and dendrites display severe morphological defects (Fig 2C). The sensory dendrites of ADL are often detached from the nose. Even when attached, the cilia of ADL often do not display their characteristic bifurcated ciliated endings. The axons of ADL, which in wild-type animals display a highly stereotyped extension and branching pattern, show pathfinding and branching defects (Fig 2C).

hlh-4 affects expression of the unusually large repertoire of olfactory receptors in ADL
To examine whether and to what extent hlh-4 is required to specify ADL neuron identity, we examined the differentiation program of the ADL neurons in detail. The ADL nociceptive neuron pair coexpresses an unusually large number of olfactory-type GPCRs [44][45][46]. Reporter genes generated for about one fifth of the approximately 1,300 GPCR-encoding genes reveal the expression of more than 60 GPCR genes from diverse families in ADL [46]. Extrapolating to the complete set of GPCRs encoded in the C. elegans genome, about 300 GPCR-encoding genes may be expressed in ADL. We asked whether hlh-4 is required for the expression of 12 GPCR-encoding genes. We chose these genes to cover the diverse set of GPCR gene families expressed in ADL (sra, sre, sri, srz, srh, srxa, and srx families). We found that expression of all 12 tested GPCR reporters is abrogated in hlh-4 mutants (Fig 3A). While all defects were routinely scored at the adult stage, we note that these defects are already apparent at the first larval stage. Consistent with the absence of expression of the hlh-4 paralog hlh-3 in postmitotic ADL neurons, we find that hlh-3 does not affect srh-127 expression in ADL.

Fig 1. [...] showing the specific cells from ADL ancestors that express hlh-2, hlh-3, hlh-4, and hlh-14 during embryogenesis. Representative images of hlh-3 and hlh-4 gene expression at embryonic stages next to their exact time point during embryonic development (left). The hlh-4 fosmid reporter is first detected in ADLs and their sister cells as soon as they are born (left panel). Roughly 30 minutes after they are born, the sisters of ADLs die by apoptosis (asterisks indicate ADL sister cells that are destined to die by apoptosis) and hlh-4 expression becomes restricted to ADLs only (right panel). The yellow dashed line marks gut autofluorescence. Expression of hlh-2 in the dying ADL sister could not be examined. hlh-14 expression is shown for comparison and was reported in [12]. (B) Schematic of fosmid reporter for hlh-2. HLH-2 is continuously expressed in very few neurons throughout adulthood, among them ADL, and this continuous ADL expression depends on hlh-4. Previous work had examined expression of hlh-2 in L1 stage animals only [29]. (C) Continuous hlh-4 expression in ADL is assured via autoregulation. In hlh-4 mutants, hlh-4 expression, as assessed with an hlh-4 promoter fusion, initiates normally at the embryonic stage; however, expression fails to be maintained past the L1 stage. bHLH, basic helix-loop-helix; L1, first larval stage.
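The extrapolation to about 300 ADL-expressed GPCR genes quoted above can be reproduced by simple proportional scaling. The calculation below uses only the approximate figures given in the text (about 1,300 GPCR-encoding genes, about one fifth of them tested, about 60 expressed in ADL) and rests on the assumption, made here for illustration, that the tested reporters are a representative sample of all GPCR-encoding genes.

```latex
\[
N_{\text{tested}} \approx \tfrac{1}{5} \times 1300 = 260,
\qquad
N_{\text{ADL}} \approx \frac{60}{260} \times 1300 = 300 .
\]
```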
To test whether hlh-4 does not only affect expression of chemoreceptor proteins but also affects the chemorepulsive function mediated by the ADL neurons, we considered its chemorepulsive function toward a specific nematode pheromone, the ascaroside ascr#3 (asc-ΔC9, C9) [24]. While wild-type hermaphrodites are repelled by ascr#3, this repulsion is significantly reduced in hlh-4 hermaphrodites (Fig 3B). This is not a reflection of an overall failure to engage in a nociceptive response because another chemorepulsive behavior, mediated by the ASH neurons (glycerol avoidance) [47], is not affected in hlh-4 mutants (Fig 3B).

hlh-4 specifies the neuron type-specific molecular signature of ADL
We tested whether hlh-4 function is restricted to controlling olfactory receptor expression and function in the ADL neurons or whether other identity features of ADL are disrupted as well. A TRP channel protein encoded by the osm-9 gene, expressed in a restricted set of sensory neurons, is required in ADL to signal the response to distinct chemorepulsive sensory inputs [24,48,49]. We find that osm-9 expression is selectively lost in the ADL neurons of hlh-4 mutant animals (Fig 4).
Going beyond signal perception and transmission, we asked whether ADL requires hlh-4 to communicate with its synaptically connected neurons [50]. Based on the expression of the vesicular glutamate transporter eat-4/VGLUT, the key defining feature of all glutamatergic neurons, ADL neurons have previously been inferred to be glutamatergic [51]. We find that the glutamatergic identity of ADL, as assessed by eat-4 fosmid reporter gene expression, is defective in hlh-4 mutant animals (Fig 4). Apart from using glutamate as a likely fast neurotransmitter, the expression patterns of various neuropeptide-encoding genes indicate that ADL also utilizes distinct peptides for neurotransmission [52,53]. We find that four neuropeptide genes previously known to be expressed in ADL, as well as in other neurons (FMRFamides flp-4 and flp-21 and neuropeptides nlp-7 and nlp-10) [52,53], specifically fail to be expressed in the ADL neurons of hlh-4 mutants, while expression in other neurons is unaffected (Fig 4).
Apart from peptidergic and chemical synaptic transmission, electrical synaptic transmission is likely also affected in hlh-4 mutants. ADL forms electrical synapses with a select number of neighboring neurons [50]. Electrical synapses are formed by transmembrane innexin proteins [54], and 3 of the 24 C. elegans innexin genes, unc-7, inx-18, and che-7, are expressed in ADL, as well as a specific set of other neuron types [55]. The expression of all three innexin genes is lost specifically in the ADL neurons of hlh-4 mutants (Fig 4). Transmembrane ion channel expression is also affected in hlh-4 mutants. Na+/Ca2+-K+ exchangers are important regulators of intracellular calcium homeostasis in the nervous system, and members of this family show remarkably specific gene expression profiles in the C. elegans nervous system [56]. Two Na+/Ca2+-K+ exchangers, ncx-6 and ncx-7, are each exclusively expressed in the ADL neurons of wild-type animals [56]. The expression of both genes in ADL is abrogated in hlh-4 mutants (Fig 4).

Fig 2. [...] 3; hlh-4 double null mutants. (B) The pansensory identity of ADL neurons in hlh-4 mutants is intact, shown here using ciliated neuron marker genes (osm-6 and ift-20). Numerical values that underlie the graph are also shown in S1 Data. (C) hlh-4 mutants display ADL neuron morphology defects. ADL is visualized with an hlh-4prom reporter whose expression is still visible at the first larval stage (as shown) of hlh-4 mutants. Defects include (1) detachment of dendrites (labeled in green) from the nose (surprisingly, even the detached ADL neurons are still able to dye fill, as inferred by the completely unaffected dye filling of hlh-4 mutants [panel A]), (2) cilia defect (no branching or extra branching of normally bifurcated ciliated ending), and (3) [...]
To examine whether these defects are a consequence of the failure of solely maintaining the differentiated state versus failure of initiation of the differentiated state, we examined the expression of several ADL markers right after hlh-4 mutant embryos had hatched. Testing four specific markers (srh-127, sre-43, srt-47, and ncx-6), we found that expression is already affected at this early stage of development.
In conclusion, we find that several distinct identity features that define functional features of the ADL neuron are coregulated by the same transcription factor. The affected identity features share the common theme of providing the ADL with a unique molecular signature and identity. In contrast, hlh-4 does not affect generic neuronal features (i.e., pansensory or panneuronal features).

hlh-4 is sufficient to induce ADL features in other neuron classes
hlh-4 is not only required for the expression of ADL identity genes, but ectopic expression of hlh-4 is also sufficient to induce ADL identity features. We drew this conclusion by driving expression of hlh-4 in many other ciliated sensory neurons, using the arrd-4 promoter [57] (S1 Fig). The arrd-4prom::hlh-4 construct is not only able to rescue the loss of srh-127::gfp expression in ADL in hlh-4 mutants (Table 1), but these transgenic animals display ectopic expression of the normally ADL-expressed srh-127::gfp reporter in many ciliated sensory neurons (Fig 5A). Similarly, the TRP channel osm-9, the neuropeptide-encoding flp-4 gene, and the vesicular glutamate transporter eat-4 are also ectopically expressed in other sensory neurons in these transgenic animals (Fig 5A).
To further probe the ability of hlh-4 to induce ADL identity features in other neurons, we misexpressed hlh-4 under control of a promoter fragment from the unc-3 locus, which is expressed in ventral cord motor neurons and a small set of head neurons (S1B Fig). Transgenic animals expressing a unc-3prom::hlh-4 construct show ectopic expression of the ADL marker srh-127::gfp in head neurons but not in ventral cord motor neurons (Fig 5B). The apparent cellular context dependency of hlh-4 function mimics the context dependence of other master regulators of cellular identity, such as Eyeless/Pax6 [58].

cis-Regulatory regions of ADL-expressed genes are enriched for a specific E-box motif
Because gene expression is usually examined in C. elegans via reporter gene constructs, a large library of reporter transgenes that monitors the expression of thousands of genes has been amassed by the C. elegans community over the past few decades. In many cases, expression patterns of these reporter transgenes have been defined on a single neuron level. Almost 200 reporter transgenes have been found to be expressed in the ADL neurons (www.wormbase.org, S2 Table). The genes tested above for their dependence on hlh-4 belong to this dataset. We took a subset of these genes (117) and asked whether 5′ upstream regulatory regions of genes whose expression is monitored by these reporter transgenes are enriched for the presence of a specific sequence motif using the FIRE motif analysis platform [26] (see Materials and methods). We restricted the search space to the first 2 kb upstream of these genes. As a control, we also considered several other neuron classes that WormBase associated with a large number of reporter genes (AIY, ASE, ALM, HSN, ASI, ASK, ASH, PHA; www.wormbase.org) and interrogated the upstream regulatory control regions of those genes. In the ADL dataset, we indeed identified a motif found in 75% of the ADL-expressed reporter genes (Table 2, S1 Table; S2 Table). The motif, shown in Fig 6A, has a completely invariant 6-nucleotide core, CACCTG, and no striking sequence features outside this core. There is no orientation preference for this motif on the plus versus minus strand. This motif is not enriched in the control datasets (AIY, ASE, ALM, HSN, ASI, ASK, ASH, or PHA expressed reporter genes). The CACCTG motif matches experimentally determined bHLH binding sites (CANNTG) [59] and specifically matches the in vitro binding site of the C. elegans HLH-4/HLH-2 heterodimer, CA(G/C)CTG [30]. Probabilistic segmentation analysis of upstream regulatory sequences of ADL neuron-expressed GPCR genes had previously also identified a similar CA(G/C)CTG motif [45].

Fig 4. hlh-4 is required for the acquisition of terminal ADL identity. Indicated gfp reporters were crossed into hlh-4(tm604) mutants and expression quantified. The function/identity of the marker genes is indicated above each panel. In all panels in which a gfp reporter is used, DiD staining (red) is used to label the amphid neurons, including ADL. Numerical values that underlie the graph are shown in S1 Data. https://doi.org/10.1371/journal.pbio.2004979.g004
All the 23 terminal effector genes that we described above as depending on hlh-4 in their expression in ADL (Fig 3; Fig 4) contain at least one copy of this motif within 2 kb upstream of the 5′ start of the gene (Table 2, S1 Table). The one hlh-4-dependent GPCR reporter (srh-79) that does not contain a perfect match to the E-box motif contains a 1-nucleotide-mismatched copy of the motif (CACGTG versus CACCTG).
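The motif relationships described in this section (the generic E-box CANNTG, the HLH-4/HLH-2 site CA(G/C)CTG, the CACCTG core, and the single-mismatch variant CACGTG) can be made concrete with a short sketch. This is an illustration only, not the pipeline used in the study. The names `GENERIC_EBOX`, `HETERODIMER_SITE`, `find_sites`, and `hamming` are hypothetical, and the position convention (distance in bp from the 5′ end of the match to the 3′ end of the supplied upstream sequence) is an assumption of this sketch.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Regular-expression forms of the consensus sites quoted in the text.
GENERIC_EBOX = "CA[ACGT]{2}TG"   # CANNTG
HETERODIMER_SITE = "CA[GC]CTG"   # CA(G/C)CTG

def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def find_sites(upstream, motif="CACCTG"):
    """Return (distance, strand) for each exact motif match on either strand.

    `upstream` is assumed to end immediately before the start codon, so
    `distance` is the number of bp from the 5' end of the match to that end.
    """
    sites = []
    n, k = len(upstream), len(motif)
    rc = revcomp(motif)
    for i in range(n - k + 1):
        window = upstream[i:i + k]
        if window == motif:
            sites.append((n - i, "+"))
        elif window == rc:
            sites.append((n - i, "-"))
    return sites

def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

# The E-box sequence deleted from the eat-4 reporter carries the motif on
# the minus strand (CAGGTG is the reverse complement of CACCTG).
print(find_sites("AACAGGTGTT"))
print(hamming("CACGTG", "CACCTG"))
print(bool(re.fullmatch(GENERIC_EBOX, "CACGTG")),
      bool(re.fullmatch(HETERODIMER_SITE, "CACGTG")))
```

Scanning both strands matters because the motif shows no orientation preference. The last line illustrates that CACGTG still fits the generic CANNTG pattern while differing from the CACCTG core at one position.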
The hlh-4 locus itself and, specifically, the 700-bp 5′ upstream regulatory region that shows hlh-4 autoregulation (Fig 1C) contains two copies of the perfectly matched CACCTG motif (both motifs are located in the 245-bp-long intergenic region). Moreover, the upstream region of the hlh-2/Da gene, the putative cofactor of hlh-4, which is also continuously expressed in ADL, contains three copies of this motif in its 5′ upstream intergenic region. The regulation of hlh-2/Da expression by hlh-4 (demonstrated above) is therefore also likely a reflection of direct autoregulation of the hlh-2 locus by the HLH-4/HLH-2 heterodimer.
Three lines of evidence further validate the importance of the CACCTG E-box motif for ADL expression:
1. We mutated the CACCTG E-box motif in one of the newly discovered, hlh-4-dependent targets, srh-127, and found that this mutation abolished expression in ADL (Fig 6B).
2. We examined whether a set of 35 ADL-expressed reporter genes not included in the training set for the FIRE analysis also contain the CACCTG motif. All of these 35 reporter genes code for GPCRs that were found to be expressed in ADL after the initial FIRE analysis was performed [46]. We found that 33 out of the 35 ADL-expressed reporters contain the CACCTG motif (Table 2, S1 Table). In contrast to the presence of the E-box motif in ADL-expressed and hlh-4-dependent genes, we found that panneuronal genes [60] are largely devoid of the CACCTG E-box (rab-3, ric-4, snb-1, unc-64, sng-1, unc-10, unc-18, [...]).
3. We generated 12 reporters to additional sets of genes (again all GPCR-encoding genes) that the FIRE analysis revealed to either contain or not contain this motif. All of the nine genes that contain a CACCTG motif indeed showed expression in ADL (Fig 6C). Three GPCR reporters that do not contain a CACCTG motif show no expression in ADL (Fig 6C).

Phylogenetic conservation of the E-box motif and further validation of its importance for ADL expression
We used phylogenetic footprinting in the TargetOrtho pipeline [27] to assess the extent of conservation of the CACCTG motif among five Caenorhabditis species, C. elegans, C. briggsae, C. remanei, C. brenneri, and C. japonica (S2 Table). This analysis provided a genome-wide assessment of the location of the CACCTG motif in these five different species and allowed us to define a number of features of the CACCTG motif:
1. The ADL-expressed genes tend to have more conserved CACCTG motifs among phylogenetically conserved, orthologous genes compared to any gene in the genome that contains a CACCTG motif (Fig 6D).
2. ADL genes have more CACCTG motifs compared to any gene with a CACCTG in the genome. This is true for all Caenorhabditis species but is most obvious in C. elegans (Fig 6D).
3. The upstream CACCTG positions are closer to the start codon in the known ADL-expressed genes compared to any gene with a CACCTG in the genome (Fig 6D). This trend is most obvious in C. elegans but is also significant in the other four Caenorhabditis species.
Moreover, we find that two of the ADL-expressed genes that do not contain a perfect match to the CACCTG motif (srh-79 and srh-186, one of which, srh-79, we confirmed to be hlh-4-dependent) contain a motif with a single mismatch to the CACCTG motif (CACGTG), yet all Caenorhabditis species that have orthologues of these two genes contain perfect CACCTG motif matches (Table 2, S1 Table).
In conclusion, a CACCTG motif defines a signature for ADL-expressed genes. Given that this motif is a known in vitro binding site for a HLH-4/HLH-2 dimer [30], hlh-4 appears the most likely candidate to directly activate the expression of scores of genes that uniquely and combinatorially define the terminally differentiated state of the ADL neuron pair.

hlh-4 displays complex regulatory interactions with the lin-11 LIM homeobox gene
The partially penetrant effect of hlh-4 on eat-4/VGLUT expression suggested that hlh-4 partly relies on additional factors to control eat-4/VGLUT expression. This notion is further expression, used as a coinjection marker. (C) Reporter expression pattern of GPCRs that contain or do not contain the indicated E-box core motif CACCTG. DiD staining (red) is used to label the amphid neurons, including ADL. (D) Cumulative distributions of Caenorhabditis elegans upstream CACCTG conservation per gene (left), CACCTG site count per gene (middle), and CACCTG upstream site position per gene (right). Blue: whole genome genes; purple: ADL-expressed genes. ADL genes had motif matches that were conserved in 3.39 species' orthologs, on average, versus 1.85 species' orthologs amongst all genes with at least one CACCTG site match (Wilcoxon test statistic = 11.34, p = 8.23e−30). ADL genes had 3.23 CACCTG matches per gene compared to 2.67 CACCTG matches genome wide per gene with at least one site match (Wilcoxon test statistic = 4.41, p = 9.90e−06). CACCTG site positions in ADLexpressed genes were on average 657.35 bases upstream of the start codon compared to 1,001.77 bases upstream of the start codon genome wide per gene with at least one site match (Wilcoxon test statistic = 7.98, p = 1.42e−15). Terminal selector of nociceptive neurons corroborated through an examination of the cis-Regulatory control regions of the eat-4/ VGLUT locus. We find that 4.5 kb of sequence upstream of the eat-4/VGLUT locus directs reporter gene expression to many glutamatergic neurons, including ADL (prom6 -1; Fig 7A). This 4.5-kb region contains a phylogenetically conserved CACCTG motif 691 bp upstream of the ATG. Deletion of this motif results in loss of expression in ADL (Fig 7A). 
However, while this motif is required for ADL expression, it is apparently not sufficient: deleting 3.2 kb from the 4.5-kb 5′ reporter fusion leaves the E-box unaffected but abolishes expression in ADL (prom2; Fig 7A), suggesting that these deleted sequences contain binding site(s) for a transcription factor that cooperates with hlh-4 to activate eat-4/VGLUT expression.
The LIM homeobox gene lin-11 was previously shown to be expressed in postmitotic ADL neurons throughout their lifetime [61]. We find that lin-11 expression in ADL is not affected in hlh-4 mutants (Fig 7B). Corroborating a role of lin-11 in parallel to hlh-4, we find that lin-11 null mutants are defective in the ADL-mediated chemorepulsive response to C9 ascaroside (Fig 7C). Consistent with this behavioral defect, we observed that lin-11 null mutants display defects in the expression of several hlh-4-dependent and E-box-containing genes, including ncx-6, srh-234, and flp-21 (Fig 7D). However, lin-11 does not affect the hlh-4-dependent flp-4 gene, nor does it affect eat-4/VGLUT fosmid reporter expression (Fig 7D).
We tested whether a function for lin-11 in eat-4/VGLUT expression could be revealed in the context of an hlh-4 mutant background, in which eat-4/VGLUT fosmid reporter expression is only partially affected. lin-11; hlh-4 double mutants still normally express pansensory markers in ADL, but they display a dye-filling defect that neither mutant alone displays, corroborating the parallel nature by which hlh-4 and lin-11 affect ADL differentiation (Fig 7E). Surprisingly, in hlh-4; lin-11 double null mutants, the partially penetrant loss of eat-4/VGLUT expression observed in hlh-4 single mutants was not enhanced but instead completely suppressed (Fig 7D). The same effect is observed for the flp-4 gene: its completely penetrant loss in hlh-4 mutants is suppressed in hlh-4; lin-11 double mutants (Fig 7D).
The reinstatement of eat-4/VGLUT fosmid expression even in the absence of hlh-4 is mirrored by a mutation in the cis-regulatory control region of eat-4/VGLUT. The 1.2-kb upstream region of eat-4/VGLUT, which contains an HLH-4 binding site but does not drive expression in ADL, becomes expressed in ADL upon deletion of a predicted homeodomain binding site, a potential recognition motif for LIN-11 (Fig 7A). This result suggests that eat-4/VGLUT expression is controlled via a collaboration of hlh-4 with an as yet unknown transcription factor X whose activating effect is normally antagonized by LIN-11. If all activators (hlh-4 and X) are present, lin-11 cannot prevent activation of eat-4/VGLUT (eat-4prom6-1delta12); hence, eat-4/VGLUT is expressed in ADL. If, however, the system is partially destabilized by hlh-4 removal (or by removal of the E-box sequence in the reporter construct), lin-11 can counteract the ability of factor X to activate eat-4/VGLUT expression (eat-4prom2delta12), as assessed by the restoration of eat-4 expression upon removal of lin-11. The effect of lin-11 on ADL-expressed genes is, however, clearly target gene dependent. While lin-11 appears to antagonize hlh-4 function in the case of one target gene, eat-4/VGLUT, it may positively cooperate with hlh-4 on those other target genes whose expression is either completely or partially lost in hlh-4 and/or lin-11 mutants. We conclude that hlh-4 is a central regulator of ADL identity that may interact in a target gene-dependent manner with distinct collaborating factors.
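The regulatory logic proposed above can be summarized as a small Boolean sketch. It is a deliberate simplification: expression is treated as all-or-none (so the partially penetrant loss of eat-4/VGLUT in hlh-4 single mutants is scored as "off"), the hypothetical factor X is assumed to be present in ADL by default, and the function name and arguments are illustrative only.

```python
def eat4_expressed_in_adl(hlh4: bool, lin11: bool, factor_x: bool = True) -> bool:
    """Boolean sketch of the proposed eat-4/VGLUT logic in ADL.

    Factor X activates; LIN-11 can antagonize X only when HLH-4 is absent.
    """
    return factor_x and (hlh4 or not lin11)


# Genotypes as (HLH-4 present, LIN-11 present).
genotypes = {
    "wild type": (True, True),
    "hlh-4": (False, True),
    "lin-11": (True, False),
    "hlh-4; lin-11": (False, False),
}

for name, (hlh4, lin11) in genotypes.items():
    state = "on" if eat4_expressed_in_adl(hlh4, lin11) else "off"
    print(f"{name}: eat-4 {state}")
```

Under these assumptions the model reproduces the qualitative pattern described in the text: expression is lost only when hlh-4 is removed while lin-11 remains, and it is restored in the double mutant.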

Fig 7 legend (continued): ... locus. The precise nature of the deletion of the motifs is shown in the Materials and methods section. Scoring of expression of two lines is shown in the right panel. It was previously published that an integrant of the transgene eat-4prom2 was expressed in ADL (otIs376) [51]. We have since analyzed multiple extrachromosomal lines of eat-4prom2 (seven lines all showing similar expression; quantification of two lines is shown here) as well as two lines of eat-4prom1 (a slightly larger construct; not shown) and found none of these lines to display expression in ADL. The previously reported expression of these transgenes in ADL was likely an array artifact that affected the repressive effect of the homeodomain binding sites described here. (B) lin-11 fosmid reporter expression is not affected in hlh-4 null mutants. (C) ADL-mediated chemosensory behavior toward C9 ascaroside. Statistical significances shown were calculated with the one-way ANOVA Dunnett's test. (D) Effect of lin-11 null mutants on terminal ADL markers, alone or in combination with hlh-4 null mutants. The previously reported partial effect of lin-11 on the brightness of expression of an eat-4 fosmid (otIs388) [51] could not be repeated with this or other eat-4 reporter transgenes. The data from hlh-4 and N2 are repeated (from Figs 3A, 4) in the graph for ease of comparison.

Discussion
The identification of proneural genes that act very early in neuronal development to allocate neuroblast identity to distinct neuronal lineages, via classic genetic loss-of-function analysis in Drosophila, represents one of the landmark achievements of developmental neurogenetics [2,3]. The subsequent cloning of vertebrate AS-C and Atonal homologs has revealed the deep conservation of this fundamental neural patterning mechanism [1,6–8]. We have described here a novel functional property of an AS-C gene, demonstrating that C. elegans hlh-4 joins the ranks of terminal selector-type transcription factors that act in postmitotic neuron classes to initiate and maintain the differentiated state of a specific, postmitotic neuron class. hlh-4 displays all the hallmarks of a terminal selector [62,63]: it is required for initiation of the terminal differentiation program of the ADL neuron pair, it is continually expressed throughout the life of the neuron (suggesting that it also maintains neuronal identity), this continuous expression is mediated by direct autoregulation via HLH-2/HLH-4 binding sites in the hlh-2 and hlh-4 loci, and, most importantly, hlh-4 controls the vast majority of neuron class-specific genes whose combinatorial coexpression defines ADL identity, yet it does not control generic neuronal features (panneuronal and pansensory features). Hence, exactly like other terminal selectors [62,63], hlh-4 separates the adoption of neuron type-specific features (hlh-4-dependent) from the acquisition of an overall, panneuronal/pansensory identity (hlh-4-independent) (Fig 8A). It is important to precisely appreciate this fundamental dichotomy in neuronal gene expression programs, repeatedly observed in many different neuron classes and corroborated here by the hlh-4 mutant phenotype: as schematized in Fig 8A, genes that are expressed in specific subsets of neuron classes are terminal selector dependent, while genes that are expressed in a non-neuron-class-specific manner are regulated by independent means [60].
The terminal selector function of hlh-4 is likely exerted in collaboration with the canonical AS-C cofactor, hlh-2/Da, which shares with hlh-4 the unusual feature of postmitotic expression throughout the life of the ADL neuron class. hlh-2 is also continuously expressed in a small number of additional neuron classes, but its function in these neurons remains unknown. In yeast one-hybrid assays, HLH-4/HLH-2 has been shown to bind to the CACCTG sequence that we describe here [30]. While the HLH-4/HLH-2 complex and its cognate binding site are essential (and, at least in some contexts, also sufficient) for gene expression in ADL, the complex is unlikely to act on its own. With its 6-bp length, the recognition element of the HLH-4/HLH-2 heterodimer occurs too frequently in the genome to direct HLH-4/HLH-2 exclusively to ADL-expressed genes. We find that the LIM homeobox gene lin-11 assists hlh-4 in the regulation of some but not all hlh-4-dependent target genes. As no DNA cis-regulatory motif other than the E-box was found to be significantly enriched in ADL-expressed genes by our bioinformatic analysis, we propose that hlh-4 is a central core inducer of all ADL-specific genes but may be assisted in its function, i.e., provided the proper specificity, by interaction with a suite of distinct, target gene-dependent collaborating factors, such as lin-11 and perhaps other, as yet undiscovered factors (Fig 8B).
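A back-of-the-envelope calculation illustrates why a 6-bp site is too frequent to confer specificity on its own. The sketch below assumes independent bases and, by default, equal base frequencies; the genome size of roughly 100 Mb is an approximate figure for C. elegans used purely for illustration, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def site_probability(motif: str, gc_fraction: float = 0.5) -> float:
    """Chance that a random genomic position matches `motif` on either strand.

    Assumes independent bases with P(G) = P(C) = gc_fraction / 2 and
    P(A) = P(T) = (1 - gc_fraction) / 2.
    """
    motif = motif.upper()
    p_gc = gc_fraction / 2
    p_at = (1 - gc_fraction) / 2

    per_strand = 1.0
    for base in motif:
        per_strand *= p_gc if base in "GC" else p_at

    # A non-palindromic motif (such as CACCTG) can match on either strand;
    # a palindromic motif reads identically on both and is counted once.
    is_palindrome = motif.translate(_COMPLEMENT)[::-1] == motif
    return per_strand if is_palindrome else 2 * per_strand


def expected_sites(motif: str, genome_bp: int, gc_fraction: float = 0.5) -> float:
    """Expected number of motif occurrences in a random genome of given size."""
    return site_probability(motif, gc_fraction) * genome_bp


# One match expected every 2,048 bp under equal base frequencies.
print(1 / site_probability("CACCTG"))         # 2048.0
print(expected_sites("CACCTG", 100_000_000))  # 48828.125
```

Under these simplifying assumptions, tens of thousands of CACCTG sites would be expected by chance alone, far more than the number of ADL-expressed genes. Passing a lower `gc_fraction` gives a rough adjustment for an AT-rich genome.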
Previous work on AS-C genes in worms has revealed that the AS-C-type hlh-14 gene acts as a conventional proneural gene during early embryonic patterning to specify the neuronal identity of an AB-blastomere-derived lineage branch that produces several sensory neurons, including ADL [12]. In the absence of hlh-14, cells in this lineage branch convert to a hypodermal identity [12] (Fig 8C). Hence, the ADL neuron depends on the successive activity of two distinct AS-C-type genes, one acting as a conventional proneural gene (hlh-14), followed by hlh-4, which acts in a subbranch of this lineage, to specify terminal ADL identity (Fig 8C).
Whether hlh-14 directly activates hlh-4 expression is presently unclear. Notably, though, the E-box motif in the hlh-4 locus that is required for maintaining hlh-4 expression is not required for initiation of hlh-4 expression in the embryo.
Fig 7 legend (continued): (E) ADL neurons of lin-11; hlh-4 mutants fail to dye fill but are still generated, as assessed by expression of a pansensory marker. For ease of comparison, the data from hlh-4 and N2 are repeated (from Fig 2B) in the graph. Numerical values that underlie the graph shown in this figure are shown in S1 Data. https://doi.org/10.1371/journal.pbio.2004979.g007
Fig 8. Schematized hlh-4 functions. (A) Overall logic of hlh-4/terminal selector function. "0" indicates gene/feature not expressed; "1" indicates expressed (this binary scheme is a simplification). Like other terminal selectors, hlh-4 genetically separates the adoption of neuron-specific features, i.e., genes expressed in specific parts of the nervous system, from the adoption of a panneuronal identity. Rather than being defined by genes uniquely expressed in ADL, ADL identity is uniquely defined by a combinatorial signature of genes expressed in multiple neuron types. If those genes show selective expression in other neuron types, they are terminal selector dependent.
Even though a proneural function of AS-C-type genes is clearly a deeply conserved function of bHLH genes, our findings prompt the intriguing question as to whether a function of bHLH genes in directly controlling the differentiated state of a neuron may have been an even more ancestral function of AS-C-type bHLH genes. In support of such a notion, the AS-C ortholog in the cnidarian Hydra magnipapillata, Cnash, was previously reported to be expressed not in neuronal precursors but rather in differentiating and mature neurons, leading the authors of that report to postulate a role of Hydra Cnash in initiating and maintaining the neuronal phenotype [64], exactly as we propose here for C. elegans hlh-4. Loss-of-function studies of the AS-C ortholog NvashA of the sea anemone Nematostella vectensis cannot distinguish between a proneural and a terminal differentiation role [65].
Subsequent to such a terminal differentiation role, an "ur-" bHLH may then have become co-opted into more upstream regulatory events in proliferating blast cells. A somewhat similar trajectory has been proposed for the Pax6/Eyeless gene, which may have originated with a function in regulating lens protein and subsequently been recruited to earlier steps of eye development [66]. Of course, it is also conceivable that the terminal selector function of hlh-4 may be a derived feature, one that perhaps came into existence via the acquisition of an E-box motif in the hlh-4 locus that led to hlh-4 expression being "locked" into a terminal and continuous function. More detailed expression pattern analysis of AS-C and Ato-like genes in the adult nervous system of other species will provide hints as to whether hlh-4-like terminal selector functions may also be carried out by AS-C/Atonal genes in other organisms. In fact, such a function may be conceivable in a previously reported case. Drosophila Atonal is expressed in mature dorsal cluster neurons in the dorsolateral CNS of the fly [67]. In these neurons, Ato has no proneural function but instead serves to control arborization patterns. However, whether Ato has an impact as broad as hlh-4 on controlling the differentiated state of these neurons is not yet known.
C. elegans sox-2/SoxB1 is another gene whose orthologs in other organisms (SoxB factors) act in early neuronal patterning [68] but that has become employed as a terminal selector in C. elegans [69,70]. Here again, the question is whether such a late role is a reflection of an ancestral or derived function of this gene. It is important to keep in mind that the existence of such late functions (in addition to the well-characterized early functions) may have very easily escaped detection in other organisms, because straight knockout approaches will only reveal the early function of a gene in the lineage. Only if an early function does not exist, as is apparently the case for sox-2 and hlh-4, will a late function be revealed with relative ease using standard genetic loss-of-function, i.e., straight knockout, approaches (this paper) [69,70].
Defining hlh-4 as a terminal selector of ADL identity adds mechanistic context to previous studies of the feeding state-dependent regulation of a sensory-type GPCR gene, srh-234, in the ADL neuron [35,71]. Focusing on this specific gene, the authors found that the MEF-2 transcription factor, a well-known mediator of neuron activity-dependent processes in many different organisms [72], down-regulates hlh-4-dependent srh-234 expression under starvation conditions. This effect is mediated via a MEF-2 binding site in the srh-234 locus that is located next to the HLH-4/HLH-2 binding E-box [35]. Together with our description of a broad role of hlh-4 in controlling the differentiated state of ADL, an intersectional strategy of a "genetically hardwired" identity factor with a condition-dependent factor becomes apparent. Such an intersectional strategy could perhaps be a general strategy to explain the cellular specificity of broadly acting signals that convey environmental or physiological information.
Fig 8 legend (continued): (B) While a HLH-4/HLH-2 complex appears to be a central regulator of ADL identity genes, it operates with distinct cofactor(s) in a target gene-dependent manner. This is inferred from the notion that lin-11 acts in parallel to HLH-4/2 to contribute to the activation of some but not other target genes. (C) Sequential activities of AS-C-type bHLH genes in a sensory neuron-producing lineage. Lineage diagrams and hlh-14 data taken from [12]. Note the difference between early hlh-14 and late hlh-4 function. Transiently expressed hlh-14 controls the decision of neuroblast versus ectodermal (hypodermis/skin) while hlh-4 controls which type of neuronal identity ADL will adopt, through regulation of the ADL-specific molecular signature shown schematically in panel B. AS-C, Achaete-Scute complex; bHLH, basic helix-loop-helix; Da, Daughterless. https://doi.org/10.1371/journal.pbio.2004979.g008
One of the remarkable features of the chemosensory system of C. elegans is the coexpression of multiple sensory receptors of the GPCR family in individual neuron types [44–46]. Even though the expression of only about one fifth of C. elegans chemosensory-type GPCRs has been examined so far [46], there are several chemosensory neurons that coexpress several dozens of GPCRs. This tremendous extent of coexpression only applies to a select set of chemosensory neurons, with the most prominent set being the nociceptive ADL, ASH, PHA, and PHB neurons [46]. One could have imagined several scenarios by which such coexpression is controlled. A previous bioinformatic analysis already strongly hinted toward coregulation of coexpressed GPCRs via a common cis-regulatory motif [45]. However, it is only through the present analysis that we can conclude that a single trans-acting factor instructs, apparently via direct binding to a cis-regulatory element shared by most if not all coexpressed GPCRs, the enormously broad spectrum of chemosensory capacities of one of these nociceptive neurons, ADL.
Table. ADL-expressed genes. Listed are all known ADL-expressed genes (as per Wormbase) except genes that are either not clear terminal markers (TFs and RNP) or not neuron-type specific (pan-ciliary genes); such genes were part of the FIRE analysis but are not shown here. Bold: training dataset for original FIRE analysis. Green, non-bold: known to be expressed in ADL but not included in the training set for FIRE analysis. Blue, non-bold: gfp fusions generated in this paper. Green shade: conserved in all species that have orthologs; red shade: no motif in ortholog. The E-box motifs of srh-132, srh-186, sri-51, srh-220, sro-1, hlh-2, nlp-7, nlp-10, osm-9, gpa-1, cam-1, and tax-6 were also bioinformatically identified in [45]. FIRE, Finding Informative Regulatory Elements; RNP, RNA binding protein; TF, transcription factor. (XLSX)
S2 (XLSX)
S1 Data. Numerical values for graphs. These datasets provide the numerical values for the graphs shown in Fig 2, Fig 3, Fig 4 and Fig 7. (XLSX)