Identification of DVA Interneuron Regulatory Sequences in Caenorhabditis elegans

Background The identity of each neuron is determined by the expression of a distinct group of genes comprising its terminal gene battery. The regulatory sequences that control the expression of such terminal gene batteries in individual neurons are largely unknown. The existence of a complete genome sequence for C. elegans and draft genomes of other nematodes lets us use comparative genomics to identify regulatory sequences directing expression in the DVA interneuron. Methodology/Principal Findings Using phylogenetic comparisons of multiple Caenorhabditis species, we identified conserved non-coding sequences in 3 of 10 genes (fax-1, nmr-1, and twk-16) that direct expression of reporter transgenes in DVA and other neurons. The conserved region and flanking sequences in an 85-bp intronic region of the twk-16 gene direct highly restricted expression in DVA. Mutagenesis of this 85-bp region shows that it has at least four regions. The central 53-bp region contains a 29-bp region that represses expression and a 24-bp region that drives broad neuronal expression. Two short flanking regions restrict expression of the twk-16 gene to DVA. A shared GA-rich motif was identified in three of these genes but had opposite effects on expression when mutated in the nmr-1 and twk-16 DVA regulatory elements. Conclusions/Significance Using multi-species conservation, we identified regulatory regions within three genes that direct expression in the DVA neuron. We identified four contiguous regions of the twk-16 gene enhancer with positive and negative effects on expression, which combine to restrict expression to the DVA neuron. For this neuron, a single binding site may thus not achieve sufficient specificity for cell-specific expression. One of the positive elements, an 8-bp sequence required for expression, was identified in silico by sequence comparisons of seven nematode species, demonstrating the potential resolution of expanded multi-species phylogenetic comparisons.


Introduction
Neurons express a largely overlapping set of genes required for their general function as a neuron. The specific identity of each individual neuron, in turn, requires the expression of distinct sets of genes comprising terminal gene batteries [1,2,3]. In a few neurons, the regulatory sequences determining the expression of sets of genes comprising the terminal gene battery have been identified, but most remain obscure. Identifying these regulatory sequences remains a challenging problem due to the complexity of the nervous system [3].
The C. elegans hermaphrodite nervous system is relatively simple, with defined cell lineages and anatomy [4]. The ability to identify neurons by Nomarski optics and to examine cell-specific gene expression by transgenic reporters makes C. elegans useful for investigating pertinent regulatory sequences. A few C. elegans neurons are unpaired, including DVA, an interneuron in the tail required for the worm to integrate mechanosensory information and to sense how its own body bends as it moves [5,6]. The DVA neuron is located in the dorsal rectal ganglion (DRG) between DVB and DVC; each of these three neurons has distinct functions and lineal origins (Figure 1). Identification of regulatory elements by deletion analysis is unbiased, but laborious. Phylogenetic footprinting has also been used as a shortcut to regulatory motifs [7,8]. These approaches in C. elegans have identified enhancer motifs that direct expression broadly in neurons, in classes of neurons and selectively in individual neurons [9,10,11,12]. Shared motifs binding the transcription factors (TFs) AST-1 and UNC-3 have been identified for the co-regulation of genes required, respectively, for the expression of dopaminergic or cholinergic neurotransmitter phenotypes [11,13]. These studies lead to the hypothesis that neuron-specific sequence motifs constitute a simple combinatorial code regulating terminal gene expression. For example, deletion analysis of eight genes expressed in the interneuron AIY identified a 16-bp motif regulated by the cooperative binding of CEH-10 and TTX-3 [10]. Similar analyses identified a 12-bp ASE motif bound by CHE-1 [14], and a bipartite A/T-rich core consensus sequence was identified in the regulatory regions of chemoreceptor genes expressed in AWB. In contrast to the regulatory motif found in AIY neurons, the AWB motif was not conserved in C. briggsae [15].
There are 941 transcription factors in C. elegans [16,17,18] potentially available for the cis-regulation of only 302 neurons in hermaphrodites [4]. The studies of AIY, ASE and AWB are consistent with the model that C. elegans utilizes neuron-specific regulatory codes for the regulation of the terminal gene battery [10]. A second model would be that neuron-specific gene expression relies on complex modular combinations of positive or negative elements [19]. A third possibility is that both neuron-specific and complex modular elements regulate the terminal gene battery of each neuron. In the latter two models, a broad analysis of regulatory motifs in a terminal gene battery would not usually identify neuron-specific motifs regulating that terminal gene battery.
Here we use comparative genomics to analyze genes expressed in the DVA interneuron of C. elegans. By combining newly sequenced nematode species and phylogenetic footprinting [8,20] we attempted to reduce the experimental work necessary for the identification of regulatory regions. We applied this method to genes identified as being expressed in DVA, but with wider neuronal expression, along with a mutational analysis of the previously described twk-16 enhancer that shows highly restricted expression in DVA [21].
This analysis identified four conserved regions that directed expression in DVA: a 308-bp fragment containing the conserved region (twk-16.cs1) previously identified by Salkoff et al. [21]; a 190-bp conserved region (nmr-1.cs2); a 180-bp conserved region (fax-1.cs3); and a 322-bp conserved region (fax-1.cs4; Figure 2A). Examples of the expression seen in DVA and other tail neurons with these regions are shown in Figure 3. The conserved regions of the nmr-1 and fax-1 intergenic regions produced broader neuronal expression in the head and ventral cord, in contrast to the restricted expression seen with the twk-16 intronic region.

Phylogenetic Comparisons of twk-16 Genes
The 1.4 kb first intron of twk-16 contains a region conserved between C. elegans and C. briggsae that drives expression in DVA [21]. A four-species, high stringency MUSSA analysis (20 bp window, 17 of 20 bp identical) [8] identified twk-16.cs1 and twk-16.cs2 (Figure 2A). twk-16.cs1 contains a 73-bp conserved region (Figure 4A), which is contained in the region identified by Salkoff et al. [21].

Identification of a Shared GA-rich Motif in fax-1, nmr-1 and twk-16
We compared the conserved, DVA-expressing sequences from fax-1, nmr-1 and twk-16 with MEME [26], seeking a single shared DVA consensus sequence. We included both a smaller 53-bp fragment of the twk-16 intron (WT53), which did not produce restricted expression, and a larger 308-bp twk-16 intronic fragment (WT300) that restricted expression in the tail to DVA. A GA-rich motif was found in all three conserved regions tested from fax-1, nmr-1 and twk-16: four sites in WT300 containing twk-16.cs1, three sites in the 190 bp nmr-1.cs2 fragment and two sites in the 322 bp fax-1.cs4 fragment (Figure 4B). This motif spanned positions 17-26 within the 53 bp of twk-16 intron sequence present in WT53 (see Figure 8B).
Although this motif might arise from simple dinucleotide biases in the C. elegans genome [27], we tested its function in the context of the 190 bp nmr-1.cs2 element (nmr.WT190). DVA expression was significantly reduced when all GA-rich sites were mutated with transitions (C to T and G to A) (nmr.Mut190; Table 1; p<0.0001, Fisher's Exact Test). However, the 100-bp fragment of the 190-bp nmr-1.cs2 that contains all three GA-rich motifs (nmr.WT100) failed to direct expression. Therefore, the GA-rich motif has a positive effect on gene

85 bp of twk-16 Intron is Sufficient for DVA Specific Expression
Because of the highly restricted pattern of expression directed by constructs containing twk-16.cs1 with additional flanking sequence, we analyzed this enhancer in more detail. In particular, we made constructs containing the regions identified by phylogenetic footprinting (Figure 4A) along with varying amounts of flanking sequences (Figure 5A). These constructs (except WT2000) used the Δpes-10 promoter (Figure 2B). WT2000, which contains 500 bp of sequence 5′ of the first exon, the first exon and the entire 1.4 kb of the first intron, was used to produce transgenic lines [21]. WT2000 produced restricted GFP expression in the tail in DVA, along with one cell in the head tentatively identified as the amphid socket cell AMsoR. The expression seen in WT2000 represents expression of the twk-16 enhancer with the 5′ wild-type regulatory regions; the expression is cytoplasmic because the GFP reporter lacks nuclear localization signals (Figure 6A). Constructs containing the first 2 kb of sequence 5′ to the first exon of twk-16 without the first intron enhancer did not produce detectable expression.
A second conserved region (twk-16.cs2) is located 250 bp 3′ of the first conserved region in the first intron (Figure 2A). A construct containing both the twk-16.cs1 and twk-16.cs2 regions of the first intron (WT700) produced the same expression pattern as constructs containing twk-16.cs1, but twk-16.cs2 alone (WT350) did not direct expression anywhere in the animal (Table 2). Therefore, only the twk-16.cs1 region possesses all of the elements necessary to produce restricted expression in DVA. This region can confer DVA expression at a distance, consistent with its prior characterization as an enhancer [21]. The WT500 construct sometimes expressed in both DVA and DVC (Figure 6B). Constructs of 308 bp (WT300) and 195 bp (WT195) produced the qualitatively brightest YFP expression in DVA (Figure 6C, 6D). The smallest fragment that could produce expression restricted to DVA was the 85 bp WT85 (Figure 6F, 6G). A 53 bp subfragment (WT53) produced qualitatively dimmer expression in DVA and broadly in other neurons (Figure 6H). This smaller 53 bp fragment of twk-16.cs1 showed YFP expression in DVA but also in other neurons in the tail, ventral cord and head (Table 2). WT53 also consistently expressed in the RID neuron, located in the dorsal pharyngeal ganglion in the head. RID is not known to express twk-16, but is known to have direct reciprocal axonal connections to the DVA neuron. The conserved sequences in WT53 thus can direct expression in neurons both in the tail and elsewhere, but are not sufficient to restrict expression to DVA. When this same 53 bp of sequence was placed in reverse orientation relative to the transcription cassette (WT53R), there was dim expression in head and some tail neurons but no expression in DVA (Figure 6I, Table 2). Thus, WT53 may lack sequences conferring orientation-independence on the native twk-16 enhancer.

The Central 53 bp of the 85 bp WT85 Contains Positive and Negative Elements
To analyze the regions within WT53 responsible for DVA and broad neuronal expression, we mutated nucleotides predicted by conservation and MEME to be required for expression in DVA (Figure 5B). Table 3 shows the percentage of animals expressing YFP in different regions of the nervous system in constructs in which sites identified computationally were mutated with transitions (C to T and G to A). Mutation of the predicted site (Mut5; Figure 5B) caused this element to direct broad expression in neurons in the head, ventral cord (VC), pre-anal ganglion (PA), lumbar ganglion (LG) and DVA (Table 3; Figure 6M). Mutations within the first 29 bp (WT29) of WT53 promoted transcriptional activity, showing qualitatively brighter and broader patterns of expression than WT53. Mutant 2 (Mut2) increased the frequency of expression in neurons in the head, VC, PA, LG and DVA compared to WT53 (p<0.0001) (Table 3). There was also ectopic expression in intestinal cells and in the vulva (Figure 6N-O). Mutation 3 (Mut3) drove expression in the same pattern and frequency as WT53 (Figure 6P), except that some Mut3 lines showed expression in hypodermal cells. The mutations in Mut5 and Mut2 in WT53 produced the most consistent neuronal expression (p<0.0001) (Table 3; Figure 6M-N), consistent with sequences in the Mut5 and Mut2 regions acting to repress expression.
Since the Mut3 mutations did not change the frequency of DVA expression or the pattern of expression, we split WT53 into two fragments (the 5′ 29 bp, WT29, and the 3′ 24 bp, WT24; Figure 5B). WT29 showed no expression in DVA and either no or barely detectable expression in other neurons (Table 2; Figure 6J). By contrast, WT24 drove expression in DVA, tail neurons, PA, VC and multiple head neurons, including RID (Table 2; Figure 6K). The pattern of expression was similar to that seen with the 53-bp fragment (WT53), but occurred in a higher percentage of animals (p<0.0001) and was qualitatively brighter (Figure 6K). This 24 bp (WT24) region contains sequences required for both broad neuronal expression and DVA expression. Mutation 6 (Mut6) showed expression in the head neurons, but reduced expression in VC, PA and LG, and abolished expression in all of the DRG neurons including DVA (Table 3; Figure 6Q). Mutation 8 (Mut8) largely abolished expression in all neurons and cells in all lines, except for a few (3%) animals showing expression in head neurons (Table 3; Figure 6R). When the WT24 element was mutated in the context of larger fragments that showed expression highly restricted to DVA (Mut195 and Mut85), expression was abolished in DVA and all other neurons (Figure 7A).

Short Flanking Sequences Restrict the Expression of WT53 to DVA
The initial phylogenetic comparison of the twk-16 gene identified a 73 bp conserved region, which failed to drive expression, but that contains a 53 bp region that does drive expression in DVA and broad neuronal expression. Subsequent deletion analysis identified short regions (17 bp 5′ and 15 bp 3′) flanking the central 53 bp fragment that restricted expression to the DVA neuron. Specifically, mutation of either the 5′ 17 bp of WT85 (5′Mut WT85) or the 3′ 15 bp of WT85 (3′Mut WT85) abolished all expression in all lines examined (Figure 7B; Table 3). Expression was also abolished with mutations of both the 17 bp 5′ and 15 bp 3′ regions of WT85 (5′ 3′ Mut85; Figure 7B; Table 3). These flanking sequences are contained in the WT85 construct but were not identified by MUSSA using stringent parameters in the phylogenetic comparisons of four nematodes. The WT85 element produced consistent expression restricted to DVA, with some lines showing faint expression in a few head neurons. These small flanking regions are thus required for expression in DVA and other neurons and, paradoxically, for the restriction of expression to DVA.

Seven-species Comparison Identifies 8 bp Required for Neuronal Expression
While WT53 is almost invariant in five Caenorhabditis species, a vast diversity of nematode species exists outside the Elegans group [28,29], among which might exist versions of WT53 with recognizable but significant divergence from C. elegans twk-16. To test this idea, we identified twk-16 orthologs in the newly sequenced genomes of Caenorhabditis angaria (PS1010; Can-twk-16) [30], Pristionchus pacificus (Ppa-twk-16) [29], and Heterorhabditis bacteriophora (Hba-twk-16) [31,32] (X. Bai, B J Adams, TA Ciche, S Clifton, R Gaugler, K Kim, J Spieth, P W Sternberg, R K Wilson and P S Grewal, in preparation). We then searched their non-coding DNA with MUSSA for matches to the larger 308 bp intronic fragment WT300. In Ppa-twk-16, we found only one match in a minor intron to a functionally uncharacterized segment of C. elegans WT300. In contrast, the 5′ flanks of both Can-twk-16 and Hba-twk-16 each showed two strong matches to the ends of WT53. For the twk-16 genes of seven nematode species, a single region of WT53 similarity showed consistent, transitive ungapped matches (Figure 8A). These matches had the same orientation towards twk-16 as in C. elegans and correlated strikingly with residues required for WT53 function in vivo (Mut8; Figure 6R). Mutation of these highly conserved bases in Mut8 of WT53 completely abolished expression in DVA and all other neurons and non-neuronal cells (Table 3; Figure 6R).
Figure 4 (caption fragment). The respective genes and conserved regions used in the MEME analysis: fax-1.cs3 (180 bp), fax-1.cs4 (322 bp), nmr-1.cs2 (190 bp) and twk-16.cs1 (308 bp). The respective fragments contained 3 GA-rich motifs in fax-1.cs3, 4 GA-rich motifs in fax-1.cs4, 3 GA-rich motifs in nmr-1.cs2 and 3 GA-rich motifs in the 308 bp fragment containing twk-16.cs1. The 144 base start of the GA-rich motif in the 308 bp twk-16.cs1 fragment corresponds to position 17 in the 53 bp WT53 and is shown in Figure 8B. The strand, start site, p-value and sequences were identified by MEME. doi:10.1371/journal.pone.0054971.g004

Discussion
Some C. elegans neurons use neuron-specific motifs to co-regulate neuron-specific gene expression, as evidenced by analysis of AIY, ASE and AWB [9,11,14,15,33]. We tested whether expanded phylogenetic comparisons could reduce the experimental work required to identify regulatory regions and identify a shared cis-regulatory motif that resulted in the selective expression of genes in the DVA neuron. Phylogenetic comparisons of three or four nematode species did identify conserved regions at a 66% (8/12) identification rate, comparable to that for the ceh-13/lin-39 Hox locus (77%) [8]. In our analyses, there was a lower identification rate of 33% (4/12) for conserved regions that produced expression in DVA, consistent with the modular nature of regulatory regions and the divergence of regulatory regions with increasing evolutionary distance [34]. However, phylogenetic footprinting of the twk-16 genes from seven nematode species identified a highly conserved 8 bp sequence that is necessary for expression in DVA and other neurons, suggesting that expanded phylogenetic comparisons are useful.
Our results also suggest limitations of phylogenetic comparisons. Even when using stringent parameters, 6 of 10 genes contained more than four conserved regions, a degree of conservation that does not substantially reduce the experimental work of testing regulatory regions. Expanding the number of species might help [34]. Highly conserved non-coding regions often have no positive effect on the particular aspect of transcription under study. A final shortcoming of phylogenetic comparisons is illustrated by the sequences responsible for restricted expression in DVA in the 5′ and 3′ ends of the 85 bp region. These sequences were not identified by our phylogenetic comparison because of the stringent parameters used in our initial comparison. Finding adjacent nonconserved regulatory sequences is consistent with our prior study of the ceh-13/lin-39 Hox locus of C. elegans, where regulatory sequences were near, but not within, blocks of highly conserved DNA sequence [8]. This is consistent with the observation that the relative positions can be weakly conserved across species or diverge sufficiently to not be identified when using stringent parameters to reduce false positives [34].
Table caption: Summary of expression in transgenic C. elegans lines. Lines are denoted as wild type (WT) followed by a number with approximate size (bp) of the twk-16 experimental sequence. The total number of animals scored is in parentheses with YFP expressing animals shown as a percentage of the total under the corresponding regions of the nervous system. All constructs (except plasmid WT2000) were made by PCR fusion with the expression vector shown in Figure 2B. The constructs used to generate the transgenic lines were: WT2000 with 500 bp 5′ of exon 1, exon 1 and the 1.4-kb first intron. The experimental sequences used were the following constructs (Figure ...
Table caption: Summary of expression of wild-type and mutated twk-16 constructs in transgenic C. elegans lines. The total number of animals scored is in parentheses with YFP expressing animals shown as a percentage of the total under the corresponding regions of the nervous system. Lines were produced with PCR fusion constructs with the expression vector shown in Figure 2B. Lines are denoted as wild-type (WT) or mutated ...

A Model for the twk-16 Enhancer
The 85 bp twk-16 DVA enhancer contains at least four regions with both positive and negative effects on gene expression ( Figure 8B Figure 7B). The model does not explain this discordance, suggesting that sequence specific and cell specific context dependence mediate the divergent effects of the A and D regions on neuronal expression. An alternative explanation for this discordance is that the A and D regions are required for expression in the context of the 85 bp element but not required in smaller fragments (WT53 and WT24) for both DVA and broad expression. In either case, the 85 bp element has multiple positive and negative acting sites that together can direct appropriate expression.

Potential Transcription Factor Binding Sites in the twk-16 Enhancer
We used the multi-species UniPROBE dataset of transcription factor binding sites [35] (http://the_brain.bwh.harvard.edu/UniPROBE) and cis-regulatory motifs archived in WormBase [22] to search for potential transcription factor binding sites within the twk-16 enhancer. UniPROBE analysis predicted four potentially interesting binding sites in the 85 bp twk-16 intronic region (Figure 8B; Figure S2). One predicted homeodomain binding site is within the 17 bp, positively acting A region. The negatively acting B region has a predicted binding site for mouse Pbx-1, a homeodomain-containing transcription factor [36]. C. elegans ceh-20 is orthologous to the Drosophila HOX co-factor Extradenticle (Exd/Pxd), known to function as a co-factor for homeodomain transcription factors [37]. ceh-20 is expressed in many and possibly all neurons and thus could co-operatively repress broad neuronal expression of the twk-16 enhancer. A binding site for an ETS family TF is predicted in region B, directly overlapping the GA-rich motif. These sequences appear to repress the positive regulatory sequences in region C. A second homeodomain binding site is predicted in region C, which is necessary for expression in all neurons. This homeodomain binding site also overlaps the highly conserved 8 bp region identified by the phylogenetic comparison (Figure 8B). UniPROBE predicted many homeodomain transcription factor binding sites at this site (Figure S2), including vertebrate Alx3, Dlx2, Lhx2, Lbx2 and Hlxb9. The respective C. elegans orthologs include a group of homeodomain-containing transcription factors previously identified as being involved in the regulation of gene expression in the AIY neuron, including ttx-3 and ceh-10 [38]. The B and C regions in combination consistently drive expression in the RID neuron, and ceh-10 is expressed in RID [39]. Additional C. elegans homeodomain transcription factors in this group include ceh-14 and lin-11, both expressed in the lumbar ganglion. Consistent expression of WT53 is seen in the lumbar ganglion, and this is attributed to the loss of flanking elements in regions A and D. There is a second predicted ETS binding site overlying the junction of the C and D regions. Mutation of this site within the D region also abolishes all neuronal expression.

Cis-regulation of the Terminal Gene Battery in DVA
The cis-regulatory mechanisms identified in the C. elegans neurons AIY, ASE and AWB support the model that single elements binding about two transcription factors regulate terminal gene batteries; this type of regulatory logic has been shown in multiple species [40]. Our finding of a relatively more complex structure for the twk-16 DVA enhancer is not consistent with this model. Of course, this is only one gene of the likely multiple subsets of DVA-expressed genes, and the logic might depend on individual genes or batteries.

Strain Handling
C. elegans strains were handled and maintained following standard protocols and experiments were conducted at 20°C [41].

Bioinformatics and Genome Comparisons
The genomic sequence of Heterorhabditis bacteriophora was generously provided before publication by the Genome Sequencing Center of Washington University (X. Bai et al., manuscript in preparation). All other genomic sequences, protein sequences, and genomic coordinates of twk-16 orthologs were from the WS200 release of WormBase or from our published data (C. angaria) [30]. The coordinates of WT300 and WT53 elements in Caenorhabditis genomes (Table S2) were determined by MUSSA comparisons to C. elegans followed by BlastN against reference genomes [42]. Motifs were predicted by MEME run on the UCSD web server (http://meme.ncbr.net) [26]; ungapped blocks of similarity were detected by MUSSA run locally [8]. To exclude them from MUSSA comparisons to WT300 or WT53, exons of twk-16 orthologs (or their neighbors, where applicable) were masked as 'N' residues with Perl.
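The exon-masking step described above can be illustrated with a minimal sketch. The original work used Perl; the Python version below is a substitute written for illustration only, and the function name `mask_intervals`, the 0-based half-open coordinate convention and the example sequence are all assumptions rather than details taken from the study.

```python
def mask_intervals(sequence, intervals):
    """Replace every base inside the given intervals with 'N'.

    `intervals` is a list of (start, end) pairs, 0-based and half-open.
    This convention is an assumption made for the sketch; annotation files
    often use 1-based inclusive coordinates and would need converting.
    """
    masked = list(sequence)
    for start, end in intervals:
        # Clamp to the sequence bounds so out-of-range annotations do not fail.
        start = max(0, start)
        end = min(len(masked), end)
        for position in range(start, end):
            masked[position] = "N"
    return "".join(masked)


if __name__ == "__main__":
    # Hypothetical toy sequence with one "exon" at positions 2-4.
    print(mask_intervals("ACGTACGTAC", [(2, 5)]))
```

Masking rather than deleting the exons keeps the coordinates of the remaining non-coding sequence unchanged, which makes it easier to map any reported matches back to the genome.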
MUSSA [8] was used to identify evolutionarily conserved sequences. MUSSA uses N-way transitivity (all-against-all) so that only windows passing the selected similarity threshold across all species are reported as alignments. The MEME Web server was used to identify nonaligned motifs shared by different sequences [44]. Possible instances of WormBase motifs in WT53 were detected with FIMO [45].
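The windowed identity criterion used in these comparisons (a 20 bp window with at least 17 ungapped identities) can be sketched for a single pair of sequences as follows. This is not MUSSA itself: it omits the N-way transitivity filter across species and does not consider the reverse strand, and the function name `window_matches` is hypothetical.

```python
def window_matches(seq_a, seq_b, window=20, threshold=17):
    """Return (i, j, identities) for every pair of ungapped windows
    (seq_a[i:i+window], seq_b[j:j+window]) sharing >= threshold identities.

    Brute-force pairwise comparison for illustration only; cost grows as
    len(seq_a) * len(seq_b) * window.
    """
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    hits = []
    for i in range(len(seq_a) - window + 1):
        win_a = seq_a[i:i + window]
        for j in range(len(seq_b) - window + 1):
            win_b = seq_b[j:j + window]
            # Count positions where the two windows carry the same base.
            identities = sum(x == y for x, y in zip(win_a, win_b))
            if identities >= threshold:
                hits.append((i, j, identities))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from twk-16.
    core = "ACGTACGTTGCATGCAGGCC"
    for hit in window_matches("AAAAA" + core + "TTTTT", core):
        print(hit)
```

An N-way version would keep only those windows whose pairwise matches are mutually consistent across every species in the comparison, which is the property the text refers to as transitivity.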

Transgene Design and Construction
Unless otherwise noted, the transcriptional reporter gene constructs contained the test sequence 5′ to the minimal Δpes-10 promoter in Fire laboratory vector pPD122.53 [46] modified to contain YFP rather than GFP. These constructs were then fused by PCR to a second PCR construct [47] derived from pDPMM051, containing the 5′ non-coding region and an unc-119 minigene [48]. The final construct utilized for ballistics was: Experimental Sequence::Δpes-10::4X NLS::YFP::unc-54::unc-119 (Figure 2B). Wild-type (WT) constructs ≤73 bp in size were synthesized as oligonucleotides and ligated to the vector. We mutated conserved sequences by synthesis with the substituted bases at the designated sites along with the 5′ Δpes-10 anchor sequence. Mutated sequences produced by oligonucleotide annealing and PCR fusion were sequenced to confirm that the correct product was produced. The mutations of the GA-rich motif were produced by PCR fusion of a PCR product derived from a minigene (Integrated DNA Technologies, Coralville IA; IDT) to the above reporter vector. The minigenes were sequenced by the supplier (IDT).
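The transition substitutions used to mutate designated sites (C to T and G to A) can be sketched as a simple in silico design step. The function name `transition_mutate` and the example sequence are hypothetical, and the sketch changes only C and G because those are the only substitutions the text names; whether A or T residues were also altered is not stated, so they are left untouched here.

```python
# Only the two substitutions named in the text; treating A and T as
# unchanged is an assumption of this sketch.
TRANSITIONS = {"C": "T", "G": "A"}


def transition_mutate(sequence, start, end):
    """Apply C->T and G->A substitutions to sequence[start:end].

    Coordinates are 0-based and half-open (an assumed convention).
    Bases outside the designated site are returned unchanged.
    """
    sequence = sequence.upper()
    site = sequence[start:end]
    mutated_site = "".join(TRANSITIONS.get(base, base) for base in site)
    return sequence[:start] + mutated_site + sequence[end:]


if __name__ == "__main__":
    # Hypothetical GA-rich toy site, not the actual twk-16 or nmr-1 sequence.
    print(transition_mutate("TTGAGAGGAATT", 2, 10))
```

Because transitions preserve purine or pyrimidine identity, this kind of substitution changes the sequence of a candidate site while keeping its overall base-class composition.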

Transgenesis
C. elegans strain PS3460 [unc-119(ed4)] was transformed with transgenic constructs by micro-particle bombardment using the PDS-1000/He Biolistic system (Bio-Rad). For a detailed protocol contact the authors. Briefly, nematodes were grown synchronously with HB101 in S-complete liquid culture from bleached eggs for 72 hours at 20°C to L4-early adult stages. They were then used for ballistics and were recovered in liquid culture for two days. Following the two-day recovery period, worms were concentrated and rimmed onto 10 cm plates with OP50 lawns to identify non-Uncoordinated transgenic larvae by their ability to emerge and crawl onto the bacterial lawn. Independent transgenic lines were maintained and examined for each reporter construct.

Scoring of Transgenic Animals
Expression was scored by Nomarski optics and YFP expression on a Zeiss Axioplan microscope. Photographs were taken with a digital camera at 100x using Improvision Openlab software. Lines to be scored were selected by high frequency transmission of the non-Uncoordinated phenotype, and the presence of either visible expression by low power epifluorescence microscopy or no expression in all lines. Scoring was limited to broad regions (head or tail) or anatomically defined regions such as the ventral cord (VC), pre-anal ganglion (PA), lumbar ganglion (LG) or dorsal rectal ganglion (DRG). The only neuron individually scored was DVA. Most transgenic lines produced by bombardment show a consistent pattern of expression between animals within each independent line. However, there were significant qualitative differences in YFP brightness between lines produced with the same construct. We scored from three to ten animals from each independent line for each construct and scored from two to ten lines, with an average of three lines (Table S1). In a few cases only two lines were generated, but these were included in the data if attempts to get additional lines were unproductive. Expression in DRG was scored as positive if we saw expression in DVA, or if we could see expression in DRG but could not definitely identify the cell as DVA or one of the other two cells in the DRG, DVB or DVC. Statistical analysis of differences in the frequency of expression in DVA was performed when there were not clear differences between lines. Expression in DVA for different constructs was compared by Fisher's Exact Test.
... with exons in either blue or pink with non-coding regions as black lines. The analyses included the 5′ intergenic regions of C. elegans genes acr-15 (1.6 kb), fax-1 (5.7 kb), nmr-1 (1.1 kb) and twk-16 first intron (1.4 kb). These non-coding regions were compared to the corresponding orthologous genes of C. briggsae (CBG) and C. remanei (CR) using a window of 20 and threshold of 17 ungapped identities (85% match) and shown as red lines between the orthologs. (TIF)
Figure S2 Uniprobe analysis of WT85. WT85 was analyzed against all species TFs in the Uniprobe database (http://the_brain.bwh.harvard.edu/uniprobe). Predicted homeodomain transcription factor binding sites (Homeodomain TFs) and ETS family transcription factor binding sites (ETS Domain TFs) are above the predicted binding sites for the TFs represented by multiple colored lines, which correspond to the TFs listed in the column. (TIF)
Table S2 Summary of transgenic lines. The lines are listed by alphabetical name followed by number of that specific line, date scored, number of animals scored and neuronal expression by region or ganglion. The only neuron scored individually was DVA. Scoring of animals was done as in Table. (XLSX)
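The Fisher's Exact Test comparison of DVA expression frequencies described under Scoring of Transgenic Animals can be illustrated with a minimal, dependency-free sketch. The function name `fisher_exact_two_sided` is hypothetical, the two-sided p-value is computed by summing all tables no more probable than the observed one (one common convention, assumed here), and the animal counts in the example are invented for illustration and are not data from this study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are constructs; columns are animals with and without DVA
    expression. Returns the sum of the hypergeometric probabilities of all
    tables with the same margins that are no more likely than the observed.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def probability(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    observed = probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        p
        for p in (probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-7)
    )


if __name__ == "__main__":
    # Hypothetical counts: 30 of 40 animals expressing in DVA for one
    # construct versus 5 of 40 for another.
    print(fisher_exact_two_sided(30, 10, 5, 35))
```

An exact test suits this kind of scoring because the number of animals per construct is modest and some cells of the table can be very small or zero.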