A Naturally Occurring Polymorphism at Drosophila melanogaster Lim3 Locus, a Homolog of Human LHX3/4, Affects Lim3 Transcription and Fly Lifespan

Lim3 encodes an RNA polymerase II transcription factor with a key role in neuron specification. It was also identified as a candidate gene that affects lifespan. These pleiotropic effects indicate the fundamental significance of the potential interplay between neural development and lifespan control. The goal of this study was to analyze the causal relationships between Lim3 structural variations, and gene expression and lifespan changes, and to provide insights into regulatory pathways controlling lifespan. Fifty substitution lines containing second chromosomes from a Drosophila natural population were used to analyze the association between lifespan and sequence variation in the 5′-regulatory region, and first exon and intron of Lim3A, in which we discovered multiple transcription start sites (TSS). The core and proximal promoter organization for Lim3A and a previously unknown mRNA named Lim3C were described. A haplotype of two markers in the Lim3A regulatory region was significantly associated with variation in lifespan. We propose that polymorphisms in the regulatory region affect gene transcription, and consequently lifespan. Indeed, five polymorphic markers located within 380 to 680 bp of the Lim3A major TSS, including two markers associated with lifespan variation, were significantly associated with the level of Lim3A transcript, as evaluated by real time RT-PCR in embryos, adult heads, and testes. A naturally occurring polymorphism caused a six-fold change in gene transcription and a 25% change in lifespan. Markers associated with long lifespan and intermediate Lim3A transcription were present in the population at high frequencies. We hypothesize that polymorphic markers associated with Lim3A expression are located within the binding sites for proteins that regulate gene function, and provide general rather than tissue-specific regulation of transcription, and that intermediate levels of Lim3A expression confer a selective advantage and longer lifespan.


Introduction
Lifespan is determined by a complex interplay between environmental and genetic factors. Temperature, air pollution, nutrition, and other factors affect multiple processes through various signaling and metabolic pathways. Many genes are involved in these pathways, and therefore control lifespan. Indeed, hundreds of genes are known to affect lifespan in model organisms [1][2][3]. However, many aspects of the genetic control of lifespan remain unclear. One that is especially interesting for us is how naturally occurring structural and functional variations in a gene can affect this phenotypic trait. Recent studies of natural nucleotide divergence in a variety of Drosophila genes demonstrated associations between structural polymorphisms in several genes and quantitative traits, including lifespan [4][5][6]. However, the causal relation of these structural variations and gene expression changes and phenotype alterations remains poorly understood.
Several candidate genes affecting lifespan have been revealed using recombination mapping followed by quantitative complementation tests with deficiencies and mutations at candidate loci [7]. Among others, Lim3 was identified as a candidate gene affecting lifespan [8]. Recent data show that this gene is also associated with locomotion behavior [9].
Lim3 is located in cytological region 37B13-37C1 of the second chromosome, and is a homeobox gene that encodes an RNA polymerase II transcription factor (TF) required for development and function of neurons. Lim3 is involved in complicated motor neuron specification networks, and is activated by Nkx6 and repressed by Even skipped (Eve) [10]. Lim3 may regulate axon extension and fasciculation through its downstream target, FasciclinIII [11]. With Islet and Drifter, Lim3 constitutes a "combinatorial code" that generates distinct motor neuron identities [10,12]. The Lim3 protein contains two LIM domains, a carboxy-terminal homeodomain, and a highly conserved 22-amino acid region called the Lim3-specific domain (LSD). Lim3 is highly homologous to the vertebrate LHX3/4 subclass of LIM-homeodomain proteins, with 95% and 98% identity to human LHX3 and LHX4 in the homeodomain region, 89% identity in the LIM domains, and 45% identity in the LSD [13]. Like Lim3, human LHX3/4 are TFs required for pituitary development and motor neuron specification. Mutations in LHX3/4 are associated with combined pituitary hormone deficiency, rigid cervical spine, or short stature [14][15][16].
The involvement of Lim3 in both the regulation of neuron development and lifespan control could be of fundamental significance. The effect of Drosophila Lim3 on lifespan control could be conserved in multicellular eukaryotes, including humans, similar to its role in neuron identification. Analysis of the causal relationships between Lim3 structure, transcription level, and lifespan will provide insight into conserved regulatory pathways controlling lifespan. In this paper, we demonstrate the potential of naturally occurring polymorphisms in the Lim3 5′-regulatory region to modulate gene expression and fly lifespan.

Results
The exact mechanisms of Lim3A transcription and the structure of its potential regulatory region were unknown. To characterize and evaluate the functional role of naturally occurring polymorphisms of the Lim3 5′-regulatory region, we first analyzed initiation of Lim3A transcription, determined the exact border between the regulatory and structural parts of the gene, and outlined the proximal promoter region and potential binding sites for regulatory proteins within the regulatory region.

Analysis of Lim3A transcription initiation and proximal promoter region
Lim3 was found to produce two mRNAs: Lim3A and Lim3B (GenBank accession nos. NM_057258 and NM_165277), with the same structure, except that the first exon of Lim3A is replaced by two different exons in Lim3B (Figure 1). We focused on Lim3A, which has been shown to have a function in Drosophila neuron development [10].
Northern blot using a Lim3A-specific probe revealed two transcripts (Figure 2A). The larger 2.6 kb major transcript was identical in size to Lim3A; the minor 2.4 kb transcript, which we called Lim3C, was new. 5′-RACE analysis (Figure 2B) confirmed the additional Lim3C mRNA. Sequences of 47 clones obtained by 5′-RACE (GenBank accession nos. GU814523 to GU814569) demonstrated that each transcript had an array of closely located transcriptional start sites (TSSs) with different initiation rates. The major Lim3A TSS (Figure 3) was at −6 nucleotides (18 clones), and the minor TSSs were at −16 (3 clones), −2 (8 clones), and +14 (4 clones) relative to the earlier annotated start site. The major Lim3C TSS (Figure 3) was at +184 (8 clones), and the minor TSSs were at +169 (3 clones) and +179 (3 clones) relative to the earlier annotated start site. TSSs located downstream of the major TSS might correspond to accidentally truncated fragments of full-length RNA molecules, so only TSSs represented by three or more clones were considered. Lim3C appeared to be 190 bp shorter than Lim3A because of the reduced length of the untranslated region (UTR). Seven identical exons were present in both transcripts (data not shown).
Lim3A and Lim3C TSSs were located within the large 12-kb intron of Lim3B (Figure 3). Several bioinformatic resources were used to determine regulatory elements present in the core and proximal promoter regions of Lim3A and Lim3C. The Lim3A transcript start region (initiator) was a close match to the consensus sequence of the D. melanogaster initiator, T-C-A(+1)-G/T-T-T/C [17,18], and appeared to be TTA(+1)GTC. Almost identical initiators were found in 13.2% of genes in the Drosophila Core Promoter Database. Most of these (63%) contained downstream core promoter elements (DPEs), and mainly had similar functions, specifically RNA polymerase II TF activity, which correlated with the Lim3 function.
The Eukaryotic Promoter Database, Current Release 100, and Drosophila Core Promoter Database were used to detect Drosophila core promoter elements in the Lim3A regulatory region. A DPE was identified at +28 to +33 nucleotides relative to the major TSS of Lim3A (Figure 3). The DPE sequence, AGTTGC, was a reasonable match to the consensus DPE sequence, A/G/T(+28)-C/G-A/T-C/T-A/C/G-C/T [17,18], and was encountered in 0.8% of 1926 genes included in the Eukaryotic Promoter Database.
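The comparison of a start region or DPE with its consensus can be made explicit with a position-by-position check. The following is a minimal sketch, not the procedure used by the databases cited above: the consensus strings are transcribed from the text, the Lim3C start region TTG(+1)AGC is the one reported in this section, and the function name `count_matches` is purely illustrative.

```python
# Each consensus position is written as the set of bases it allows,
# transcribed from the consensus sequences quoted in the text.
INR_CONSENSUS = ["T", "C", "A", "GT", "T", "TC"]        # T-C-A(+1)-G/T-T-T/C
DPE_CONSENSUS = ["AGT", "CG", "AT", "CT", "ACG", "CT"]  # A/G/T-C/G-A/T-C/T-A/C/G-C/T


def count_matches(seq, consensus):
    """Count positions at which seq carries a base allowed by the consensus."""
    if len(seq) != len(consensus):
        raise ValueError("sequence and consensus must have the same length")
    return sum(base in allowed for base, allowed in zip(seq, consensus))


lim3a_inr = count_matches("TTAGTC", INR_CONSENSUS)  # Lim3A start region
lim3c_inr = count_matches("TTGAGC", INR_CONSENSUS)  # Lim3C start region
lim3a_dpe = count_matches("AGTTGC", DPE_CONSENSUS)  # Lim3A DPE

print(lim3a_inr, lim3c_inr, lim3a_dpe)
```

Under this simple count, the Lim3A initiator agrees with the consensus at five of six positions and the Lim3C start region at two of six, while the Lim3A DPE agrees at all six. A plain match count ignores position weights, so it is only a rough proxy for promoter strength.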
No TATA box was found in the Lim3A regulatory region. However, the sequence CAATAA, found at −27 to −22 nucleotides upstream of the Lim3A TSS, often occurs at positions from −36 to −21 nucleotides in the regulatory regions of D. melanogaster genes (0.6% of 1926 promoter sequences in the Eukaryotic Promoter Database). For example, CAATAA was found in the regulatory regions of five Enhancer of split [E(spl)] genes (HLHm3, HLHm5, HLHm8, HLHmβ, HLHmγ) [19], which encode basic helix-loop-helix transcriptional repressors that are expressed mainly during the embryonic stage, and function in neuronal development, similar to Lim3. Thus, the CAATAA sequence is common to genes with overlapping expression patterns during embryogenesis [20].
In contrast to Lim3A, the Lim3C transcription start region TTG(+1)AGC was less similar to the consensus.
In addition to the two major and some minor TSSs mentioned above, 5′-RACE analysis revealed TSSs represented by a single clone each, located approximately 250 bp upstream of the Lim3A major TSS. These rare long transcripts might use promoters predicted by the Neural Network Promoter Prediction database (Figure 3) at −534 to −485 (score 0.98) and −242 to −193 (score 1.00) relative to the Lim3A major TSS or, more likely, reflect "slippery promoters" typical of both TATA-containing and TATA-less Drosophila genes with multiple TSSs [21].
To identify potential TF-binding sites within the proximal regulatory regions of Lim3A and Lim3C, TFSEARCH version 1.3, MOTIF Search, and other bioinformatic resources (see Materials and Methods) were used. Potential TF-binding sites were found for heat-shock factor, which also controls the expression of non-heat shock protein genes, for example, eve, in Drosophila embryonic development [22]; Hunchback (HB), which is necessary and sufficient for specifying early-born temporal identity in multiple neuroblast lineages [23]; and broad-complex Z3 and broad-complex Z4 (BR-C Z3/Z4), which are essential for metamorphic reorganization of the central nervous system [24] (Figure 3). HB and BR-C Z3/Z4 are specialized TFs participating in Drosophila nervous system morphogenesis that might take part in Lim3 transcription regulation, which is also essential for neuron development.

Naturally occurring polymorphisms at Lim3
To determine if Lim3 function is associated with molecular variation in natural populations of Drosophila, we sequenced 2094 bp from 50 alleles from the Raleigh natural population, including 1557 bp of the Lim3A regulatory region, 300 bp of the 5′ UTR, 109 bp of the translated region from the first Lim3A exon, and 128 bp from the first intron. In total, 90 polymorphic markers were found, including 74 single nucleotide polymorphisms (SNPs) and 16 insertions and deletions (indels). Estimates of nucleotide diversity based on the number of differences between pairs of sequences (π, [25]) and the number of segregating sites (θ, [26]) were within the range observed for D. melanogaster [27]: π = 0.00709 ± 0.00046 and θ = 0.00864 ± 0.00098. The highest level of variation was in the intron. However, when the regulatory region was divided arbitrarily into two equal parts, the distal section had approximately the same level of variation as the intron, while the proximal section closest to the 5′ UTR was much more conserved (Table 1). Not surprisingly, the most conserved was the translated part of the exon (Table 1), where only two nonsynonymous substitutions were found, each with a frequency of 0.02. Little significant linkage disequilibrium (LD) was observed between polymorphic markers (Figure 4), and the pattern of linked loci was as expected under assumptions of normal recombination, with few exceptions. This result was favorable for the identification of causal associations between molecular and phenotypic variations.
Molecular population genetic tests for selection were used to determine whether evolutionary forces might be regulating nucleotide variation at the Lim3 locus. Significant negative values for D [28], D* and F* [29] were observed for the first exon (Table 2). Most parameters were also significant for the 5′ UTR alone, and for the translated region of the exon alone (Table 2). Only two nonsynonymous polymorphisms were found in our sample, so other neutrality tests were not applied. Overall, our results indicated less variation in the Lim3A first exon than expected under neutral expectations, and the action of purifying selection on this region. To understand in more detail the biological significance of molecular variation observed in the Raleigh natural population, we tested effects of nucleotide diversity on gene expression and fly phenotype.
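For readers unfamiliar with the two diversity estimators quoted above, the sketch below computes per-site nucleotide diversity (mean pairwise differences) and the Watterson estimator (segregating sites scaled by a harmonic sum) for a small alignment. It uses the standard textbook definitions and is not necessarily the exact implementation behind the reported values; the four 10-bp sequences are hypothetical toy data, and alignment gaps are not handled.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site (pi)."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total_diffs / len(pairs) / length


def watterson_theta(seqs):
    """Segregating sites per site, scaled by a_n = sum_{i=1}^{n-1} 1/i (theta)."""
    n = len(seqs)
    length = len(seqs[0])
    segregating = sum(len(set(column)) > 1 for column in zip(*seqs))
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating / a_n / length


# Hypothetical toy alignment: 4 alleles, 10 sites, 2 segregating sites.
toy_alignment = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGAACGTAC",
    "ACGTACCTAC",
]

pi = nucleotide_diversity(toy_alignment)
theta = watterson_theta(toy_alignment)
print(round(pi, 4), round(theta, 4))
```

Both estimators target the same population parameter under neutrality, which is why the difference between them underlies neutrality statistics such as the D, D* and F* tests applied to this region.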

Association between molecular variation at Lim3 locus and lifespan
Association studies used 44 polymorphic markers that were present in our sample at a frequency of 0.06 (in three lines out of 50) and higher. This restriction allowed us to concentrate on polymorphisms that were truly segregating in nature. Lifespan measurements were published in [5,6].
Analysis of variance (ANOVA) revealed six polymorphic markers significantly associated with lifespan, while no significant association with lifespan was seen for sex or marker by sex interaction. Based on these results and the restricted sample size of sequenced alleles, we combined data on the sexes for nonparametric, distribution-free Wilcoxon tests, to assess association between molecular variation at Lim3 and lifespan. The same six markers showed significant association with lifespan (Figure 3, Table 3, Table S1): four were located in the regulatory region (A433T; G871C; A1050G; C1177T), one in the 5′ UTR (T1658C), and one in the first intron (G1991A). None were in significant LD with each other. We also checked association of lifespan with several haplotypes composed of combinations of significant markers from the regulatory region that were most likely to influence lifespan through transcription alteration. Three haplotypes were significantly associated with lifespan (Table 3, Table S1): one composed of four markers, A433T, G871C, A1050G, and C1177T; one composed of the three proximal markers adjacent to the structural gene, G871C, A1050G, and C1177T; and one composed of the two markers with minimal P-values for individual association with lifespan, G871C and C1177T. In total, we carried out 47 association tests. Only lifespan association with haplotype G871C+C1177T survived Bonferroni correction, and lifespan associations with two other haplotypes and the G1991A marker survived a less conservative false discovery rate (FDR) correction. We concluded that the combination of two markers in the regulatory region, G871C+C1177T, which were present in all haplotypes, and the single marker in the first intron of Lim3A were important for lifespan.
One of the alleles at each polymorphic site composing the significant haplotype had a low population frequency (p_C = 0.08 for G871C; p_T = 0.06 for C1177T), and was associated with short lifespan (Table 3). Of four possible combinations of alleles, only three were present in the population. Their frequencies were in good agreement with those expected from the frequencies of single alleles (χ² = 0.0055), which confirmed the absence of LD between the markers. Multiple comparisons of means allowed us to divide the GC, CC, and GT haplotype variants of the G871C and C1177T markers into two groups that significantly differed in lifespan (P < 0.05). The first group included 86% of lines and was characterized by the GC haplotype and a mean lifespan of 38 (±1) days. The second group included lines with the rare CC (8%) and GT (6%) haplotypes, and mean lifespans of 31 (±2) and 29 (±2) days, respectively.
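The difference between the two multiple-testing corrections can be illustrated with a short sketch. It assumes a significance level of 0.05 and uses the Benjamini-Hochberg step-up procedure as the FDR method; neither choice is stated in this section, so both are assumptions, and the five P-values in `toy_pvals` are hypothetical. Only the number of tests (47) is taken from the text.

```python
def bonferroni(pvals, alpha=0.05):
    """Flag P-values that pass the Bonferroni threshold alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """Flag P-values rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Largest rank k whose sorted P-value satisfies p_(k) <= alpha * k / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= alpha * rank / m:
            k_max = rank
    keep = [False] * m
    for rank, idx in enumerate(order, start=1):
        keep[idx] = rank <= k_max
    return keep


# Bonferroni threshold for the 47 association tests mentioned in the text
# (alpha = 0.05 is an assumption).
threshold_47 = 0.05 / 47

# Hypothetical P-values, chosen only to show that FDR is less conservative.
toy_pvals = [0.001, 0.012, 0.025, 0.041, 0.6]
bonf_flags = benjamini_flags = None
bonf_flags = bonferroni(toy_pvals)
fdr_flags = benjamini_hochberg(toy_pvals)
print(round(threshold_47, 5), bonf_flags, fdr_flags)
```

With 47 tests the Bonferroni threshold is roughly 0.00106, so only very small P-values pass, whereas the step-up procedure lets the threshold grow with the rank of each P-value. In the toy list, one value passes Bonferroni and three pass the FDR procedure.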
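The agreement between observed and expected haplotype frequencies can be checked by hand from the numbers given above. The sketch below multiplies the single-marker allele frequencies to obtain the frequencies expected without LD and compares them with the observed 86%, 8%, and 6% (the fourth combination, CT, was not observed). Computing the statistic on proportions rather than on counts is an assumption about how the reported value was obtained.

```python
# Rare-allele frequencies stated in the text.
p_c871 = 0.08   # frequency of C at marker G871C
p_t1177 = 0.06  # frequency of T at marker C1177T

# Haplotype frequencies expected if the two markers are independent (no LD).
expected = {
    "GC": (1 - p_c871) * (1 - p_t1177),
    "CC": p_c871 * (1 - p_t1177),
    "GT": (1 - p_c871) * p_t1177,
    "CT": p_c871 * p_t1177,
}

# Observed haplotype frequencies from the text; CT was absent from the sample.
observed = {"GC": 0.86, "CC": 0.08, "GT": 0.06, "CT": 0.0}

# Chi-square-style statistic on proportions (assumption, see lead-in).
chi2_freq = sum((observed[h] - expected[h]) ** 2 / expected[h] for h in expected)

# The same statistic on counts scales with the sample size of 50 lines.
chi2_counts = 50 * chi2_freq

for hap in expected:
    print(hap, round(expected[hap], 4), observed[hap])
print(chi2_freq, chi2_counts)
```

On proportions the statistic comes out close to the reported 0.0055, and even when scaled to counts from 50 lines it remains far below any conventional significance threshold, which is consistent with the absence of LD between the two markers.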
We proposed that polymorphisms in the regulatory region of the gene affect its expression, and thus a phenotypic trait such as lifespan. Our next goal was to test this hypothesis experimentally.

Association between molecular variation in Lim3A regulatory region and Lim3A expression
Lim3A and Lim3C differ in their 5′ UTR, with Lim3C shorter by 190 bp. Therefore, the amount of either Lim3A alone, or both transcripts, could be detected and measured. As Lim3A was more abundant (Figure 2A), and has functional significance for neuron development [10], we focused our analysis on Lim3A. To assess association between molecular variation in the Lim3A regulatory region and its transcript level, 16 lines with different G871C and C1177T haplotypes were selected. According to the information available [http://flyatlas.org, accession no. FBgn0002023], Lim3 transcription is predominantly observed in embryos, and in adult brains and testes. Guided by this information, we evaluated the amount of Lim3A in embryos, heads (Table 4), and testes of selected lines using real-time RT-PCR.
Correlations between independent measurements of Lim3A transcripts were highly significant across the 16 lines in both embryos (P < 0.0001) and heads (P = 0.0074), strengthening the reliability of the results. The correlation between independent measurements in the testes was not significant (P = 0.1082), probably because of the substantially smaller amount of detected Lim3A mRNA. Across the 16 lines, the amount of Lim3A mRNA was also correlated in embryos and heads (P = 0.0064) and in heads and testes (P = 0.0204), and marginally correlated in embryos and testes (P = 0.0579). In total, 30 of the 44 markers segregated in these lines, and eight were in complete LD with the others: 24 association tests with 22 markers and two haplotypes were performed. According to the distribution-free Wilcoxon test, significant associations with Lim3A levels in embryos were seen for 14 polymorphic markers. For four markers (G871C, A926G, G1021A, C1046A), this held after Bonferroni correction, and another four (A433T, G586A, G598T, C1177T) held after FDR correction (Table 3, Table S1). Markers G1021A and C1046A, and markers G586A and G598T, were in complete LD in the 16 lines. Another method [30], based on the analysis of direct C(t) measurements proportional to the logarithm of the substrate quantity, was used for verification. Significant associations surviving FDR corrections were confirmed for G871C and C1177T (Table 3, Table S1). Finally, REST [31], a program that accounts for different PCR efficiencies for target and reference genes, confirmed associations of G871C and C1177T (P = 0.0001 for both).
G871C and C1177T are the two polymorphic markers that form the haplotype that is significantly associated with lifespan. Association with Lim3A levels in embryos was highly significant for this haplotype, by all methods of analysis (Table 3), including pairwise comparisons using REST software (P = 0.0001 for each comparison). Multiple comparisons of means allowed us to categorize lines with different haplotype variants of the G871C and C1177T markers, specifically CC, GC, and GT, into three groups with an approximately six-fold significant difference (P < 0.05) in the amount of Lim3A in embryos (CC: 1.8 ± 0.27; GC: 0.7 ± 0.06; GT: 0.3 ± 0.04; Figure 5). According to the Wilcoxon test, significant associations were found for Lim3A levels in adult heads for five polymorphic markers. One marker (C1177T) survived Bonferroni correction and another two (G871C, A926G) survived FDR correction (Table 3, Table S1). Analysis of direct C(t) measurements revealed significant associations surviving FDR correction for A926G and C1177T (Table 3, Table S1). These results were not confirmed using REST. Association with Lim3A levels in adult heads was highly significant for the G871C+C1177T haplotype by both nonparametric analysis methods (Table 3), but only one of the three pairwise comparisons was significant (REST, P = 0.023 for GT compared to CC). Multiple comparisons of means allowed us to categorize lines with the CC, GC, and GT haplotype variants of the G871C and C1177T markers into three groups with approximately six-fold significant differences (P < 0.05) in Lim3A levels in adult heads (CC: 0.7 ± 0.12; GC: 0.4 ± 0.04; GT: 0.1 ± 0.02; Figure 5).
According to the Wilcoxon test, significant associations were seen between the amount of Lim3A in testes and 16 polymorphic markers, although none survived Bonferroni or FDR correction (Table 3, Table S1). Association was also significant for the G871C+C1177T haplotype, according to both nonparametric analysis methods, but these also did not survive Bonferroni or FDR correction (Table 3). Multiple comparisons of means showed that Lim3A transcription in testes was significantly different (P < 0.05) between lines with the CC (0.04 ± 0.004) and GT (0.01 ± 0.003) haplotype variants of the G871C and C1177T markers (Figure 5).
Many polymorphic markers appeared to be significantly associated with the amount of Lim3A in different tissues. Different methods of analysis and different P-value corrections gave slightly different, though not contradictory, results (Table 3). The most notable polymorphic markers were G871C and C1177T, which formed a haplotype significantly associated with lifespan (P = 0.0010), and with transcription in embryos (P = 0.0001), adult heads (P = 0.0011), and testes (P = 0.0053). Each of the two markers alone was also significantly associated with transcription in embryos (G871C: P = 0.0002; C1177T: P = 0.0033), adult heads (P = 0.0105, P = 0.0021), and testes (P = 0.0138, P = 0.0121), as well as lifespan (P = 0.0151, P = 0.0084). The polymorphic markers A926G and G1021A+C1046A (linked in the sample of 16 lines), located between G871C and C1177T, were also significantly associated with the Lim3 transcription level in embryos (A926G: P = 0.0021; G1021A+C1046A: P = 0.0018), adult heads (P = 0.0052, P = 0.0356), and testes (P = 0.0209, P = 0.0091), as was the haplotype composed of all five markers, G871C+A926G+(G1021A+C1046A)+C1177T (P = 0.0005, P = 0.0058, and P = 0.0053 for embryos, heads, and testes, respectively; Table S1). We propose that the entire region from 380 to 686 bp upstream of the Lim3A major TSS is important for gene expression, while only two markers within this region are important for lifespan.
All five polymorphic markers mentioned above are in potential TF-binding sites: G871C and A926G are in the Grainy Head (Grh) binding site consensus sequence, G1021A is in the specificity protein-1 (Sp1)/Krüppel-like factor (KLF) binding-site consensus sequence, C1046A is in the Zeste-like motif, and C1177T is in the (CA/TG)9 repeat (Figure 3).
G871C and C1177T appeared to be the most essential markers for Drosophila Lim3A expression and lifespan. When C, a frequent allele in the Raleigh population, was present at the 1177 position, Lim3A transcription was intermediate and Drosophila lifespan was long. When C was replaced by T, a rare allele, both expression and lifespan were low (Table 4). Hence, we suggest that this site normally functions in Lim3 activation, as an activator-binding site. The (CA/TG)9 repeat in which the C1177T polymorphic site is located is a cis-regulatory element [32]; however, nothing is known about the proteins that bind this repeat [33].
In a background of C at the 1177 position, Lim3A transcription and Drosophila lifespan were dependent on the G871C marker (Table 4). When G, a frequent allele in the Raleigh population, was present at the 871 position, Lim3A transcription was intermediate and Drosophila lifespan was long. When G was replaced by C, a rare allele, expression increased and lifespan was short (Table 4). Hence, we suggest that normally this site is involved in Lim3 repression, as a repressor-binding site. Indeed, Grh, which presumably interacts with G871C as part of its specific binding site, cooperates with Polycomb-group (PcG) proteins that inactivate genes by chromatin remodeling, and Grh-binding sites are often encountered in Polycomb response elements (PREs) [34][35][36].
Both an intermediate level of Lim3A expression and longer lifespan are associated with the same polymorphic haplotype. Quadratic regression (with mean lifespan as the dependent variable and Lim3A mRNA amount as the independent variable) is a better approximation for our data (R² = 0.062 for embryos; R² = 0.011 for heads; R² = 0.007 for testes) than linear regression (R² = 0.0029 for embryos; R² = 0.0006 for heads; R² = 0.000 for testes). For embryos, quadratic regression is significant (P = 0.0472), indicating that the model accounts for a low but significant portion of variation in the data; linear regression is not significant (P = 0.2814). This result is in agreement with the hypothesis that intermediate levels of Lim3A expression confer longer lifespan.
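The comparison of a linear model with one that includes a squared term can be sketched as follows. This is an illustration only: the expression and lifespan values are hypothetical toy numbers with an inverted-U shape, not the measurements behind the R² values above, and the least-squares fit is written out with normal equations so that the example needs no external libraries.

```python
def polyfit(x, y, degree):
    """Least-squares polynomial fit; returns coefficients, lowest power first."""
    n = degree + 1
    # Normal equations A c = b for the polynomial basis 1, x, x^2, ...
    a = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coef = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (b[i] - tail) / a[i][i]
    return coef


def r_squared(x, y, coef):
    """Coefficient of determination for a fitted polynomial."""
    mean_y = sum(y) / len(y)
    pred = [sum(c * xi ** k for k, c in enumerate(coef)) for xi in x]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data: relative mRNA amount and mean lifespan in days.
expression = [0.2, 0.4, 0.6, 0.8, 1.0, 1.4, 1.8]
lifespan = [29, 34, 38, 39, 37, 34, 31]

r2_lin = r_squared(expression, lifespan, polyfit(expression, lifespan, 1))
r2_quad = r_squared(expression, lifespan, polyfit(expression, lifespan, 2))
print(round(r2_lin, 3), round(r2_quad, 3))
```

Because the linear model is nested within the quadratic one, R² can only increase when the squared term is added, so the gain in R² alone is not evidence for a non-linear relationship. The significance of the quadratic model, as reported for embryos, is what carries the argument.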

Discussion
We found that Lim3 produces three mRNAs. In addition to the already known Lim3A and Lim3B transcripts, we discovered the additional Lim3C mRNA. The promoter region of Lim3A is DPE-containing, but lacks a TATA box, and possesses multiple start sites with one major initiation site and additional nearby minor ones. The distance between the Lim3A DPE and initiator is appropriate for TFIID binding, which is essential for transcription [37]. Reduced expression of Lim3C compared to Lim3A is most likely explained by the lack of a strong initiator, TATA box, or other core promoter elements. However, other elements, such as CAAAAT, and different mechanisms of initiation may be used to regulate Lim3C transcription. The alternative promoters of Lim3A and Lim3C may provide a mechanism for tissue- and developmental stage-specific Lim3 activation. TATA box-containing promoters are activated after embryonic development, and TATA-less promoters of the same genes are active during early embryo development [38,39]. Mammalian LHX3a and LHX3b, which are homologues of Lim3A and Lim3B, are transcribed from two alternative TATA-less, GC-rich promoters [40], have distinct temporal expression profiles, and have different regulatory roles in the development of distinct cell types [41].
Statistical analyses demonstrated that the first exon of Lim3A (Lim3C) is affected by purifying selection. The normal recombination found in this region suggests that the selection should be highly effective against deleterious alleles, removing them from the population [42]. Indeed, only two polymorphisms with minimal detectable frequency were found in the translated region of the first exon. Thus, the conserved structure of the Lim3A (Lim3C) protein can be assumed to be essential for its proper function, and therefore maintained by selection. An alternative explanation is that the Raleigh population recently experienced a bottleneck. However, this is not confirmed by analysis of selection forces acting on other regions of the gene, or on other genes whose molecular variation was analyzed using the same sample of second chromosomes from the Raleigh population (Dopa decarboxylase [5]; Catecholamines Up [6]; shuttle craft, Simonenko, Pasyukova, unpublished results).
Regulatory regions can have crucial roles in evolution, and modifications in these regions have mainly adaptive evolutionary effects [43,44]. Statistical analysis did not reveal any evidence for natural selection in the Lim3A regulatory region. Nevertheless, the significance of the regulatory region for transcription and phenotype was demonstrated by the finding that nucleotide substitutions within this region that segregated in the Raleigh population appeared to result in differences as large as six-fold in gene transcription, and 1.3- to 1.5-fold in lifespan. No significant associations were found between markers located outside the regulatory region (i.e., in the 5′ UTR or the Lim3A structural gene) and Lim3A levels, and markers significantly associated with Lim3A expression were not in LD with each other or with markers within the gene in a sample of 50 alleles. Therefore, we have likely identified actual causal relationships between natural polymorphisms and gene function. Haplotype variants of the G871C and C1177T polymorphic markers associated with short lifespan, and either high or low Lim3A transcription (CC, GT), were found in the Raleigh population at low frequencies. The haplotype variant associated with long lifespan and intermediate Lim3A transcription (GC) was present at high frequency. Thus, association analysis predicted that an intermediate level of Lim3A expression provided longer lifespan and a selective advantage. Statistical tests were possibly not sensitive enough to detect this selection, however. Even when the fitness effects of mutations are in the nearly neutral range, natural selection is still able to influence transcriptional phenotype [45].
A possible general explanation for the absence of selection on the regulatory region is that nucleotide substitutions in a single TF-binding site, or in several, might affect gene expression only in the tissues where these TFs are active, so the impact of the substitutions on phenotypic traits would be small. However, as mentioned above, this was not true for several polymorphisms within the Lim3A 5′-regulatory region, which significantly affected expression and phenotype. Moreover, it is difficult to point to polymorphic markers within the Lim3A regulatory region that have tissue-specific effects. Rather, most polymorphic markers that were significantly associated with transcript abundance seemed to be important in all tissues, and the exact significance level of the effect depended on the reliability of measurements in a particular tissue and on methods of analysis. Thus, nucleotide substitutions found in the Lim3A regulatory region in the Raleigh natural population are likely to be located within sites that regulate transcription in a general, rather than a tissue-specific, manner.
Most polymorphic markers significantly associated with transcription were located in the compact region 380 to 680 bp upstream of the Lim3A major TSS, and were within binding sites for important transcriptional regulators. For example, Grh is involved in many regulatory networks, including the complex regulation of neuroblast specification and neuron apoptosis [46,47]. Sp1 mediates transcription of the LHX3 gene, the human homologue of Lim3 [40]. Grh and Sp1/KLF are members of the PcG and trxG complexes. Binding sites for other members of these complexes (Pho, GAGA or GAF/Psq) were also found in the Lim3A regulatory region, suggesting that PRE-TRE sites for PcG and trxG complexes are present in the region.
We presume that both repressor and activator proteins bind the essential sites for Lim3 transcription and fly lifespan in which the polymorphic markers are located. We hypothesize that the repressor protein Grh and the unknown activator protein that binds the (CA/TG)9 repeat might provide negative and positive transcriptional regulation of Lim3A, and consequently affect Drosophila lifespan. Disrupting the balance between negative and positive regulation would result in deviations in Lim3A transcription, and a decrease in Drosophila lifespan. An intermediate expression level, based on a balance between activation and repression of the gene and favorable for long lifespan, could be provided by the combined activity of PcG and trxG protein complexes through maintenance of a silent or active transcriptional state of their target genes. The PcG and trxG complexes bind to genes encoding transcription factors, including homeodomain-containing proteins such as Lim3, and are implicated in the regulation of various transcriptional pathways [48].
Overexpression or RNAi knockdown of a number of Drosophila genes showed the involvement of these genes in lifespan control (for example, [49][50][51]). Direct proof of Lim3 involvement in lifespan control is required; however, gene overexpression and RNAi knockdown are not applicable in this particular case. We are considering site-specific integration of a Lim3 transgene using carefully chosen sets of landing sites, transgene constructs, and drivers as a possible approach to verify the results presented here. Experimental manipulations of Lim3 expression levels are also necessary to prove that intermediate levels of Lim3A expression confer longer lifespan.
The mechanism underlying Drosophila lifespan variation through alteration of Lim3 expression is not understood. Molecular variation at the Lim3 regulatory region most strongly affected Lim3 expression in embryos. Previously, Lim3 was found to be active in the Drosophila embryonic nervous system and to take part in regulatory networks leading to the specification of motor neuron subclass identity, axon pathfinding, and finally, proper muscle innervation [12]. Lim3 was reported to be expressed in the Drosophila ring gland [10], but later studies failed to confirm Lim3 expression in the embryonic Drosophila endocrine system [52]. Whether these Lim3 functions are sufficient to explain the lifespan variation caused by alterations in Lim3 expression in embryos, and which other mechanisms might explain lifespan effects initiated during early development, is unknown. Recently, however, genes responsible for sex determination during early Drosophila development that also affect lifespan were found [53].
The role of Lim3 in adult flies is not known, and we do not possess any information about changes in Lim3 transcription level with age. Lim3 was first discovered as a male-specific candidate lifespan gene [8]. Thus, Lim3 expression in testes is assumed to affect lifespan and even to have a main causal relation to lifespan variation. However, we failed to find a strong association between Lim3 transcription in testes and lifespan, probably because of insufficient sensitivity in measuring small amounts of mRNA. We demonstrated that Lim3 is substantially expressed in adult heads. This confirms that Lim3 expression in adults is tissue-specific, and probably associated with the nervous system. Lim3 function in the adult brain may be involved in lifespan regulation. We intend to ascertain the Lim3 function in the nervous and neuroendocrine systems of adult flies, to move closer to understanding the mechanisms underlying Lim3 involvement in Drosophila lifespan control.

Drosophila stocks
We used 50 D. melanogaster substitution lines containing second chromosomes from the Raleigh (USA) population in a homozygous Samarkand genetic background and differing in lifespan (22-62 days, P < 0.0001; [5]). All lines were reared in glass vials with wheat-sugar-agar medium at 25°C.

Nucleic acids isolation
DNA was extracted from 50 lines according to standard procedures [54]. Total RNA for Northern and 5′RACE analyses was isolated using the SV Total RNA Isolation System (Promega) according to the manufacturer's instructions. Total RNA for real-time quantitative PCR was extracted from 50 12-hour embryos and from 20 heads (10 males and 10 females) or 50 pairs of testes of 15-day-old adult flies using Trizol reagent (Invitrogen) and DNase I Kit (TURBO DNA-free™, Ambion) according to the manufacturers' instructions.

DNA sequencing and analysis
Isolated DNA was used in a PCR reaction with forward primer TCC AAC CAG ACT GTC AAG TCA AAT TAC and reverse primer TTG CAG AAA GAG AAT AAC GCT AAA TCA. PCR products were then sequenced with Big Dye Terminator V. 3
A 1640 bp Lim3 PCR fragment was amplified (forward primer: TTC AAT TAG CAT GAT CCA AGG, reverse primer: TCA CAT TTG CCA TTG GAC AGG AAG TC) and used as a probe to detect Lim3 transcripts. DNA probes (5-10 × 10^6 cpm) added to the hybridization mixture were labeled using the HexaLabel™ DNA Labeling Kit (Fermentas) with 20-40 µCi [α-32P] and then purified with CentriSep columns (Princeton Separations).

Rapid amplification of 5′ cDNA end (5′RACE) analysis
Transcription start sites of Lim3A mRNA in D. melanogaster were identified with the rapid amplification of cDNA ends (RACE) technique using the SMART™ RACE cDNA Amplification Kit (Clontech) for first-strand cDNA synthesis. Touchdown PCR of the first-strand cDNA was then performed using the gene-specific reverse primer TCA CAT TTG CCA TTG GAC AGG AAG TC and the manufacturer's Abridged Anchor primer (SMART™ RACE cDNA Amplification Kit, Advantage 2 Polymerase Mix, Clontech). Annealing was performed at 64°C for 30 sec and extension at 68°C for 3 min; other parameters of the touchdown PCR were selected according to the manufacturer's recommendations. PCR products were gel-purified (Wizard® PCR Preps DNA Purification System, Promega) and cloned into pGEM-T Easy Vector (Promega). Plasmid DNA was isolated (Wizard® Plus Minipreps DNA Purification System, Promega) and sequenced.

Real-time RT-PCR
The first strand of cDNA was synthesized using SuperScript™ II Reverse Transcriptase (Invitrogen) with oligo(dT) primer, according to the manufacturer's instructions. The amount of cDNA was analyzed by real-time quantitative PCR using SYBR Green I/Rox in a Chromo4 Real-Time PCR Detector (Bio-Rad). Equal amounts of mRNA and cDNA were used for real-time RT-PCR analysis to evaluate Lim3A expression in various tissues and life stages.
Gdh, a housekeeping gene located on chromosome 3, was used as a reference gene to normalize for differences in total cDNA between samples. Chromosome 3 was common to all the substitution lines, and Gdh has a relatively low expression level comparable to that of Lim3.
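Reference-gene normalization of this kind can be illustrated with a minimal sketch. It assumes the common ΔCt convention, in which target abundance relative to the reference is E^-(Ct_target - Ct_reference), with the same amplification efficiency E for both amplicons. The function names, the efficiency default, and the Ct values are hypothetical and are not taken from the study, which does not state this formula.

```python
def relative_expression(ct_target, ct_reference, efficiency=2.0):
    """Return target abundance relative to the reference gene.

    Assumption: both amplicons share the same per-cycle efficiency
    (2.0 means perfect doubling), so the ratio is E ** -(Ct_t - Ct_r).
    """
    delta_ct = ct_target - ct_reference
    return efficiency ** (-delta_ct)


def fold_change(ct_target_a, ct_ref_a, ct_target_b, ct_ref_b, efficiency=2.0):
    """Fold change of sample A over sample B after reference normalization."""
    return (relative_expression(ct_target_a, ct_ref_a, efficiency)
            / relative_expression(ct_target_b, ct_ref_b, efficiency))


# Hypothetical Ct values for two lines (not data from the study).
line_a = {"lim3a": 26.0, "gdh": 24.0}
line_b = {"lim3a": 28.5, "gdh": 24.0}
print(fold_change(line_a["lim3a"], line_a["gdh"],
                  line_b["lim3a"], line_b["gdh"]))
```

A reference gene with an expression level close to that of the target keeps both Ct values in a similar range, which limits the effect of any efficiency mismatch on the ratio.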

Statistical analyses
Nucleotide diversity was analyzed as the pairwise distance between alleles (π) and the average number of segregating sites (θ) using DnaSP 4.0 [55]. This software was also used to assess linkage disequilibrium (LD) between polymorphic sites, and selective neutrality of observed polymorphisms (D, D* and F*, D* and F* with outgroup, [28,29]).
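The two diversity statistics can be sketched for a gap-free alignment as follows. This is an illustration only, assuming per-site values, equal-length sequences, and Watterson's estimator for the segregating-sites measure; it does not reproduce DnaSP's handling of gaps, indels, or missing data, and the toy sequences are hypothetical.

```python
from itertools import combinations


def segregating_sites(alignment):
    """Count alignment columns with more than one distinct base."""
    return sum(1 for column in zip(*alignment) if len(set(column)) > 1)


def nucleotide_diversity(alignment):
    """Average number of pairwise differences per site."""
    n_pairs = 0
    total_diff = 0
    for seq_a, seq_b in combinations(alignment, 2):
        total_diff += sum(a != b for a, b in zip(seq_a, seq_b))
        n_pairs += 1
    return total_diff / n_pairs / len(alignment[0])


def watterson_theta(alignment):
    """Watterson's estimator per site: S / (a_n * L), a_n = sum of 1/i for i < n."""
    n = len(alignment)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(alignment) / a_n / len(alignment[0])


# Toy alignment of four hypothetical alleles (not sequences from the study).
alleles = ["ACGTACGT", "ACGTACGA", "ACCTACGA", "ACGTACGT"]
print(nucleotide_diversity(alleles), watterson_theta(alleles))
```

Under neutrality the two estimates are expected to agree, which is the comparison that Tajima's D formalizes.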
Association between molecular polymorphisms and lifespan was assessed by two-way fixed-effects ANOVA of line means, with polymorphic marker and sex as main effects, and by the nonparametric, distribution-free Wilcoxon test of line means. Association between molecular polymorphisms and Lim3 transcription was assessed by the nonparametric, distribution-free Wilcoxon test of mRNA amount or C(t)s [30]. The REST V2.0.7 program [31], with the number of randomizations equal to 10,000, was used to verify the results. Multiple comparison of means (Tukey's test) was used to compare lifespan and Lim3 expression in groups of lines with different molecular haplotypes. Regression analysis with mean lifespan as the dependent variable and Lim3A mRNA amount as the independent variable was used to assess association between lifespan and Lim3 transcription. Bonferroni and False Discovery Rate (FDR, [56]) corrections for multiple analyses were used when appropriate.
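The two multiple-testing corrections can be contrasted in a short sketch. It assumes the Benjamini-Hochberg step-up procedure as the FDR method, which is one common choice and may differ from the variant described in [56]. The function names, the significance level, and the p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag tests significant at family-wise error rate alpha."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Flag tests significant under the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


# Hypothetical per-marker p-values (not results from the study).
marker_p = [0.001, 0.012, 0.025, 0.039, 0.20]
print(bonferroni(marker_p))
print(benjamini_hochberg(marker_p))
```

Bonferroni controls the chance of any false positive and is therefore the stricter of the two; the FDR approach tolerates a controlled proportion of false positives among the markers declared significant.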