Figure 1.
Animal genomes and sequenced animal genomes vary greatly in size.
Genome size ranges for selected animal phyla (and other major taxonomic groupings) are shown as grey bars. Genome size data are from the Animal Genome Size Database [1]. Circles show the sizes of genomes whose sequences have been published (red circles) or are in progress (black circles). Information on in-progress genomes was obtained from the National Human Genome Research Institute and the Department of Energy's Joint Genome Institute.
Figure 2.
Landscape of sequence conservation in vertebrates and Drosophila.
Posterior probabilities of selective constraint are plotted across illustrative loci in Drosophila and vertebrates (computed with phastCons [44]; data obtained from the UCSC genome browser). Blue annotations indicate coding regions; green annotations indicate experimentally validated enhancers. A) Genomic interval surrounding the D. melanogaster even-skipped gene (conservation shown is for 12 Drosophila species plus Anopheles, Apis, and Tribolium). Several confirmed eve enhancers are shown, drawn from the RedFly database [28], [47]. B) Approximately 150 kb of the human SALL1 locus (conservation shown is across all vertebrates). The midbrain and neural tube enhancer depicted here is from [48].
Figure 3.
Tephritid genomes are larger than Drosophila genomes.
Phylogenetic relationships and approximate divergence times of several dipteran species (left) are shown along with experimentally determined haploid genome sizes (right), drawn from the Animal Genome Size Database [1] (Drosophila spp, M. domestica, A. aegypti, A. gambiae), and our own experiments (Bactrocera spp, C. capitata, R. juglandis). While some groups (Drosophila, Anopheles) have undergone substantial reduction in genome size, many closely related species including the tephritids described here have substantially larger genomes. Asterisks indicate species with available whole-genome sequence.
Table 1.
Loci sequenced for this study.
Table 2.
Sizes of even-skipped locus (MAS enhancer to stripe 4/6 enhancer).
Table 3.
Conserved non-coding sequences in the C. capitata even-skipped locus.
Figure 4.
Landscape of sequence conservation in tephritids and Drosophila (eve).
A) Posterior probabilities of conservation in four tephritids, estimated with phastCons [44] (version v0.9.9.6b), for 60 kb surrounding the C. capitata eve gene. Blue annotations indicate coding regions; conserved intervals are shown in orange. The interval numbers are used throughout the text. The presumptive C. capitata basal promoter is shown in light blue. B) D. melanogaster eve locus conservation plot computed with phastCons (rho 0.25) [44], rendered to scale with the C. capitata plot in panel A, showing comparable highly conserved content but with virtually all intervening non-conserved DNA absent in Drosophila. RedFly enhancers listed in Figure 2 are shown in green and the basal promoter in light blue.
Figure 5.
Size and spacing of highly conserved regions of human, Drosophila and tephritid genomes demonstrate global differences in constraint landscapes.
Cumulative sums of normalized histograms are displayed for the sizes of conserved blocks (panel A) and the distances between them, i.e. the sizes of non-conserved intervals (panel B). Distributions of conserved region sizes are similar for Drosophila, C. capitata and human. Spacing between conserved regions, however, shows very different distributions in Drosophila and human; C. capitata conserved element spacings are similar to those observed in the human genome. Distributions are shown for UCSC phastCons “most conserved” tracks for human (black diamonds) and D. melanogaster (blue diamonds) as well as for phastCons run in-house on tephritid alignments (red line). In addition, D. virilis conserved block sizes and spacing (cyan line in panels A and B) are shown in order to assess the utility of a species with a large genome in supplying inter-element spacing information akin to vertebrates and tephritids (see text). In-house alignments and phastCons data are similarly displayed for D. melanogaster-referenced Drosophila alignments (blue line) and for human-referenced vertebrate alignments in 1% of the human genome (black line) in order to establish consistency between our analyses and UCSC datasets.
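As an illustration of the quantities plotted in Figure 5, the following minimal Python sketch derives conserved block sizes and inter-block spacings from a list of conserved intervals and converts each into a cumulative sum of a normalized histogram. It is a sketch under stated assumptions, not the analysis code used for the figure: the function names, the half-open `(start, end)` coordinate convention, the example intervals and the bin edges are all hypothetical, and the blocks are assumed to be non-overlapping and to lie on a single sequence.

```python
from bisect import bisect_right
from itertools import accumulate


def block_sizes_and_gaps(blocks):
    """Return (sizes, gaps) for conserved blocks given as half-open (start, end) pairs.

    Assumes (hypothetically) that all blocks lie on one sequence and do not overlap.
    """
    ordered = sorted(blocks)
    sizes = [end - start for start, end in ordered]
    # Gap = distance from the end of one block to the start of the next.
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(ordered, ordered[1:])]
    return sizes, gaps


def cumulative_normalized_histogram(values, bin_edges):
    """Cumulative fraction of values falling in successive bins [edge_i, edge_i+1).

    Values outside the outermost edges are not counted in any bin, but they
    still contribute to the total, so the curve may end below 1.
    """
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        idx = bisect_right(bin_edges, v) - 1  # index of the bin containing v
        if 0 <= idx < len(counts):
            counts[idx] += 1
    total = len(values)
    if total == 0:
        return [0.0] * len(counts)
    # Cumulative counts divided by the total equal the cumulative sum
    # of the normalized histogram.
    return [c / total for c in accumulate(counts)]


# Hypothetical example intervals (not data from the study).
example_blocks = [(0, 50), (100, 130), (400, 420), (1000, 1100)]
example_edges = [0, 100, 1000]

sizes, gaps = block_sizes_and_gaps(example_blocks)
print("block sizes:", sizes)
print("gaps:", gaps)
print("cumulative (sizes):", cumulative_normalized_histogram(sizes, example_edges))
print("cumulative (gaps):", cumulative_normalized_histogram(gaps, example_edges))
```

Because genome-scale spacings span several orders of magnitude, logarithmically spaced bin edges would probably be a more natural choice than the two coarse bins used here.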
Figure 6.
Native expression pattern of eve in Drosophila melanogaster and Ceratitis capitata.
even-skipped expression patterns in D. melanogaster (A–H) and C. capitata (I–P) embryos were visualized by in situ hybridization with species-specific digoxigenin-labeled antisense RNA probes. While clear differences are manifest in the extremely early phases of expression (D. melanogaster stage 4–5, fixed 2–4 h AEL panels A,B; C. capitata fixed 0–8 h AEL panels I, J), previously characterized epochs of eve expression appear substantially conserved. Parasegmental expression is conserved in the blastoderm and gastrulating embryo (D. melanogaster fixed 0–4 h AEL panels C, D and E, respectively, C. capitata fixed 8–32 h AEL panels K, L and M, respectively). So too is the post-gastrula expression domain of eve in the posterior, and in mesodermal lineages of the germ band extended embryo (D. melanogaster fixed 0–18 h AEL panels F, G, C. capitata fixed 8–32 h AEL panels N, O) and the neuronal and anal plate ring expression domains in the late embryo (D. melanogaster fixed 0–18 h AEL panel H, C. capitata fixed 26–50 h AEL panel P).
Figure 7.
Expression patterns driven by tested eve fragments.
Expression of reporter transcript in transgenic D. melanogaster embryos expressing either CFP or lacZ under the control of C. capitata conserved fragments and the naïve D. melanogaster eve basal promoter was visualized by in situ hybridization with digoxigenin-labeled antisense RNA probes. We tested all 9 fragments labeled in Figure 4. A–C) CFP expression driven by conserved fragment 1 (see Figure 4A) in blastoderm, gastrulating and germ-band extended embryos is entirely consistent with that of the D. melanogaster Minimal Autoregulatory Sequence (MAS; see Figure 2 in [24]). D) Conserved fragment 2 drives lacZ expression in the domain of the eve third parasegmental stripe, reminiscent of the activity of the D. melanogaster stripe 3+7 element (see Figure 2 in [22]), although the seventh stripe is not observed. E) CFP driven by conserved fragment 3 recapitulates the expression of the second stripe, along with weaker and incompletely penetrant expression in the domain of the seventh stripe, consistent with that driven by the D. melanogaster stripe 2 element (MSE; see Figure 2 in [21]). F,G) Conserved fragment 6 drives lacZ expression in the early anal plate ring, as observed for the D. melanogaster eAPR enhancer. H–K) Segmental neuronal (H,J) and late anal plate ring (APR; I,K) CFP expression is observed for fragments 7 (H,I) and 8 (J,K). Fragment 7 neuronal expression (H) appears after germ-band retraction and is primarily localized to EL neurons, while fragment 8 neuronal expression (J) appears earlier and is found in both EL and CQ neurons. These activities are consistent with the D. melanogaster EL neuronal and CQ neuronal/late APR enhancers (see Figure 3 in [23]). Fragment 4 drives fat body expression (data not shown); eve is not expressed in the fat body in D. melanogaster or C. capitata. Interestingly, the ftz-like element in D. melanogaster is also located in this region between the end of the coding sequence and the next annotated enhancer.
The ftz-like element also drives expression that does not overlap with native eve expression. It should be noted that the C. capitata fragment driving fat body expression does not map to the ftz-like element. Fragments 5 and 9 drove no expression. Fragment 9 maps to the proximal half of the stripe 4+6 enhancer. We lacked comparative data beyond this fragment, so it is possible that this conserved region extends distally and that we cloned an incomplete enhancer.
Figure 8.
Mapping tephritid eve enhancers to D. melanogaster.
A) Aggregate scoring of short, non-significant BLAST HSPs and unique k-mer matches between the C. capitata (top) and D. melanogaster (bottom) eve loci was employed as described above to generate an orthology mapping of non-coding regions flanking the eve gene. Dark grey bars, with opacity proportional to the relative confidence of the mapping, link the best-match regions between families. Orange annotations in C. capitata (top) indicate the cloned conserved fragments (numbering as employed throughout this work). Green annotations in D. melanogaster (bottom) are confirmed enhancers drawn from the RedFly database [28], [47] (MAS: eve_mas; st3+7: eve_stripe_3+7; st2: eve_stripe2; ftz: eve_ftz-like; eAPR: eve_early_APR; CQ: eve_CQ/late_APR; 4+6: eve_stripe_4+6; MHE: eve_MHE; st1: eve_stripe1; st5: eve_stripe5). B) Zoom-in on the D. melanogaster locus showing mapped tephritid CNSs (grey; shading reflects mapping score) and known D. melanogaster enhancers [28] (green).
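To make the idea of unique k-mer matching concrete, the sketch below shows one simple way to find k-mers that occur exactly once in each of two sequences and to aggregate those matches into coarse window-pair scores. It is an illustrative assumption only and does not reproduce the scoring scheme used for Figure 8: the BLAST HSP component is omitted, only the forward strand is considered, and the function names, the toy sequences, the k-mer length and the window size are all hypothetical.

```python
from collections import Counter


def kmer_positions(seq, k):
    """Map each k-mer in seq to the list of its start positions."""
    positions = {}
    for i in range(len(seq) - k + 1):
        positions.setdefault(seq[i:i + k], []).append(i)
    return positions


def unique_shared_kmers(seq_a, seq_b, k):
    """Return sorted (pos_a, pos_b) pairs for k-mers found exactly once in each sequence."""
    pos_a = kmer_positions(seq_a, k)
    pos_b = kmer_positions(seq_b, k)
    return sorted(
        (pos_a[kmer][0], pos_b[kmer][0])
        for kmer in pos_a
        if len(pos_a[kmer]) == 1 and len(pos_b.get(kmer, ())) == 1
    )


def aggregate_by_window(matches, window):
    """Count matches per (window in A, window in B) pair as a crude mapping score."""
    scores = Counter()
    for a, b in matches:
        scores[(a // window, b // window)] += 1
    return scores


# Toy sequences for illustration only (not from either eve locus).
seq_a = "ACGTACGGTTCA"
seq_b = "TTGGACGGTTAA"

matches = unique_shared_kmers(seq_a, seq_b, k=5)
print("unique shared 5-mers (pos_a, pos_b):", matches)
print("window-pair scores:", dict(aggregate_by_window(matches, window=10)))
```

In such a scheme, window pairs that accumulate many unique matches would be candidates for orthologous regions; how those counts are weighted and combined with other evidence is a design choice this sketch leaves open.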