Figure 1.
Phylogenetic tree of the Drosophila genus.
The four species studied here (D. melanogaster, D. yakuba, D. pseudoobscura, D. virilis) are highlighted and illustrated with a picture of an adult as well as a blastoderm embryo. Scales for adults (Nicolas Gompel, FlyBase) and embryos are indicated. The three internal nodes of the (D. melanogaster, D. yakuba, D. pseudoobscura, D. virilis) tree are highlighted. Divergence times are indicated under the tree [20].
Figure 2.
Comparison of binding profiles of BCD, GT, HB and KR at the even-skipped locus in the four species D. melanogaster, D. yakuba, D. pseudoobscura and D. virilis.
The two types of comparisons made in this study are highlighted in grey: trans-species comparisons for each single TF (right) and trans-TF comparisons (left). For simplicity, the species names were shortened to their initials: D. melanogaster (M), D. yakuba (Y), D. pseudoobscura (P) and D. virilis (V).
Figure 3.
Comparison of BCD, GT, HB and KR binding in D. melanogaster, D. yakuba, D. pseudoobscura and D. virilis.
A. Pairwise comparisons of BCD, GT, HB or KR binding between D. melanogaster and D. pseudoobscura. Spearman's correlation coefficients are indicated. All correlations were highly significant (p-values < 2×10^−16). See Figure S4 for all pairwise comparisons. B. Neighbor-joining trees based on pairwise distance matrices of TF occupancy at bound loci. C. Proportion of clusters according to the number of species in which TF binding was detected, from a species-specific peak (“one”) to a peak conserved in all four species (“four”). For simplicity, the species names were shortened to their first three letters: D. melanogaster (mel), D. yakuba (yak), D. pseudoobscura (pse) and D. virilis (vir).
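The Spearman comparison in panel A can be illustrated with a minimal sketch. It assumes binding is summarized as one occupancy value per orthologous region; the function names and the numbers below are hypothetical and are not taken from the study, which most likely relied on a statistics package rather than hand-written code.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical occupancy values at five orthologous regions (illustration only).
mel = [12.0, 3.5, 8.1, 0.9, 5.0]
pse = [9.4, 2.0, 10.2, 1.1, 4.0]
rho = spearman(mel, pse)
```

Because only ranks enter the calculation, the coefficient is insensitive to differences in overall ChIP signal scale between species.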
Figure 4.
TF-specific motif turnover drives TF binding divergence.
A. Comparison of quantitative variation of BCD binding divergence vs. underlying sequence divergence. Binding divergence was measured by the variance of BCD binding along the Drosophila tree according to a Brownian motion model. Sequence divergence was measured as the total length of a PhyML phylogenetic tree based on the sequence alignment underlying bound regions. B. Sequences under bound regions are enriched for TF-specific motifs in all species. TF-specific enrichment was calculated for each of the 28 ChIPs. The plot summarizes motif enrichment in any of the 28 ChIPs, distributed as 12, 4, 8 and 4 ChIPs in D. melanogaster, D. yakuba, D. pseudoobscura and D. virilis, respectively. C. Enrichment of BCD motifs in bound regions is quantitatively highly predictive of BCD binding. BCD binding prediction was based solely on underlying BCD motif content (TAATCC) [35]. D. Comparison along the tree of branch-wise BCD binding divergence and predicted BCD binding divergence. E. Values of BCD binding divergence (same as D) were partitioned into three categories, depending on predicted changes of BCD binding along a branch: predicted increase, decrease or limited change in binding (thresholds indicated by vertical lines in D). ***: Wilcoxon test p-value < 0.001. Similar plots for GT, HB and KR can be found in Figures S7 and S8.
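As a rough illustration of what "motif content" in panel C could mean, the sketch below counts occurrences of the BCD motif TAATCC on both strands of a sequence. Using a raw count as the predictor is an assumption made for this example; the legend does not specify the exact model, and the helper names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_motif(seq, motif="TAATCC"):
    """Count (possibly overlapping) motif matches on either strand."""
    seq = seq.upper()
    targets = {motif, reverse_complement(motif)}
    k = len(motif)
    return sum(seq[i:i + k] in targets for i in range(len(seq) - k + 1))


# Hypothetical orthologous sequences under one bound region (illustration only).
mel_seq = "ACGTAATCCGGATTAAC"   # one forward match and one reverse-strand match
vir_seq = "ACGTAATCCGGTTTAAC"   # the reverse-strand match is lost
mel_count = count_motif(mel_seq)
vir_count = count_motif(vir_seq)
# Difference in motif content between the two sequences, used here as a
# stand-in for a predicted change in binding (an assumption of this sketch).
predicted_change = vir_count - mel_count
```

A motif loss of this kind would fall in the "predicted decrease" category of panel E, provided it exceeds whatever threshold is applied.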
Figure 5.
Zelda divergence may drive TF binding divergence.
A. The Zelda CAGGTAG motif is enriched in all ChIPs in all species. The variability in enrichment between species is mostly due to differences in ChIP quality. B, D. Principal Component Analysis (PCA) of binding of all factors: (B) PCA of the binding strength in the four species together and (D) PCA of the change in binding strength along each branch of the tree across all peaks. Each row represents a factor, and each column is a principal component of the relevant data. The color represents the sign (yellow positive, blue negative) and magnitude (color intensity) of each value in each principal component vector. In each case the sign of the first principal component is the same for all four factors, indicating that the dominant driver of both interspecies divergence and quantitative variation within single species is a coordinated change in binding strength of all factors. This effect explained 40% of the variation between species, and 58% of the variation within species. Species-specific PCAs are shown in Figure S12. C. Zelda occupancy (from [25]) and PC1 coordinates are highly correlated (Spearman correlation ∼0.79, p-value < 10^−16). E. Changes in PC1 along the branches of the tree correlate with changes in Zelda binding predicted solely by the enrichment of the Zelda motif. ***: Wilcoxon test p-value < 0.001. See Figure S13 for a comparison of predicted Zld binding and Zld binding.
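The observation that all four factors share the same sign in the first principal component can be illustrated with a small sketch. It assumes a matrix with one row per peak and one column per factor, and it finds the first component by power iteration on the covariance matrix. This is a stand-in chosen to keep the example dependency-free, not the procedure used in the study, and the data are invented.

```python
def first_principal_component(rows, iterations=200):
    """Return (loadings, explained_fraction) for the first PC.

    rows: list of observations (peaks), each a list of per-factor values.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Power iteration: repeated multiplication converges to the dominant
    # eigenvector, as long as the start vector is not orthogonal to it.
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue, i.e. the variance along PC1.
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p))
                     for a in range(p))
    total_variance = sum(cov[a][a] for a in range(p))
    return v, eigenvalue / total_variance


# Hypothetical binding strengths; columns are BCD, GT, HB, KR (illustration only).
peaks = [
    [1.0, 1.2, 0.9, 1.1],
    [2.0, 2.1, 1.8, 2.2],
    [3.0, 2.9, 3.2, 3.1],
    [4.0, 4.2, 3.9, 4.0],
    [5.0, 4.8, 5.1, 5.2],
]
loadings, explained = first_principal_component(peaks)
same_sign = all(x > 0 for x in loadings) or all(x < 0 for x in loadings)
```

When every factor rises and falls together across peaks, as in this toy matrix, the first component has loadings of a single sign and captures most of the variance, which is the pattern the legend describes.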
Figure 6.
BCD, GT, HB and KR binding events are differentially conserved, and binding predicted to be functional is better conserved.
A. Comparison of qualitative conservation of TF binding in the different species. A conservation score, corresponding to the average number of species in which binding was detected (1–4), was calculated for each set of orthologous regions and ranked according to ancestral mean, as estimated using a Brownian motion model. B. Comparison of binding intensities, as represented by ancestral mean binding, and trans-species binding variance in a Brownian motion model of TF binding evolution. C. Mean binding conservation score (1–4 species) depending on peak location in D. melanogaster. D. Mean binding conservation score depending on the number of other A-P factors binding the same locus. To correct as much as possible for TF binding differences linked to different wiring sizes, clusters were binned into 10 bins, depending on the estimated ancestral values, and the conservation was estimated independently in each bin. The average conservation is displayed. Similar plots were made using different thresholds for calling sets of bound orthologous regions (Figures S10 and S11).
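The binning correction described for panel D can be sketched as follows. The example assumes each cluster is described by an estimated ancestral mean and four per-species detection flags, and it uses equal-count bins; the legend does not say whether bins were equal-count or equal-width, so that choice, like the function names and data, is an assumption.

```python
def conservation_score(detected):
    """Number of species (1-4) in which binding was detected for one cluster."""
    return sum(1 for d in detected if d)


def binned_mean_conservation(clusters, n_bins=10):
    """Average the conservation score within bins of ancestral mean.

    clusters: list of (ancestral_mean, detected_flags) tuples.
    Clusters are ranked by ancestral mean, split into n_bins equal-count
    bins, scored within each bin, and the bin means are then averaged.
    """
    ranked = sorted(clusters, key=lambda c: c[0])
    bin_means = []
    for b in range(n_bins):
        start = b * len(ranked) // n_bins
        stop = (b + 1) * len(ranked) // n_bins
        members = ranked[start:stop]
        if members:  # skip empty bins when there are few clusters
            scores = [conservation_score(flags) for _, flags in members]
            bin_means.append(sum(scores) / len(scores))
    return sum(bin_means) / len(bin_means)


# Hypothetical clusters: (ancestral mean, detected in mel/yak/pse/vir).
clusters = [
    (0.1, [True, False, False, False]),
    (0.5, [True, True, False, False]),
    (0.9, [True, True, True, True]),
    (1.5, [True, True, True, False]),
]
mean_score = binned_mean_conservation(clusters, n_bins=2)
```

Averaging within bins first keeps a group of clusters that happens to contain many strong peaks from looking more conserved purely because strong peaks are easier to detect in every species.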
Figure 7.
mRNA levels are highly conserved despite high divergence of BCD, GT, HB and KR binding.
A. Pairwise comparison of mRNA levels in D. melanogaster, D. yakuba, D. pseudoobscura and D. virilis blastoderm embryos. B. Neighbor-joining tree based on pairwise distance matrices of mRNA levels, based on Spearman's correlation coefficient. C. Phylogenetic variance of mRNA levels is significantly lower than variance of BCD, GT, HB or KR binding (Wilcoxon test p-value < 10^−16). In order to compare variances, quantitative values were normalized by dividing each dataset by its standard deviation, and the parameters of the Brownian motion model were then re-estimated on the normalized values. D. Proportion of bound regions associated with maternal, zygotic or maternal-zygotic genes, depending on the number of TFs binding the region. Regions bound by many TFs tend to be localized near zygotic genes whereas isolated peaks tend to be localized near maternal genes. E. mRNA levels of zygotic genes are well predicted by associated TF binding in all species. mRNA levels were predicted from a multiple linear regression of associated nearby TF binding. F. Changes along each branch of the tree of mRNA levels for zygotic genes are modestly but significantly correlated with predicted changes based on quantitative changes of associated TF binding.
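The normalization step mentioned for panel C amounts to rescaling each dataset so that its standard deviation equals one. The sketch below shows only that rescaling, using the sample standard deviation (an assumption, since the legend does not say which estimator was used) and invented values; it does not attempt the Brownian motion fit itself.

```python
import statistics


def scale_by_sd(values):
    """Divide every value by the sample standard deviation of the dataset."""
    sd = statistics.stdev(values)
    return [v / sd for v in values]


# Hypothetical datasets on very different scales (illustration only).
mrna_levels = [120.0, 95.0, 310.0, 42.0, 180.0]
bcd_binding = [1.2, 0.4, 2.9, 0.8, 1.7]

mrna_scaled = scale_by_sd(mrna_levels)
bcd_scaled = scale_by_sd(bcd_binding)
```

After rescaling, both datasets have unit standard deviation, so variances estimated on them can be compared directly even though the raw mRNA and ChIP measurements are in unrelated units.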