Large-Scale Turnover of Functional Transcription Factor Binding Sites in Drosophila

The gain and loss of functional transcription factor binding sites has been proposed as a major source of evolutionary change in cis-regulatory DNA and gene expression. We have developed an evolutionary model to study binding-site turnover that uses multiple sequence alignments to assess the evolutionary constraint on individual binding sites, and to map gain and loss events along a phylogenetic tree. We apply this model to study the evolutionary dynamics of binding sites of the Drosophila melanogaster transcription factor Zeste, using genome-wide in vivo (ChIP–chip) binding data to identify functional Zeste binding sites, and the genome sequences of D. melanogaster, D. simulans, D. erecta, and D. yakuba to study their evolution. We estimate that more than 5% of functional Zeste binding sites in D. melanogaster were gained along the D. melanogaster lineage or lost along one of the other lineages. We find that Zeste-bound regions have a reduced rate of binding-site loss and an increased rate of binding-site gain relative to flanking sequences. Finally, we show that binding-site gains and losses are asymmetrically distributed with respect to D. melanogaster, consistent with lineage-specific acquisition and loss of Zeste-responsive regulatory elements.


Large-scale turnover of functional transcription factor binding sites in Drosophila
Alan M. Moses, Daniel A. Pollard, David A. Nix, Venky N. Iyer, Xiao-Yong Li, Mark D. Biggin and Michael B. Eisen. PLoS Computational Biology (2006) Vol. 2, no. 10, p. e130
The importance of changes in gene regulation for organismic evolution, as opposed to changes in gene coding sequences per se, has long been recognized. However, sequences that control regulation tend to be less well characterized than coding sequences and hence have not been as intensively studied. The paper of Moses et al. presents a large-scale evaluation of the rate of gain and loss of Drosophila cis-regulatory transcription factor binding sites (TFBS). This work is notable for its combination of experimental and computational analyses along with the analytical rigor that is brought to bear on the question of TFBS evolution. The authors focused on the evolutionary turnover (gain and loss) of TFBS for a single transcription factor, Zeste, but considered these dynamics on a whole-genome scale. The genome-scale comparative analysis was facilitated by the use of the high-throughput ChIP-chip experimental technique to identify genomic sites bound by Zeste, as well as by the availability of complete genome sequences for a number of closely related Drosophila species. The authors identified 294 Drosophila melanogaster genomic regions that were bound by Zeste and had corresponding orthologous sequences in three other closely related Drosophila species. Evaluation of these regions with a Zeste position weight matrix led to the identification of 1,406 putative Zeste binding sites. In order to detect evolutionary turnover of these binding sites, the authors developed a probabilistic model that represents the position-specific evolutionary rates of Zeste sites based on the factor's known binding specificity.
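To make the position weight matrix step concrete, here is a minimal sketch of a log-odds scan. The count matrix, the uniform background, the score threshold and the example sequence are all invented for illustration; they are not the Zeste matrix or the cutoffs used by Moses et al., and the sketch scans the forward strand only.

```python
import math

BASES = "ACGT"

# Toy counts for a 4-bp motif (rows follow BASES order, one column per position).
# These numbers are hypothetical and are NOT the Zeste matrix.
TOY_COUNTS = [
    [1, 17, 1, 1],   # A
    [1, 1, 1, 1],    # C
    [17, 1, 17, 1],  # G
    [1, 1, 1, 17],   # T
]
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed uniform


def log_odds_matrix(counts, background):
    """Convert counts to per-position log2(p_motif / p_background) scores."""
    width = len(counts[0])
    matrix = []
    for pos in range(width):
        column_total = sum(row[pos] for row in counts)
        matrix.append({
            base: math.log2((counts[i][pos] / column_total) / background[base])
            for i, base in enumerate(BASES)
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, site, score) for each window scoring at or above threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous characters such as N
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


PWM = log_odds_matrix(TOY_COUNTS, BACKGROUND)
hits = scan("TTGAGTCCGAGTAA", PWM, threshold=4.0)
for start, site, score in hits:
    print(start, site, round(score, 2))
```

With this toy matrix only exact matches to the high-count bases clear the threshold; a real scan would also score the reverse complement and choose the threshold from the score distribution.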
Comparison of the putative Zeste TFBS across genomes showed position-specific variation indicating that Zeste sites evolve more rapidly than predicted by the binding-site model but less rapidly than predicted by a neutral model. In other words, the population of putative Zeste sites includes conserved sites as well as nonconserved sites that have been lost or gained over time. A likelihood ratio test, based on the Zeste TFBS evolution model, was then applied to test whether each Zeste site was more likely to be conserved or nonconserved. This test identified 215 (>15%) putative Zeste binding sites with statistically significant signals of nonconservation. A number of additional statistical tests were used to validate this result and, in particular, to control for errors in sequence alignment that could confound the analysis. The rapid turnover of TFBS revealed by the analysis holds up well to this scrutiny. The authors arrive at a fairly conservative estimate of 5% turnover in Zeste sites over the last 10 million years. Most evolutionarily informed studies of regulatory sequence have focused on the sequences that are conserved between genomes. This paper, along with other recent studies, shows how rapidly TFBS sequences evolve and underscores the importance of regulatory sequences that have diverged rather than been conserved over time. After all, it is these nonconserved TFBS sequences that are most likely to contribute to the regulatory changes between species that are so important in evolution.

Co-evolution of transcriptional and post-translational cell-cycle regulation
Lars Juhl Jensen, Thomas Skøt Jensen, Ulrik de Lichtenberg, Søren Brunak and Peer Bork. Nature (2006) Vol. 443, no. 7111, pp. 594-597
This large-scale study on the evolution of molecular complexes that function in the cell cycle has profound implications for the understanding of how biological systems evolve. Apparently, very different rules govern evolutionary dynamics at different levels of biological organization. Jensen et al. performed a comparative analysis of regulatory versus coding sequence evolution for genes that are periodically expressed across the stages of the cell cycle. They took advantage of numerous large-scale gene expression data sets to identify hundreds of periodically expressed genes in four divergent eukaryotes: human, the two yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe, and the plant Arabidopsis thaliana. At the same time, they also identified gene sets that contain orthologs of periodically expressed genes for the four organisms. The majority of orthologous gene sets that contain at least one periodically expressed member do not have orthologs in all four species. In addition, for those orthologous sets with at least one periodically expressed member and orthologs in all four species, expression periodicity was generally not conserved. Only five orthologous sets, out of 381 with at least one periodically expressed member, show periodic expression in all four species. In other words, even when sequences are demonstrably conserved between lineages, gene expression patterns often are not. Not surprisingly, the sets with conserved expression and sequences correspond to essential regulators of the cell cycle or to genes, such as those for histones, that encode the building blocks of eukaryotic chromatin. However, even for gene sets where the periodic expression is conserved, the actual timing of this expression, i.e. the phase of the cell cycle in which the genes are expressed, can often differ.
The authors also reveal that many of the changes in periodic expression of cell cycle genes are accompanied by changes in post-translational regulation, phosphorylation in particular, of the encoded proteins. This post-translational regulation adds another layer of control for cell cycle complexes and helps to ensure that the appropriate complexes are assembled 'just in time' to play their functional role. Often these functional complexes are assembled from a number of proteins encoded by constitutively expressed genes along with one or a few members that are periodically expressed and/or post-translationally modified. The genes with periodic expression and post-translational modifications may act as triggers to ensure the proper timing and function of the complexes, and evolutionary conservation of the specific identity of the triggers is not as important as their existence. Furthermore, the authors show evidence that different levels of regulation co-evolved, since periodically expressed genes are more likely to be post-translationally regulated. The authors conclude that many different solutions to the problem of assembling the same molecular complexes at the appropriate point along the cell cycle have evolved over time. This indicates that while the protein-coding component of these molecular complexes is highly conserved, the regulatory part is far more dynamic. Thus, regulatory innovations may be a much more viable substrate for adaptive divergence than changes in gene coding sequence and protein structure.

Long intervals of stasis punctuated by bursts of positive selection in the seasonal evolution of influenza A virus
Yuri I. Wolf, Cecile Viboud, Edward C. Holmes, Eugene V. Koonin and David J. Lipman. Biology Direct (2006) Vol. 1, p. 34
Yuri Wolf and colleagues have published that rare molecular evolution study that has both fundamental implications for the understanding of how natural selection is manifest at the level of gene sequences and urgent significance for human health. Both of these accomplishments are in no small way related to their choice of study system: the human influenza virus and its hemagglutinin (HA) encoding gene. Influenza epidemics lead to thousands of deaths every year, and the HA gene is thought to evolve rapidly via positive Darwinian selection that is predicated upon changing antigenicity of the HA epitope regions. However, it is the combination of an impressively large-scale analysis with the judicious distinction of rapid versus static evolutionary regimes that allowed for the heterodox conclusions of this work. In short, the authors find that HA evolution is characterized by long periods of stasis, which show essentially no evidence of positive selection, followed by short periods of rapid evolution under strong positive selection. This finding stands in stark contrast to the prevailing views on influenza virus evolution, which emphasize that the rapid evolution of HA is driven by continuous positive selection. The particular data set analyzed here may also have much to do with the novel conclusions reached. The authors took advantage of more than 1,000 publicly available influenza genomes, from isolates collected between 1995 and 2005, that were sequenced as part of the Influenza Genome Sequencing Project (IGSP). Unlike many other influenza sequences, which were characterized because of their serological novelty, the IGSP sequence data analyzed here are unbiased in the sense that they are not enriched for antigenically novel isolates.
For the H3N2 influenza subtype, periods of stasis are marked by a higher rate of synonymous than nonsynonymous substitutions, no demonstrable overrepresentation of HA amino acid changes in the epitope regions of the protein, and slow extinction of co-circulating viruses from different lineages. On the other hand, the short, rapid evolutionary phases are dominated by positive selection, as demonstrated by an excess of amino acid changes in the HA epitope regions relative to the rest of the protein and a relatively lower ratio of synonymous to nonsynonymous substitution rates. In the rapid evolutionary phase, newly arisen dominant lineages quickly lead to the extinction of the previously coexisting lineages. There were also a number of cases of parallel amino acid substitutions in different lineages that were shown to have occurred during the rapid evolutionary phase. Independent parallel substitutions across lineages are also consistent with positive selection and may be important for vaccine development, since they could be used as predictors of new dominant strains. It should be noted that this clear case of punctuated evolution is complicated by the H1N1 strains, which show no evidence of positive selection yet somehow manage to outcompete H3N2 lineages in some seasons. In this case, the H3N2 lineages may be so successful as to limit the availability of susceptible hosts, thus opening up a new niche for H1N1.
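The contrast between synonymous and nonsynonymous change can be illustrated with a deliberately simple counting sketch. It classifies codon differences between two aligned coding sequences using the standard genetic code. It is not the procedure used by Wolf et al.: it skips codons that differ at more than one position and does not normalize by the number of synonymous and nonsynonymous sites, so the counts are not substitution rates. The two short sequences are made up.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_changes(seq_a, seq_b):
    """Count synonymous and nonsynonymous single-nucleotide codon differences."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame and of equal length")
    counts = {"synonymous": 0, "nonsynonymous": 0, "skipped": 0}
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        n_diffs = sum(x != y for x, y in zip(codon_a, codon_b))
        if n_diffs != 1 or codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            counts["skipped"] += 1  # multiple hits, gaps or ambiguous bases
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            counts["synonymous"] += 1
        else:
            counts["nonsynonymous"] += 1
    return counts


# Hypothetical 4-codon example: ATG>ATA (Met>Ile), CTT>CTC (Leu>Leu),
# AAA unchanged, GGA>GCC (two differences, skipped).
example_counts = classify_codon_changes("ATGCTTAAAGGA", "ATACTCAAAGCC")
print(example_counts)
```

Running the same tally separately on epitope and non-epitope codons would give the kind of comparison described above, although a proper analysis would estimate rates per site.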

Accelerated evolution of conserved non-coding sequences in humans
Shyam Prabhakar, James P. Noonan, Svante Pääbo and Edward M. Rubin. Science (2006) Vol. 314, no. 5800, p. 786
Only a tiny fraction of the human genome actually codes for protein sequences, and the functional relevance of the remaining noncoding sequence is an open question that has attracted quite a bit of attention of late. Investigators have been taking advantage of comparative sequence data to home in on the anomalously conserved regions of noncoding genomic sequence, under the rationale that such regions are most likely to be functionally germane. However, from an evolutionary perspective such sequences may actually be the least interesting of all because they do not change (or change very little) between species. Thus, focusing on conserved sequences, such as when looking for transcription factor binding sites, may actually be positively misleading in terms of identifying evolutionarily important sites. Fortunately, after a few years of almost exclusive focus on conserved noncoding regions, several groups have begun to look for regions that show accelerated evolution, and this short paper by Prabhakar et al. exemplifies this approach. Interestingly, in looking for regions that change rapidly between species, the authors used regions previously identified as highly conserved as a starting point. Specifically, they evaluated more than 100,000 conserved noncoding sequences for evidence of human-specific substitutions using a statistical null model based on a pattern of constrained evolution. This approach yielded 992 distinct genomic sequences showing statistically significant evidence for an excess of human-specific substitutions. The authors went on to associate these regions with human genes and used the genes' Gene Ontology (GO) annotations to check for enrichment of specific functional categories.
The human gene set associated with accelerated regions was found to be enriched for the biological process GO term cell adhesion, and many of these genes were specifically involved in neuronal cell adhesion, e.g. cadherins and neuroligins. This finding has obvious implications for the evolution of uniquely human cognitive traits. As a control, the authors also looked for accelerated regions using the same approach in mouse and chimpanzee. Interestingly, while the accelerated sequences identified in chimpanzees were distinct from those identified in human, they too showed a significant enrichment near genes involved in neuronal adhesion. However, because the accelerated regions specific to human and chimpanzee neuronal adhesion genes are quite different, the authors argue that the phenotypic effects are likely to be quite different between the two lineages. In any case, it is intriguing that independent changes in the nervous system may have been particularly important in the evolutionary diversification of both species.
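An enrichment check of this kind is often framed as a one-sided hypergeometric test. The sketch below assumes that framing. The review does not say which test Prabhakar et al. used, and the numbers in the example are toy values rather than counts from the paper.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, sample, observed):
    """Return P(X >= observed) for a hypergeometric draw without replacement.

    population: all genes considered
    annotated:  genes in the population carrying the GO term
    sample:     genes associated with accelerated regions
    observed:   genes in the sample carrying the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


# Toy numbers: 10 genes, 4 annotated, 3 sampled, 2 of the sampled are annotated.
# P(X >= 2) = (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3) = 40/120
p_value = hypergeom_enrichment_pvalue(population=10, annotated=4, sample=3, observed=2)
print(p_value)
```

In practice many GO terms are tested at once, so the resulting p-values would need a multiple-testing correction before any term is called enriched.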

Forces Shaping the Fastest Evolving Regions in the Human Genome
Katherine S. Pollard, Sofie R. Salama, Bryan King, Andrew D. Kern, Tim Dreszer, Sol Katzman, Adam Siepel, Jakob S. Pedersen, Gill Bejerano, Robert Baertsch, Kate R. Rosenbloom, Jim Kent and David Haussler. PLoS Genetics (2006) Vol. 2, no. 10, p. e168
Comparative genomic sequence analyses, in particular those involving the recently completed chimpanzee sequence, allow for the identification of the evolutionary sequence changes that may have led to uniquely human characteristics. This group, led by David Haussler, was among the first to identify so-called 'ultra-conserved' elements in the human genome. These are human genome sequences that were inferred, by comparison with other genomes, to have changed very little over evolutionary time. Such sequences are of interest because they are likely to have been conserved by natural selection on account of their functional importance. Now this idea has been turned on its head and the emphasis is placed on genomic regions that are changing more rapidly than others. These accelerated regions can be inferred to be of outstanding evolutionary interest since they may lead to differences between lineages, in particular changes along the human lineage that may distinguish us from other closely related primate species. However, to get a handle on such regions in the human genome, this team still relies on the presence of highly conserved regions. Specifically, they identify genomic regions that are highly conserved in other species (chimp, mouse and rat) yet highly divergent in human. Using a statistical model to detect significant accelerations along the human evolutionary lineage, they were able to detect 202 genomic sequences that are highly conserved among other vertebrates yet divergent in humans. As with other highly conserved regions, these sequences are mostly noncoding. Analysis of the functional annotations of nearby genes revealed that many of these regions are close to genes involved in transcription and DNA binding.
This underscores the importance of changes in the regulation of genes that are themselves in turn responsible for the regulation of other genes. In other words, a few pointed and specific sequence changes could lead to complex cascades of regulatory effects. The general assumption for these types of studies is that natural selection is a major player in determining rates of evolution. One of the most interesting aspects of this study is that the authors paid close attention to sequence context and to the actual kinds of substitutions that occur in accelerated regions. For instance, they show a preponderance of changes that create guanine and cytosine bases (that is, substitutions from A or T to G or C), as well as an overrepresentation of accelerated sequences in regions of high recombination. This suggests the possibility that other forces, such as biased gene conversion, may also play a role in accelerations along the human evolutionary lineage.
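The substitution bias mentioned above can be tallied with a short sketch that classifies each difference between an ancestral and a derived sequence as weak-to-strong (A or T to G or C), strong-to-weak, or other. This is an illustration only: it assumes the ancestral sequence has already been inferred, for example from outgroup species, and the two sequences shown are invented.

```python
WEAK = set("AT")    # bases forming two hydrogen bonds
STRONG = set("GC")  # bases forming three hydrogen bonds


def tally_substitution_directions(ancestral, derived):
    """Count weak-to-strong, strong-to-weak and other substitutions."""
    if len(ancestral) != len(derived):
        raise ValueError("sequences must be aligned and of equal length")
    tally = {"weak_to_strong": 0, "strong_to_weak": 0, "other": 0}
    for anc, der in zip(ancestral.upper(), derived.upper()):
        if anc == der or anc not in "ACGT" or der not in "ACGT":
            continue  # unchanged site, gap or ambiguous base
        if anc in WEAK and der in STRONG:
            tally["weak_to_strong"] += 1
        elif anc in STRONG and der in WEAK:
            tally["strong_to_weak"] += 1
        else:
            tally["other"] += 1  # A<->T or G<->C, GC content unchanged
    return tally


# Hypothetical aligned sequences: A>G, T>C (weak to strong), C>T (strong to weak),
# A>T (other); the remaining sites are unchanged.
direction_tally = tally_substitution_directions("ATATGCATA", "GTACGTTTA")
print(direction_tally)
```

A marked excess of weak-to-strong over strong-to-weak changes in a region is the kind of pattern that points toward biased gene conversion rather than selection alone.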

Large punctuational contribution of speciation to evolutionary divergence at the molecular level
Mark Pagel, Chris Venditti and Andrew Meade. Science (2006) Vol. 314, no. 5796, pp. 119-121
The foundations of modern evolutionary theory, the so-called 'New Synthesis' or 'Neo-Darwinian Synthesis', were laid down in the middle of the 20th century when Mendelian genetics was united with Darwinian theory. This elaborate body of formal theory detailed how and why allele frequencies could change within and between populations of a given species. It was assumed that an understanding of such microevolutionary (i.e. within-species) changes was both necessary and sufficient to explain longer-term macroevolutionary changes between species. This model, which quickly became orthodoxy (and to some even dogma), strongly implied that evolution was a long process of steady and gradual change. The theory of punctuated equilibrium, which holds that macroevolution is better characterized by long periods of stasis followed by short bursts of rapid change that accompany speciation, is a fundamental challenge to Neo-Darwinian orthodoxy. Pagel et al. set out to assess the relative contributions of gradual versus punctuated change to genomic sequence changes by using a large-scale phylogenetic approach. The strength of their approach is the use of two simple and mutually exclusive phylogenetic models that discriminate between gradual and punctuated modes of evolution. The models rest on an evaluation of path lengths, which are the sums of the branch lengths from the root of a phylogenetic tree to each of its tips. Since gene sequence alignments were used to build the phylogenetic trees under consideration here, the path lengths are expressed as the number of nucleotide substitutions per site. Depending on the particular branching pattern of the phylogeny under consideration, there will be more or fewer nodes along the different paths from the root to the tips.
Since the nodes represent speciation events, the punctuated model predicts that paths that cross more nodes should be longer than paths that traverse fewer nodes. The gradual model, on the other hand, predicts no such correlation between node number and path length. In total, 122 gene sequence alignments were analyzed according to a statistical model that evaluates the punctuated contribution to the observed path lengths. The authors found that 47% of these data sets were consistent with the punctuated model; after a correction for artifacts this figure fell to 35%. In addition, punctuated evolution appears to be much more common in plants and fungi than in animals.
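The prediction can be made concrete with a small sketch that walks a toy tree, records the path length and node count for each tip, and fits a straight line through the pairs. The nested-tuple tree format, the branch lengths and the use of ordinary least squares are all assumptions made for illustration. In particular, root-to-tip paths share branches and so are not independent observations, which a plain least-squares fit ignores and which a real analysis would have to account for.

```python
def root_to_tip_paths(node, length_so_far=0.0, nodes_so_far=0):
    """Yield (path_length, node_count) for every tip below `node`.

    A node is (branch_length, children); a tip has an empty children list.
    node_count is the number of internal nodes on the path, root included.
    """
    branch_length, children = node
    length = length_so_far + branch_length
    if not children:
        yield length, nodes_so_far
        return
    for child in children:
        yield from root_to_tip_paths(child, length, nodes_so_far + 1)


def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical tree in which every extra node adds 0.1 substitutions per site.
TOY_TREE = (0.0, [
    (0.3, []),                      # tip A: 1 node on its path
    (0.1, [
        (0.3, []),                  # tip B: 2 nodes
        (0.1, [(0.3, []), (0.3, [])]),  # tips C and D: 3 nodes each
    ]),
])

paths = list(root_to_tip_paths(TOY_TREE))
path_lengths = [length for length, _ in paths]
node_counts = [count for _, count in paths]
slope, intercept = least_squares(node_counts, path_lengths)
print(paths, slope, intercept)
```

Under the punctuated model the slope is positive, as in this contrived tree; under the gradual model it would be indistinguishable from zero.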
Given their finding that punctuated evolution makes a substantial contribution to sequence changes, the authors next tried to estimate the extent of this contribution to overall sequence diversity. Just over one-fifth (22%) of DNA sequence substitutions were attributed to punctuated evolution. The take-home message of this work is that punctuated change has been far more important for sequence variation than previously thought and thus should be accounted for in models of molecular evolution. On a technical note, such punctuated change will lead to violations of clock-like molecular evolution and so should be accounted for when using molecular data to estimate divergence times between evolutionary lineages. It should be noted that while the authors observe a strong signal for punctuated molecular evolution, they do not see any evidence for long periods of stasis, the other part of the punctuated equilibrium picture.