
Simulation of Molecular Data under Diverse Evolutionary Scenarios

  • Miguel Arenas

    miguel.arenasbusto@iee.unibe.ch

    Affiliations Computational and Molecular Population Genetics Lab (CMPG), Institute of Ecology and Evolution, University of Bern, Bern, Switzerland, Swiss Institute of Bioinformatics, Lausanne, Switzerland


This is an original PLoS Computational Biology tutorial.

Introduction

This study is intended for evolutionary biologists interested in strategies for the simulation of molecular data under diverse evolutionary scenarios. It begins with a brief background on simulation approaches and describes some of the most important simulators developed to date. Then, several practical examples for simulating particular scenarios are presented, and finally, details are given on a variety of relevant applications of simulated data. Overall, this study provides a practical guide for applying simulation techniques to real world problems in molecular evolution.

The Importance of Computer Simulations in Molecular Evolution

A commonly used methodology to mimic the processes that occur in the real world is to perform computer simulations [1]. Computer simulations allow us to understand which patterns may dramatically alter a particular system and can be used to study complex processes, including those that are analytically intractable. Furthermore, the simulation of multiple replicates with stochasticity may provide the variability required to study numerous processes, such as those often found in evolution. In molecular evolution, the simulation of genetic data has been commonly used for hypothesis testing (e.g., [2]), to compare and verify analytical methods or tools (e.g., [3]–[5]), to analyze interactions among evolutionary processes (e.g., [6]), and even to estimate evolutionary parameters (e.g., [7]). Consequently, a wide variety of tools have been developed to simulate sequence data under different substitution models of evolution, but also under different evolutionary processes such as selection, recombination, demographics, population structure, and migration. In recent years, new programs have been developed to handle very complex scenarios (e.g., [8], [9]) and efficient algorithms have been incorporated in order to accommodate large datasets in response to the increasing amount of genome-wide data (e.g., [10]). Thus, the importance of simulations continues to grow in order to deal with these new challenges.

Approaches for the Simulation of Molecular Data

After the simulation of evolutionary histories (see Box 1), or when just a rooted tree or network is given, a sequence assigned to the most recent common ancestor (MRCA, or grand MRCA [GMRCA] in the case of networks) can be evolved along branches according to a substitution model of evolution, in order to simulate sequences for all internal and terminal nodes (see an example in Figure 1). A common procedure consists of applying continuous-time Markov models defined by 4×4, 20×20, and 61×61 matrices of substitution rates for nucleotide, amino acid, and codon (note that stop codons are ignored) data, respectively (details in [11]). This methodology is very flexible and allows for heterogeneous evolution where different sites and branches can be evolved under different substitution models (e.g., [12]). These aspects suggest in practice two important considerations. Firstly, simulations of nucleotide sequences are much faster than simulations of coding or amino acid sequences due to the dimension of the substitution matrices. Secondly, a large number of branches (derived from a large number of taxa or recombination events) leads to slower simulations due to the need to re-calculate the matrix for each branch.
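
As a rough illustration of evolving a sequence from the root along the branches of a tree, the sketch below uses the Jukes-Cantor (JC69) nucleotide model, chosen here only because its transition probabilities have a closed form. The tree, the branch lengths, and the function names are hypothetical; more general models would need the transition matrix to be computed for each branch, which is the cost mentioned above.

```python
import math
import random

NUCLEOTIDES = "ACGT"

def jc69_transition(base, branch_length, rng):
    """Sample the descendant base after `branch_length` expected substitutions per site."""
    # JC69: probability that the base is unchanged at the end of the branch.
    p_same = 0.25 + 0.75 * math.exp(-4.0 * branch_length / 3.0)
    if rng.random() < p_same:
        return base
    # Otherwise the three other bases are equally likely under JC69.
    return rng.choice([b for b in NUCLEOTIDES if b != base])

def evolve(node, parent_seq, rng, out):
    """Evolve `parent_seq` down the tree; a node is (name, branch_length, children)."""
    name, branch_length, children = node
    seq = "".join(jc69_transition(b, branch_length, rng) for b in parent_seq)
    out[name] = seq
    for child in children:
        evolve(child, seq, rng, out)
    return out

# Hypothetical rooted tree: ((A:0.2,B:0.2)n1:0.1,C:0.3)root
tree = ("root", 0.0, [
    ("n1", 0.1, [("A", 0.2, []), ("B", 0.2, [])]),
    ("C", 0.3, []),
])

rng = random.Random(1)
root_seq = "".join(rng.choice(NUCLEOTIDES) for _ in range(30))
sequences = evolve(tree, root_seq, rng, {})
for name, seq in sequences.items():
    print(f"{name:5s} {seq}")
```

Sequences are produced for internal as well as terminal nodes, and each site is treated independently, so different sites or branches could in principle be given different models.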

Figure 1. Example of nucleotide evolution on the ancestral recombination graph.

Note that this ARG contains a recombination event with breakpoint at position 6. Starting from a sequence assigned to the GMRCA, substitutions (marked with black circles) occur forward in time. Non-ancestral material (material that does not reach the sample) and its substitution events are shown in grey.

https://doi.org/10.1371/journal.pcbi.1002495.g001

Box 1. Simulation of Evolutionary Histories

There are two main approaches commonly used to simulate evolutionary histories in population genetics: the forward in time (forward-time) and the coalescent (backward-time). Here I describe the main particularities of these approaches, considering goals and limitations for the simulation of diverse evolutionary scenarios.

The forward-time approach simulates the evolutionary history of an entire population from the past to the present and allows the success of a lineage to be a function of the genotype (see reviews, [13], [14], [80]). Thus, these simulations consider all ancestral information and therefore can be useful to fully study the subsequent evolutionary process of the population, including gene–gene interactions, mating systems, complex migration models (such as sex-biased dispersal or long-distance dispersal), or complex selection (e.g., [42], [81], [82]); beginners may explore these basic concepts using educational simulations [83], [84]. Unfortunately, because the whole population history is simulated, forward simulations generally incur a high computational cost, although significant improvements have recently been achieved in this respect (e.g., [85]).
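
A minimal sketch of the forward-time idea, assuming a haploid Wright-Fisher population with two alleles and a hypothetical selection coefficient `s` (none of these choices come from the programs cited here). Every individual of every generation is simulated, which shows where the computational cost comes from.

```python
import random

def wright_fisher(pop_size, generations, s, init_freq, rng):
    """Return the count of allele A in each generation (haploid Wright-Fisher).

    Allele A has relative fitness 1 + s (assumed s > -1); allele B has fitness 1.
    """
    count = round(pop_size * init_freq)
    trajectory = [count]
    for _ in range(generations):
        # Chance that an offspring descends from an A parent, weighted by fitness.
        w_a = count * (1.0 + s)
        w_b = pop_size - count
        p = w_a / (w_a + w_b)
        # Each individual of the next generation is drawn explicitly.
        count = sum(1 for _ in range(pop_size) if rng.random() < p)
        trajectory.append(count)
    return trajectory

rng = random.Random(42)
trajectory = wright_fisher(pop_size=200, generations=100, s=0.05, init_freq=0.1, rng=rng)
print(trajectory[:10])
```

The success of a lineage depends on its genotype through the fitness weights, and the work grows with population size times the number of generations.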

On the other hand, the coalescent approach describes a backwards in time genealogical process of a sample of genes to a single ancestral copy (see reviews [86], [87]). The coalescent allows the simulation of a limited set of scenarios, namely population size changes (e.g., [88]), population structure and migration (e.g., [89]), recombination (e.g., [90]), and selection (e.g., [91]). A key aspect of the coalescent is that the history of the whole population is not required (so it is not actually simulated) and, consequently, it is generally computationally faster than the forward-time approach. It is important to remember, however, that the efficiency of forward-time simulations is independent of the amount of recombination or selection, in contrast to coalescent simulations, which are highly sensitive to such processes.
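
For comparison, the backward-time process can be sketched in a few lines, assuming the standard neutral coalescent for a single non-recombining locus in a constant-size population. Times are in coalescent units, and the sample labels and function name are hypothetical.

```python
import random

def simulate_coalescent(n_samples, rng):
    """Return (newick, tmrca) for one standard coalescent genealogy."""
    # Each lineage is (newick subtree, height of its root node).
    lineages = [(f"s{i + 1}", 0.0) for i in range(n_samples)]
    time = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time to the next coalescence with k lineages: rate k(k-1)/2.
        time += rng.expovariate(k * (k - 1) / 2.0)
        # Any pair of lineages is equally likely to coalesce.
        i, j = rng.sample(range(k), 2)
        (sub_a, h_a), (sub_b, h_b) = lineages[i], lineages[j]
        merged = f"({sub_a}:{time - h_a:.4f},{sub_b}:{time - h_b:.4f})"
        lineages = [l for idx, l in enumerate(lineages) if idx not in (i, j)]
        lineages.append((merged, time))
    return lineages[0][0] + ";", time

rng = random.Random(3)
newick, tmrca = simulate_coalescent(5, rng)
print(newick)
print("TMRCA:", round(tmrca, 4))
```

Only the ancestors of the sample are tracked, so the work depends on the sample size rather than on the population size.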

Coalescent and forward-time approaches can be considered complementary [13]. In fact, recently two new methods have incorporated both approaches for fast simulations of complex scenarios [9], [33]. In conclusion, one should keep in mind that the choice of the simulation approach may depend on the complexity of the target scenario, as well as on the required computational cost for the simulation.

Main Software Implementations

A number of programs have been developed to simulate the evolution of nucleotide, codon, and amino acid sequences. Although several studies have already reviewed these software tools (e.g., [13]–[17]), such reviews quickly become obsolete due to the emergence of new simulators, as noted in [14]. Table 1 shows an updated list of user-friendly and commonly used programs available to date. Next, the most interesting software from a practical perspective is briefly described.

Table 1. The main software used to simulate genetic sequences under nucleotide, codon, and amino acid substitution models.

https://doi.org/10.1371/journal.pcbi.1002495.t001

When attempting to simulate a complex evolutionary scenario, several programs developed under the forward-time approach may be useful (see Table 1). GenomePop [18] and SFS_CODE [19] seem the most comprehensive tools with implementations of population structure, demographic particularities, recombination, and selection, but they do not allow simulations under amino acid substitution models. The programs SPLATCHE2 [9] and AQUASPLATCHE [20] are able to simulate nucleotide data under spatially (using land or freshwater maps, respectively) and temporally explicit demographic models. A disadvantage of these programs is that only two DNA substitution models are available, which may be problematic when trying to mimic genome-wide data (see [22]); note that other programs such as SFS_CODE or simuPOP [21] implement all DNA substitution models (see Table 1).

If our target scenario can be represented by the coalescent, a variety of coalescent-based programs are able to simulate nucleotide data (see Table 1). Nevertheless, only CodonRecSim [23], Recodon [24], and NetRecodon [8] can simulate coding sequences in the presence of recombination. The first two of these programs force recombination breakpoints to occur between codons while NetRecodon does not (see [8]). On the other hand, fastsimcoal [10], Recodon, and NetRecodon allow simulations with sampling at different times, which can be very interesting for the joint analysis of ancient and modern DNA [25].

When a phylogenetic history (one or several trees) is given, numerous programs exist to directly simulate sequences along such a history (see Table 1, phylogenetic class). One of the most widely applied programs is Seq-Gen [26], which implements several nucleotide and amino acid substitution models. The program indel-Seq-Gen 2.0 [27] extended Seq-Gen to include diverse indel (insertion and deletion) models. Almost at the same time as Seq-Gen, the program EVOLVER (from the PAML package [28]) was released, which additionally allowed the simulation of coding data. Recently, INDELible [12] and PhyloSim [29] implemented all those capabilities, and in addition they included codon models where dN/dS (nonsynonymous/synonymous rate ratio) may vary across sites and/or branches. INDELible is very user-friendly, whereas PhyloSim was implemented in R (a language for statistical computing, [30]) and requires some programming knowledge.

Practical Examples

In this section I outline five hypothetical practical examples of the fast simulation of genetic sequences under particular evolutionary scenarios that will be of general interest. The reader may notice that some scenarios can be solved using more than one approach, but I base my suggestions here on how appropriate, flexible, and user-friendly I think the simulators are.

I) Nucleotide Data under Natural Selection

This scenario is commonly applied to identifying targets of positive selection in real datasets (e.g., [31], [32]). To my knowledge, there is no coalescent framework available to simulate data under natural selection while using Markov DNA substitution models, which would add realism because not every mutation necessarily occurs at a different site in the sequence. However, two programs can be combined to quickly perform this task. First, we can simulate coalescent trees using the programs msms [33] or SelSim [34], although both tools are limited to simulation of a single locus under selection. Then, nucleotide sequences can be evolved along those trees using Seq-Gen. Another possibility is to apply a forward-time simulator that implements complex selection and all DNA substitution models (e.g., SFS_CODE).

II) Coding Data with Intracodon Recombination

Simulations with recombination breakpoints that occur within codons are more realistic since these particular events occur 2/3 of the time that a recombination happens, assuming a spatially uniform distribution. Therefore, these events might exert undue influence on other parameter estimates since current analytical phylogenetic methods using codon models and recombination assume intercodon recombination. However, such effects have not been observed; in particular, dN/dS estimations were not altered (see [8]), so this should be studied further. The fastest procedure for the simulation of intracodon recombination is to directly apply the program NetRecodon. Alternatively, GenomePop can also perform this simulation under the forward approach. This scenario was applied in [35].
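
The 2/3 figure can be checked by enumerating the possible breakpoints of a coding sequence, assuming (as above) that every position between two adjacent nucleotides is equally likely. The function name and the sequence length are illustrative only.

```python
def breakpoint_classes(n_codons):
    """Count breakpoints falling within codons and between codons."""
    n_sites = 3 * n_codons
    intra = inter = 0
    # A breakpoint at position pos separates site pos from site pos + 1.
    for pos in range(1, n_sites):
        if pos % 3 == 0:
            inter += 1   # falls exactly between two codons
        else:
            intra += 1   # splits a codon
    return intra, inter

intra, inter = breakpoint_classes(300)
fraction_intra = intra / (intra + inter)
print(intra, inter, round(fraction_intra, 4))
```

For n codons there are 2n intracodon and n - 1 intercodon positions, so the intracodon fraction is 2n/(3n - 1), which approaches 2/3 as the sequence gets longer.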

III) Amino Acid Data with Indels and Under Recombination

This is a very specific scenario, but one that can also be very interesting for readers due to its complexity and the multiple possible options for its simulation. For instance, this scenario could be useful for testing phylogenetic tree reconstruction (or recombination detection) methods from amino acid datasets that evolved under recombination (e.g., [36]). As far as I know, there is no single tool available that can simulate this scenario. My suggestion is to first simulate coalescent trees (a tree for each recombinant fragment) with the program ms, and then to evolve amino acid sequences with indels on the respective trees using INDELible.
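
The glue between the two programs is mostly bookkeeping: each recombinant fragment has its own tree and its own length. The sketch below assumes tree lines of the form `[length]newick;`, which is how ms reports one tree per fragment when tree output and recombination are both requested. The sample text is made up, and the function names are hypothetical.

```python
import re

# One fragment per line: a length in square brackets followed by a Newick tree.
TREE_LINE = re.compile(r"^\[(\d+)\](.+;)\s*$")

def parse_fragment_trees(text):
    """Return a list of (fragment_length, newick) from ms-style tree lines."""
    fragments = []
    for line in text.splitlines():
        match = TREE_LINE.match(line.strip())
        if match:
            fragments.append((int(match.group(1)), match.group(2)))
    return fragments

def to_site_ranges(fragments):
    """Convert fragment lengths into 1-based inclusive (start, end, newick) ranges."""
    ranges, start = [], 1
    for length, newick in fragments:
        ranges.append((start, start + length - 1, newick))
        start += length
    return ranges

# Made-up example: two fragments separated by one recombination breakpoint.
example_output = """//
[400](1:0.30,(2:0.10,3:0.10):0.20);
[600]((1:0.25,2:0.25):0.15,3:0.40);
"""

fragments = parse_fragment_trees(example_output)
ranges = to_site_ranges(fragments)
for start, end, newick in ranges:
    print(start, end, newick)
```

Each range could then be passed to the sequence simulator as a separate partition with its own tree; the exact input syntax depends on the simulator and is not shown here.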

IV) Long Genomic DNA Regions under Recombination

The amount of genomic data available is increasing rapidly and, as a consequence, many genetic studies focusing on large genomic regions have appeared (e.g., [37]). As expected, such studies require robust and memory-efficient simulators [10], [38]. One of them is fastsimcoal, which allows for efficient simulations because it is based on a simplification of the standard coalescent with recombination (the sequential Markovian coalescent [SMC] algorithm [39]). Therefore, it seems to be an appropriate framework to simulate this scenario.

V) Coding Data under a Spatial and Temporal Range Expansion

Spatial and temporal range expansions have occurred repeatedly in the history of most species and promote genetic consequences that are different from those produced by pure demographic expansions [40]. In addition, other spatiotemporal processes, such as range contractions and range shifts (usually produced during climate changes) or long-distance dispersal events, can also affect molecular diversity [41], [42]. Using SPLATCHE2, trees can be simulated under spatial and temporal range expansion in a straightforward manner. Then, coding data can be simulated over those trees with INDELible.

Applications of Simulated Genetic Data

Computer simulation is a powerful tool in population genetics with a rich variety of applications. Here I show some interesting published applications.

I. Hypothesis Testing

  1. The effect of recombination on ancestral sequence reconstruction.
    Recently, Arenas and Posada [35] tested if recombination can affect ancestral sequence reconstruction (ASR). They simulated nucleotide, codon, and amino acid data with NetRecodon and they observed that recombination biases the reconstruction of ancestral sequences, regardless of the method or software used. This effect was shown as a consequence of incorrect phylogenetic tree reconstructions when recombination is ignored [43]. Note that this effect is crucial for numerous ASR-based studies (e.g., [44]).
  2. The effect of recombination on selection tests.
    Tests for identifying selection (based on dN/dS) are frequently used in different species, including highly recombining viruses and bacteria (e.g., [45]). There is, however, an important pitfall of such tests in the presence of recombination. In studies [8] and [23], the authors simulated coding data under several heterogeneous codon models [46] and different levels of recombination. Then, they applied likelihood ratio tests (LRTs) for model choice. Results showed a weak impact of recombination on the estimation of global dN/dS but a strong effect at the local level by inflating the number of positively selected sites. Simulations were carried out using CodonRecSim and NetRecodon.
  3. Testing criteria for substitution model selection.
    A common step in phylogenetics consists of the statistical selection of a DNA substitution model that best fits the data [47], [48]. Currently, this model selection can be performed using several criteria, namely hierarchical and dynamic LRTs, the Akaike and Bayesian information criteria (AIC and BIC, respectively), and the decision-theoretic approach (DT). Although AIC and BIC showed advantages over LRTs [47], the best criterion among all other options remained unclear. Recently, Luo et al. [49] addressed this point by extensive simulations of nucleotide data (using PAML [28] to simulate four tree topologies and Seq-Gen to evolve DNA sequences under a wide set of substitution models) and coding data (using Recodon). Then, by statistical analysis they concluded that BIC and DT approaches favor accurate model selection.
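
To make the model selection criteria in item 3 concrete, the sketch below computes AIC and BIC from maximized log-likelihoods. The log-likelihood values are invented for illustration, the parameter counts cover only the free parameters of each substitution model (branch lengths, shared by all models, are left out), and the alignment length is used as the BIC sample size, which is a common convention but an assumption here.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 lnL."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, sample_size):
    """Bayesian information criterion: k ln(n) - 2 lnL."""
    return n_params * math.log(sample_size) - 2 * log_likelihood

# Hypothetical fits: model -> (maximized lnL, free substitution-model parameters)
fits = {
    "JC":  (-2150.0, 0),
    "HKY": (-2100.0, 4),
    "GTR": (-2094.0, 8),
}
n_sites = 1000  # assumed alignment length, used as sample size for BIC

aic_scores = {model: aic(lnl, k) for model, (lnl, k) in fits.items()}
bic_scores = {model: bic(lnl, k, n_sites) for model, (lnl, k) in fits.items()}

best_aic = min(aic_scores, key=aic_scores.get)
best_bic = min(bic_scores, key=bic_scores.get)
print("AIC selects", best_aic, "| BIC selects", best_bic)
```

With these invented numbers the two criteria disagree, because BIC penalizes additional parameters more heavily than AIC whenever ln(n) exceeds 2. Simulation studies such as the one described above ask which criterion recovers the generating model more often.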

II. Verification of Analytical Methods

  1. Validation of a method for large phylogenetic tree reconstruction.
    One of the most well-established programs for phylogenetic tree reconstruction is PHYML [50]. As with most analytical tools, PHYML required thorough validation through computer simulations. In particular, 5,000 random phylogenies were simulated according to the standard speciation process (see [51]), and then DNA sequences were evolved on those phylogenies using Seq-Gen. The program showed a topological accuracy similar to that from other maximum likelihood programs, but it strongly reduced computing time.
  2. Validation of a method for the detection of recombinant breakpoints.
    Recombination detection methods are fundamental for the analysis of genome dynamics, genetic mapping, and phylogenetic methods. As a result, a variety of methods for recombination detection exist (see [52]). One of them was recently developed by Westesson and Holmes [5] for the analysis of whole-genome alignments. For its validation, ancestral recombination graphs (ARGs) were simulated using Recodon, then marginal trees with identical topologies were excluded and DNA sequences were simulated on the remaining trees using Seq-Gen. The method accurately detected recombinant breakpoints even for genome-size datasets.

III. Study of Complex Evolutionary Processes

  1. Principal component analysis of human genetic diversity across Europe.
    A controversial topic that sparked debate in recent years was the interpretation of gradients of population genetic variation across Europe derived from principal component analysis (PCA) [53]–[56]. Briefly, while initially Cavalli-Sforza et al. [56] interpreted principal component (PC) gradients only as a consequence of human ancestral expansions, Novembre and Stephens [53] showed that similar PC gradients may arise from diverse spatial genetic patterns under equilibrium isolation-by-distance models. Recently, François et al. [55] carried out simulations of DNA data using SPLATCHE2 in order to mimic the Neolithic farmer expansion across Europe taking into account various levels of interbreeding between farmer and resident hunter-gatherer populations (see Figure 2). They concluded that demographic and spatial population expansions often lead to PC gradients that are perpendicular to the direction of the expansion as a consequence of the allele surfing phenomenon [57].
Figure 2. Example of a simulated modern human range expansion over Europe.

(A) Snapshots of the program SPLATCHE2 for an example of simulation of a Neolithic farmer expansion over Europe. Settings (demographic parameter values) used for this example are similar to those used in François et al. [55]. Note that the range expansion starts from the bottom-right corner of Europe. Snapshots are taken every 40 generations. White demes are empty and dark colors indicate low population densities (in particular at the front of the expansion). (B) Scheme of sampling locations used for this simulation. (C) Spatial distribution of coalescent events during the range expansion.

https://doi.org/10.1371/journal.pcbi.1002495.g002

IV. Estimation of Evolutionary Parameters

  1. Coestimation of evolutionary parameters using approximate Bayesian computation.
    Approximate Bayesian computation (ABC) is a recent and useful approach for inference in evolutionary genetics (see [58]), based on computer simulations. It provides a robust alternative for those analyses where the likelihood function cannot be evaluated or is computationally too expensive. An interesting example studied by Wilson et al. [59] applied ABC to coestimate several evolutionary parameters (such as mutation, dN/dS, and recombination rates) from coding data of the bacterium Campylobacter jejuni. Although the simulator used was not published, such a scenario could be simulated using, e.g., Recodon. In addition, Laval et al. [60] also applied an ABC-based approach to coestimate, assuming a particular model of human evolution, important historical and demographic parameters like the onset of the African expansions and the out-of-Africa migration, as well as the current and ancestral effective population sizes of Africans and non-Africans. Here the simulation of DNA data was performed using SIMCOAL2.
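
The simplest form of ABC is a rejection sampler: draw parameters from a prior, simulate data, and keep the draws whose simulated summary statistic falls close to the observed one. The sketch below is a toy version, not the procedure of the studies cited in item 1. It assumes a single parameter (the scaled mutation rate theta), a uniform prior, the number of segregating sites as the only summary statistic, and a neutral coalescent with infinite-sites mutation as the simulator. All names and numbers are hypothetical.

```python
import random

def total_branch_length(n, rng):
    """Total branch length of a standard coalescent tree for n samples."""
    # With k lineages, all k branches grow for an Exp(k(k-1)/2) waiting time.
    return sum(k * rng.expovariate(k * (k - 1) / 2.0) for k in range(2, n + 1))

def poisson(lam, rng):
    """Poisson draw: count unit-rate exponential arrivals before time lam."""
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count

def simulate_segregating_sites(theta, n, rng):
    """Mutations fall on the tree as a Poisson process with rate theta/2."""
    return poisson(theta / 2.0 * total_branch_length(n, rng), rng)

def abc_rejection(s_obs, n, prior_max, tolerance, n_draws, rng):
    """Keep prior draws whose simulated statistic is within `tolerance` of s_obs."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, prior_max)          # draw from the prior
        s_sim = simulate_segregating_sites(theta, n, rng)
        if abs(s_sim - s_obs) <= tolerance:          # compare summary statistics
            accepted.append(theta)
    return accepted

rng = random.Random(7)
accepted = abc_rejection(s_obs=20, n=10, prior_max=20.0, tolerance=2,
                         n_draws=5000, rng=rng)
print(len(accepted), "accepted draws")
if accepted:
    print("approximate posterior mean of theta:", sum(accepted) / len(accepted))
```

The accepted draws approximate the posterior distribution of theta. Because most draws are rejected, real analyses need very large numbers of simulations, which is why simulator speed matters so much for ABC.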

The Future of Computer Simulations

Although currently available software can simulate a wide set of evolutionary scenarios, some limitations remain concerning computational costs and particular complex models. In some cases the computational time is crucial (e.g., ABC studies that require millions of simulations to cover a wide range of parameter space), and running simulations in parallel on a cluster can help reduce the computational time. On the other hand, several complex scenarios that interest evolutionary biologists are still difficult to simulate. An example is the simulation of molecular evolution with dependence among sites (coevolving sites, e.g., [61]). Here, although some models were already developed (see [62]), they could not be extensively applied in simulations due to intractable computational costs derived from the calculation of diverse structural energies (like those used in [63]). Another challenging scenario is the simulation of coding data under natural selection, but where the signatures of natural selection directly influence the synonymous and nonsynonymous substitutions (see [64]).

There is a continuing need for software for the simulation of molecular data due to the emergence of complex scenarios and the requirement of fast simulations. Thus, I expect a fruitful future for this basic and applied area of research.

Acknowledgments

I want to thank Vicky Schneider and the Editor of PLoS Computational Biology's Education section for their invitation to contribute with this education article. I also want to thank Isabel Alves, Yang Liu, Rebecca Krebs-Wheaton, and William Fletcher for helpful comments. I thank three anonymous reviewers for their efforts in reviewing this study.

References

  1. 1. Peck SL (2004) Simulation as experiment: a philosophical reassessment for biological modeling. Trends Ecol Evol 19: 530–534.SL Peck2004Simulation as experiment: a philosophical reassessment for biological modeling.Trends Ecol Evol19530534
  2. 2. DeChaine EG, Martin AP (2006) Using coalescent simulations to test the impact of quaternary climate cycles on divergence in an alpine plant-insect association. Evolution 60: 1004–1013.EG DeChaineAP Martin2006Using coalescent simulations to test the impact of quaternary climate cycles on divergence in an alpine plant-insect association.Evolution6010041013
  3. 3. Carvajal-Rodriguez A, Crandall KA, Posada D (2006) Recombination estimation under complex evolutionary models with the coalescent composite-likelihood method. Mol Biol Evol 23: 817–827.A. Carvajal-RodriguezKA CrandallD. Posada2006Recombination estimation under complex evolutionary models with the coalescent composite-likelihood method.Mol Biol Evol23817827
  4. 4. Arenas M, Valiente G, Posada D (2008) Characterization of reticulate networks based on the coalescent with recombination. Mol Biol Evol 25: 2517–2520.M. ArenasG. ValienteD. Posada2008Characterization of reticulate networks based on the coalescent with recombination.Mol Biol Evol2525172520
  5. 5. Westesson O, Holmes I (2009) Accurate detection of recombinant breakpoints in whole-genome alignments. PLoS Comput Biol 5: e1000318.O. WestessonI. Holmes2009Accurate detection of recombinant breakpoints in whole-genome alignments.PLoS Comput Biol5e1000318
  6. 6. Hill WG, Robertson A (1966) The effect of linkage on limits to artificial selection. Genet Res 8: 269–294.WG HillA. Robertson1966The effect of linkage on limits to artificial selection.Genet Res8269294
  7. 7. Beaumont MA, Zhang W, Balding DJ (2002) Approximate Bayesian computation in population genetics. Genetics 162: 2025–2035.MA BeaumontW. ZhangDJ Balding2002Approximate Bayesian computation in population genetics.Genetics16220252035
  8. 8. Arenas M, Posada D (2010) Coalescent simulation of intracodon recombination. Genetics 184: 429–437.M. ArenasD. Posada2010Coalescent simulation of intracodon recombination.Genetics184429437
  9. 9. Ray N, Currat M, Foll M, Excoffier L (2010) SPLATCHE2: a spatially explicit simulation framework for complex demography, genetic admixture and recombination. Bioinformatics 26: 2993–2994.N. RayM. CurratM. FollL. Excoffier2010SPLATCHE2: a spatially explicit simulation framework for complex demography, genetic admixture and recombination.Bioinformatics2629932994
  10. 10. Excoffier L, Foll M (2011) fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. Bioinformatics 27: 1332–1334.L. ExcoffierM. Foll2011fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios.Bioinformatics2713321334
  11. 11. Yang Z (2006) Computational molecular evolution. Oxford University Press. Z. Yang2006Computational molecular evolutionOxford University Press
  12. 12. Fletcher W, Yang Z (2009) INDELible: a flexible simulator of biological sequence evolution. Mol Biol Evol 26: 1879–1888.W. FletcherZ. Yang2009INDELible: a flexible simulator of biological sequence evolution.Mol Biol Evol2618791888
  13. 13. Carvajal-Rodriguez A (2008) Simulation of genomes: a review. Curr Genomics 9: 155–159.A. Carvajal-Rodriguez2008Simulation of genomes: a review.Curr Genomics9155159
  14. 14. Carvajal-Rodriguez A (2010) Simulation of genes and genomes forward in time. Curr Genomics 11: 58–61.A. Carvajal-Rodriguez2010Simulation of genes and genomes forward in time.Curr Genomics115861
  15. 15. Liu Y, Athanasiadis G, Weale ME (2008) A survey of genetic simulation software for population and epidemiological studies. Hum Genomics 3: 79–86.Y. LiuG. AthanasiadisME Weale2008A survey of genetic simulation software for population and epidemiological studies.Hum Genomics37986
  16. 16. Hoban S, Bertorelle G, Gaggiotti OE (2012) Computer simulations: tools for population and evolutionary genetics. Nat Rev Genet 13: 110–122.S. HobanG. BertorelleOE Gaggiotti2012Computer simulations: tools for population and evolutionary genetics.Nat Rev Genet13110122
  17. 17. Arenas M, Posada D (2012) Simulation of coding sequence evolution. In: Cannarozzi GM, Schneider A, editors. Codon evolution. Oxford: Oxford University Press. pp. 126–132.M. ArenasD. Posada2012Simulation of coding sequence evolution.GM CannarozziA. SchneiderCodon evolutionOxfordOxford University Press126132
  18. 18. Carvajal-Rodriguez A (2008) GENOMEPOP: a program to simulate genomes in populations. BMC Bioinformatics 9: 223.A. Carvajal-Rodriguez2008GENOMEPOP: a program to simulate genomes in populations.BMC Bioinformatics9223
  19. 19. Hernandez RD (2008) A flexible forward simulator for populations subject to selection and demography. Bioinformatics 24: 2786–2787.RD Hernandez2008A flexible forward simulator for populations subject to selection and demography.Bioinformatics2427862787
  20. 20. Neuenschwander S (2006) AQUASPLATCHE: a program to simulate genetic diversity in populations living in linear habitats. Mol Ecol Notes 6: 583–585.S. Neuenschwander2006AQUASPLATCHE: a program to simulate genetic diversity in populations living in linear habitats.Mol Ecol Notes6583585
  21. 21. Peng B, Kimmel M (2005) simuPOP: a forward-time population genetics simulation environment. Bioinformatics 21: 3686–3687.B. PengM. Kimmel2005simuPOP: a forward-time population genetics simulation environment.Bioinformatics2136863687
  22. 22. Arbiza L, Patricio M, Dopazo H, Posada D (2011) Genome-wide heterogeneity of nucleotide substitution model fit. Genome Biol Evol 3: 896–908.L. ArbizaM. PatricioH. DopazoD. Posada2011Genome-wide heterogeneity of nucleotide substitution model fit.Genome Biol Evol3896908
  23. 23. Anisimova M, Nielsen R, Yang Z (2003) Effect of recombination on the accuracy of the likelihood method for detecting positive selection at amino acid sites. Genetics 164: 1229–1236.M. AnisimovaR. NielsenZ. Yang2003Effect of recombination on the accuracy of the likelihood method for detecting positive selection at amino acid sites.Genetics16412291236
  24. 24. Arenas M, Posada D (2007) Recodon: coalescent simulation of coding DNA sequences with recombination, migration and demography. BMC Bioinformatics 8: 458.M. ArenasD. Posada2007Recodon: coalescent simulation of coding DNA sequences with recombination, migration and demography.BMC Bioinformatics8458
  25. 25. Navascues M, Depaulis F, Emerson BC (2010) Combining contemporary and ancient DNA in population genetic and phylogeographical studies. Mol Ecol Resour 10: 760–772.M. NavascuesF. DepaulisBC Emerson2010Combining contemporary and ancient DNA in population genetic and phylogeographical studies.Mol Ecol Resour10760772
  26. 26. Rambaut A, Grassly NC (1997) Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput Appl Biosciences 13: 235–238.A. RambautNC Grassly1997Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees.Comput Appl Biosciences13235238
  27. 27. Strope CL, Abel K, Scott SD, Moriyama EN (2009) Biological sequence simulation for testing complex evolutionary hypotheses: indel-Seq-Gen version 2.0. Mol Biol Evol 26: 2581–2593.CL StropeK. AbelSD ScottEN Moriyama2009Biological sequence simulation for testing complex evolutionary hypotheses: indel-Seq-Gen version 2.0.Mol Biol Evol2625812593
  28. 28. Yang Z (1997) PAML: a program package for phylogenetic analysis by maximum likelihood. Computer Applications in the Biosciences 13: 555–556.Z. Yang1997PAML: a program package for phylogenetic analysis by maximum likelihood.Computer Applications in the Biosciences13555556
  29. 29. Sipos B, Massingham T, Jordan GE, Goldman N (2011) PhyloSim - Monte Carlo simulation of sequence evolution in the R statistical computing environment. BMC Bioinformatics 12: 104.B. SiposT. MassinghamGE JordanN. Goldman2011PhyloSim - Monte Carlo simulation of sequence evolution in the R statistical computing environment.BMC Bioinformatics12104
  30. 30. Ihaka R, Gentleman R (1996) R: a language for data analysis and graphics. J Comput Graph Stat 169: 299–314.R. IhakaR. Gentleman1996R: a language for data analysis and graphics.J Comput Graph Stat169299314
  31. Biswas S, Akey J (2006) Genomic insights into positive selection. Trends Genet 22: 437–446.
  32. Kelley JL, Madeoy J, Calhoun JC, Swanson W, Akey JM (2006) Genomic signatures of positive selection in humans and the limits of outlier approaches. Genome Res 16: 980–989.
  33. Ewing G, Hermisson J (2010) MSMS: a coalescent simulation program including recombination, demographic structure and selection at a single locus. Bioinformatics 26: 2064–2065.
  34. Spencer CC, Coop G (2004) SelSim: a program to simulate population genetic data with natural selection and recombination. Bioinformatics 20: 3673–3675.
  35. Arenas M, Posada D (2010) The effect of recombination on the reconstruction of ancestral sequences. Genetics 184: 1133–1139.
  36. Lemey P, Lott M, Martin DP, Moulton V (2009) Identifying recombinants in human and primate immunodeficiency virus sequence alignments using quartet scanning. BMC Bioinformatics 10: 126.
  37. Durbin RM, Altshuler DL, Abecasis GR, Bentley DR, Chakravarti A, et al. (2010) A map of human genome variation from population-scale sequencing. Nature 467: 1061–1073.
  38. Marjoram P, Wall JD (2006) Fast “coalescent” simulation. BMC Genet 7: 16.
  39. McVean GA, Cardin NJ (2005) Approximating the coalescent with recombination. Philos Trans R Soc Lond B Biol Sci 360: 1387–1393.
  40. Excoffier L, Foll M, Petit RJ (2009) Genetic consequences of range expansions. Annu Rev Ecol Evol Syst 40: 481–501.
  41. Arenas M, Ray N, Currat M, Excoffier L (2012) Consequences of range contractions and range shifts on molecular diversity. Mol Biol Evol 29: 207–218.
  42. Ray N, Excoffier L (2010) A first step towards inferring levels of long-distance dispersal during past expansions. Mol Ecol Resour 10: 902–914.
  43. Schierup MH, Hein J (2000) Consequences of recombination on traditional phylogenetic analysis. Genetics 156: 879–891.
  44. Arenas M, Posada D (2010) Computational design of centralized HIV-1 genes. Curr HIV Res 8: 613–621.
  45. Bozek K, Lengauer T (2010) Positive selection of HIV host factors and the evolution of lentivirus genes. BMC Evol Biol 10: 186.
  46. Yang Z, Nielsen R, Goldman N, Pedersen A-MK (2000) Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155: 431–449.
  47. Posada D, Buckley TR (2004) Model selection and model averaging in phylogenetics: advantages of Akaike Information Criterion and Bayesian approaches over likelihood ratio tests. Syst Biol 53: 793–808.
  48. Sullivan J, Joyce P (2005) Model selection in phylogenetics. Annu Rev Ecol Evol Syst 36: 445–466.
  49. Luo A, Qiao H, Zhang Y, Shi W, Ho SY, et al. (2010) Performance of criteria for selecting evolutionary models in phylogenetics: a comprehensive study based on simulated datasets. BMC Evol Biol 10: 242.
  50. Guindon S, Gascuel O (2003) A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 52: 696–704.
  51. Kuhner MK, Felsenstein J (1994) A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Mol Biol Evol 11: 459–468.
  52. Posada D (2002) Evaluation of methods for detecting recombination from DNA sequences: empirical data. Mol Biol Evol 19: 708–717.
  53. Novembre J, Stephens M (2008) Interpreting principal component analyses of spatial population genetic variation. Nat Genet 40: 646–649.
  54. Novembre J, Stephens M (2010) Response to Cavalli-Sforza interview [Human Biology 82(3):245–266 (June 2010)]. Hum Biol 82: 469–470.
  55. François O, Currat M, Ray N, Han E, Excoffier L, et al. (2010) Principal component analysis under population genetic models of range expansion and admixture. Mol Biol Evol 27: 1257–1268.
  56. Cavalli-Sforza LL, Menozzi P, Piazza A (1994) The history and geography of human genes. Princeton, New Jersey: Princeton University Press.
  57. Excoffier L, Ray N (2008) Surfing during population expansions promotes genetic revolutions and structuration. Trends Ecol Evol 23: 347–351.
  58. Beaumont MA (2010) Approximate Bayesian computation in evolution and ecology. Annu Rev Ecol Evol Syst 41: 379–405.
  59. Wilson DJ, Gabriel E, Leatherbarrow AJ, Cheesbrough J, Gee S, et al. (2009) Rapid evolution and the importance of recombination to the gastroenteric pathogen Campylobacter jejuni. Mol Biol Evol 26: 385–397.
  60. Laval G, Patin E, Barreiro LB, Quintana-Murci L (2010) Formulating a historical and demographic model of recent human evolution based on resequencing data from noncoding regions. PLoS ONE 5: e10284.
  61. Wang M, Kapralov MV, Anisimova M (2011) Coevolution of amino acid residues in the key photosynthetic enzyme Rubisco. BMC Evol Biol 11: 266.
  62. Bastolla U, Porto M, Roman HE, Vendruscolo M (2007) Structural approaches to sequence evolution. Berlin, Heidelberg: Springer.
  63. Arenas M, Villaverde MC, Sussman F (2009) Prediction and analysis of binding affinities for chemically diverse HIV-1 PR inhibitors by the modified SAFE_p approach. J Comput Chem 30: 1229–1240.
  64. Kryazhimskiy S, Plotkin JB (2008) The population genetics of dN/dS. PLoS Genet 4: e1000304.
  65. Excoffier L, Novembre J, Schneider S (2000) SIMCOAL: a general coalescent program for the simulation of molecular data in interconnected populations with arbitrary demography. J Hered 91: 506–509.
  66. Anderson CN, Ramakrishnan U, Chan YL, Hadly EA (2005) Serial SimCoal: a population genetics model for data from multiple populations and points in time. Bioinformatics 21: 1733–1734.
  67. Ramos-Onsins SE, Mitchell-Olds T (2007) Mlcoalsim: multilocus coalescent simulations. Evol Bioinform Online 3: 41–44.
  68. Grassly NC, Harvey PH, Holmes EC (1999) Population dynamics of HIV-1 inferred from gene sequences. Genetics 151: 427–438.
  69. Beiko RG, Charlebois RL (2007) A simulation test bed for hypotheses of genome evolution. Bioinformatics 23: 825–831.
  70. Hall BG (2008) Simulating DNA coding sequence evolution with EvolveAGene 3. Mol Biol Evol 25: 688–695.
  71. Cartwright RA (2005) DNA assembly with gaps (Dawg): simulating sequence evolution. Bioinformatics 21 (Suppl 3): iii31–38.
  72. Rosenberg MS (2005) MySSP: Non-stationary evolutionary sequence simulation, including indels. Evol Bioinform Online 1: 81–83.
  73. Gesell T, von Haeseler A (2006) In silico sequence evolution with site-specific interactions along phylogenetic trees. Bioinformatics 22: 716–722.
  74. Stoye J, Evers D, Meyer F (1998) Rose: generating sequence families. Bioinformatics 14: 157–163.
  75. Varadarajan A, Bradley RK, Holmes IH (2008) Tools for simulating evolution of aligned genomic regions with integrated parameter estimation. Genome Biol 9: R147.
  76. Dalquen DA, Anisimova M, Gonnet GH, Dessimoz C (2012) ALF–a simulation framework for genome evolution. Mol Biol Evol 29: 1115–1123.
  77. Pang A, Smith AD, Nuin PA, Tillier ER (2005) SIMPROT: using an empirically determined indel distribution in simulations of protein evolution. BMC Bioinformatics 6: 236.
  78. Arenas M, Patricio M, Posada D, Valiente G (2010) Characterization of phylogenetic networks with NetTest. BMC Bioinformatics 11: 268.
  79. Raup DM, Gould SJ, Schopf TJM, Simberloff DS (1973) Stochastic models of phylogeny and the evolution of diversity. J Geol 81: 525–542.
  80. Epperson BK, McRae BH, Scribner K, Cushman SA, Rosenberg MS, et al. (2010) Utility of computer simulations in landscape genetics. Mol Ecol 19: 3549–3564.
  81. Peng B, Amos CI, Kimmel M (2007) Forward-time simulations of human populations with complex diseases. PLoS Genet 3: e47.
  82. Calafell F, Grigorenko EL, Chikanian AA, Kidd KK (2001) Haplotype evolution and linkage disequilibrium: a simulation study. Hum Hered 51: 85–96.
  83. Jones TC, Laughlin TF (2010) PopGen fishbowl: a free online simulation model of microevolutionary processes. Am Biol Teach 72: 100–103.
  84. Coombs JA, Letcher BH, Nislow KH (2010) Pedagog: software for simulating eco-evolutionary population dynamics. Mol Ecol Resour 10: 558–563.
  85. Padhukasahasram B, Marjoram P, Wall JD, Bustamante CD, Nordborg M (2008) Exploring population genetic models with recombination using efficient forward-time simulations. Genetics 178: 2417–2427.
  86. Nordborg M (2007) Coalescent theory. In: Balding DJ, Bishop M, Cannings C, editors. Handbook of statistical genetics. Third edition. Chichester, UK: John Wiley & Sons, Ltd. pp. 843–877.
  87. Wakeley J (2008) Coalescent Theory: An Introduction. Greenwood Village, Colorado: Roberts and Company Publishers.
  88. Slatkin M (2001) Simulating genealogies of selected alleles in a population of variable size. Genet Res 78: 49–57.
  89. Hudson RR (1998) Island models and the coalescent process. Mol Ecol 7: 413–418.
  90. Hudson RR (1983) Properties of a neutral allele model with intragenic recombination. Theor Popul Biol 23: 183–201.
  91. Hudson RR, Kaplan NL (1988) The coalescent process in models with selection and recombination. Genetics 120: 831–840.