To elucidate the history of living and extinct elephantids, we generated 39,763 bp of aligned nuclear DNA sequence across 375 loci for African savanna elephant, African forest elephant, Asian elephant, the extinct American mastodon, and the woolly mammoth. Our data establish that the Asian elephant is the closest living relative of the extinct mammoth in the nuclear genome, extending previous findings from mitochondrial DNA analyses. We also find that savanna and forest elephants, which some have argued are the same species, are as or more divergent in the nuclear genome as mammoths and Asian elephants, which are considered to be distinct genera, thus resolving a long-standing debate about the appropriate taxonomic classification of the African elephants. Finally, we document a much larger effective population size in forest elephants compared with the other elephantid taxa, likely reflecting species differences in ancient geographic structure and range and differences in life history traits such as variance in male reproductive success.
The living elephants are the last survivors of a once highly successful mammalian order, the Proboscidea, which includes extinct species such as the iconic woolly mammoth (Mammuthus primigenius) and the American mastodon (Mammut americanum). Despite numerous studies, the phylogenetic relationships of the modern elephants to the woolly mammoth, as well as the taxonomic status of the African elephants of the genus Loxodonta, remain controversial. This is in large part due to the fact that both the woolly mammoth and the American mastodon (the closest outgroup to elephants and mammoths available for genetic studies) are extinct, posing considerable technical hurdles for comparative genetic analysis. We have used a combination of modern DNA sequencing and targeted PCR amplification to obtain a large data set for comparing American mastodon, woolly mammoth, Asian elephant, African savanna elephant, and African forest elephant. We unequivocally establish that the Asian elephant is the sister species to the woolly mammoth. A surprising finding from our study is that the divergence of African savanna and forest elephants—which some have argued to be two populations of the same species—is about as ancient as the divergence of Asian elephants and mammoths. Given their ancient divergence, we conclude that African savanna and forest elephants should be classified as two distinct species.
Citation: Rohland N, Reich D, Mallick S, Meyer M, Green RE, et al. (2010) Genomic DNA Sequences from Mastodon and Woolly Mammoth Reveal Deep Speciation of Forest and Savanna Elephants. PLoS Biol 8(12): e1000564. doi:10.1371/journal.pbio.1000564
Academic Editor: David Penny, Massey University, New Zealand
Received: April 21, 2010; Accepted: November 3, 2010; Published: December 21, 2010
Copyright: © 2010 Rohland et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was funded by the Max Planck Society (NR and MH) and by a Burroughs Wellcome Career Development Award in the Biomedical Science and SPARC award from the Broad Institute to DR. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Abbreviations: ILS, incomplete lineage sorting; IM, Isolation and Migration; mtDNA, mitochondrial DNA; Mya, million years ago
The technology for sequencing DNA from extinct species such as mastodons (genus Mammut) and mammoths (genus Mammuthus) provides a powerful tool for elucidating the phylogeny of the Elephantidae, a family that originated in the Miocene and that includes Asian elephants (genus Elephas), African elephants (genus Loxodonta), and extinct mammoths. In the highest resolution study to date, complete mitochondrial DNA (mtDNA) genomes from three elephantid genera were compared to the mastodon outgroup. The mtDNA analysis suggested that mammoths and Asian elephants form a clade with an estimated genetic divergence time of 5.8–7.8 million years ago (Mya), while African elephants diverged from an earlier common ancestor 6.6–8.8 Mya. However, mtDNA represents just a single locus in the genome and need not represent the true species phylogeny, since a single gene tree can differ from the consensus species tree of the taxa in question. Generalizing about species relationships based on mtDNA alone is especially problematic for the Elephantidae because their core social groups (“herds”) are matrilocal, with females rarely, if ever, dispersing across groups. This results in mtDNA genealogies in both African and Asian elephants that exhibit deeper divergence and/or different phylogeographic patterns than the nuclear genome.
These observed discrepancies between the phylogeographic patterns of nuclear and mtDNA sequences have led to a debate about the appropriate taxonomic status of African elephants. Most researchers have argued, based on morphology and nuclear DNA markers, that forest (Loxodonta cyclotis) and savanna (Loxodonta africana) elephants should be considered separate species. However, this notion has been contested based on mtDNA patterns, which reveal some haplogroups with coalescent times of less than half a million years that are shared across forest and savanna elephants, indicating relatively recent gene flow among the ancestors of these taxa. Taxonomies for African elephants based on mtDNA phylogeographic patterns have suggested anywhere from one to four species, whereas analysis of morphology and nuclear data sets has suggested two species.
The study of large amounts of nuclear DNA sequences has the potential to resolve elephantid phylogeny, but due to technical challenges associated with obtaining homologous data sets from fossil DNA, no sufficiently large nuclear DNA data set has been published to date. Although draft genomes are available for woolly mammoth (Mammuthus primigenius) and savanna elephant (loxAfr; http://www.broadinstitute.org/ftp/pub/assemblies/mammals/elephant/), comparative sequence data are lacking for Asian (Elephas maximus) and forest elephant, as well as for a suitable outgroup like the American mastodon (Mammut americanum). Using a combination of next generation sequencing and targeted multiplex PCR, we obtained the first substantial nuclear data set for comparing these species.
We carried out shotgun sequencing of DNA from an American mastodon with a Roche 454 Genome Sequencer (GS), using the same DNA extract from a 50,000–130,000-yr-old tooth that we previously used to generate a complete mtDNA genome sequence from the mastodon. After comparing the 45 Mb of shotgun DNA data that we obtained to the GenBank database, and retaining only reads for which the best match was to the savanna elephant draft sequence (loxAfr1), we were left with 1.76 Mb of mastodon sequence (Figure 1 and Figure S1).
(a) Mastodon shotgun 454 sequencing. We ligated 454-adaptors (green and blue) to the ends of the DNA molecules and sequenced the libraries on a Roche 454 GS. (b) Bioinformatic analysis of shotgun 454 sequences. To identify proboscidean sequence, we compared the sequences to databases consisting of the savanna elephant draft genome (loxAfr1), the human genome (hg18), the mouse genome (mm8), NCBI's nucleotide database of environmental samples (env), and NCBI's non-redundant nucleotide database (nr). The 454 sequences with a best match to loxAfr1 (in red) were aligned to loxAfr1. Alignments of at least 90 bp in length and with a similarity higher than 87% were used for primer design after filtering out known repeat elements (using the UCSC RepeatMasker database). Primers were based on loxAfr1 sequence flanking the mastodon sequence. (c) Multiplex PCR and sequencing of the targeted loci in modern elephants and mammoth. We show the protocol for the first of four rounds of the project (Table S3 provides details of the further rounds). A total of 213 primer pairs were randomly divided into 5 multiplex primer mixes with 41–44 primer pairs per mix. These mixes were used for the first step of the two-step multiplex PCR approach, for each of the 5 samples (La, Loxodonta africana; Lc, L. cyclotis; Em 1, Elephas maximus 1; Em 2, E. maximus 2; Mp, Mammuthus primigenius). Dilutions of these products were used as templates to amplify the loci individually in the second step (shown for L. africana), resulting in 213 distinct products per sample. These products were quantified, normalized, and merged into one pool per sample. A 454 library was prepared and sequenced on 1/16th of a picotiter plate of a Roche 454 GS.
To amplify the same set of loci across all species, we designed PCR primers flanking the regions of mastodon-elephant alignment, using the loxAfr1 savanna elephant sequence as a template (Figure 1; a full list of the primers is presented in Dataset S1). We used these primers in a multiplexed protocol to amplify the loci in one or two Asian elephants, one African forest elephant, one woolly mammoth, and one African savanna elephant unrelated to the individual used for the reference sequence (Figure 1 and Table S1). We then sequenced the products on a Roche 454 GS to a median coverage of 41-fold and assembled a consensus sequence for each individual by restricting to nucleotides with at least 3-fold coverage. After four rounds of amplification and sequencing, we obtained 39,763 base pairs across 375 loci with data from all five taxa (Text S1; Figure S2; Table S2, Table S3). We identified 1,797 nucleotides in this data set at which two different alleles were observed and used these sites for the majority of our analyses (the genotypes are provided in Dataset S2). A total of 549 of these biallelic sites were polymorphic among the elephantids, while the remaining sites were fixed differences compared to the mastodon sequence.
To assess the utility of the data for molecular dating and inference about demographic history, we carried out a series of relative rate tests, searching for an excess of divergent sites in one taxon compared to another since their split, which could reflect sequencing errors or changes in the molecular clock. None of the pairs of taxa showed a significant excess of divergent sites compared with any other (Table 1). When we compared the data within taxa, we found that the savanna reference genome loxAfr1 had a significantly higher number of lineage-specific substitutions than the savanna elephant we sequenced (nominal P = 0.03 from a two-sided test without correcting for multiple hypothesis testing). This is consistent with our data being of higher quality than the loxAfr1 reference sequence, presumably due to our high read coverage.
In contrast to our elephantid data, our mastodon data had a high error rate, as expected given that it was derived from shotgun sequencing data providing only 1-fold coverage at each position. To better understand the effect of errors in the mastodon sequence, we PCR-amplified a subset of loci in the mastodon, obtaining high-quality mastodon data at 1,726 bases (Text S2). Of the n = 23 sites overlapping these bases that we knew were polymorphic among the elephantids, the mastodon allele call always agreed between the PCR and shotgun data, indicating that our mastodon data are reliable for the purpose of determining an ancestral allele (the main purpose for which we use the mastodon data). However, only 38% of mastodon-elephantid divergent sites validated, which we ascribe to mastodon-specific errors, since almost all the discrepancies were consistent with C/G-to-T/A misincorporations (the most prominent error in ancient DNA), or mismapping of some of the short mastodon reads (2). Thus, our raw estimate of mastodon-elephantid divergence is too high, making it inappropriate to use mastodon for calibrating genetic divergences among the elephantids, as we previously did for mtDNA where we had high-quality mastodon data.
Genetic Diversity and Phylogenetic Relationships among Elephantid Taxa
We estimated the relative genetic diversity across elephantids by counting the total number of heterozygous genotypes in each taxon, and normalizing by the total number of sites differing between (S)avanna and (A)sian elephants (tSA). Within-species genetic diversity as a fraction of savanna-Asian divergence is estimated to be similar for savanna elephants (8±2%) and mammoths (9±2%), higher for Asian elephants (15±3%), and much higher for forest elephants (30±4%) (standard errors from a Weighted Jackknife; Methods). This supports previous findings of a higher average time to the most recent common genetic ancestor in forest compared to savanna elephants (Table 1). We caution that these diversity estimates are based on analyzing only a single individual from each taxon, which could underestimate diversity in the context of recent inbreeding. Encouragingly, however, in Asian elephants where two individuals were sequenced for some loci, genetic diversity estimates are consistent whether measured across (18±5%) or within samples (15±3%). A further potential concern is “allele specific PCR”, whereby one allele is preferentially amplified, causing truly heterozygous sites to go undetected. However, we do not believe that this is a concern, since we performed an experiment in which we re-amplified about 5% of our loci using different primers and obtained identical genotypes at all sites where we had overlapping data (Text S2).
We next inferred a nuclear phylogeny for the elephantids using the Neighbor Joining method (Methods and Figure S3). This analysis suggests that mammoths and Asian elephants are sister taxa, consistent with the mtDNA phylogeny, and that forest and savanna elephants are also sister taxa. We estimate that forest-savanna genetic divergence normalized by savanna-Asian is tFS/tSA = 74±6%, while Asian-mammoth genetic divergence normalized by savanna-Asian is tAM/tSA = 65±5% (Table 1). Both values are significantly lower than savanna-mammoth (tSM/tSA = 92±5%), forest-Asian (tFA/tSA = 103±5%), and forest-mammoth (tFM/tSA = 96±7%) genetic divergence normalized by savanna-Asian, which are all consistent with 100%, as expected if they reflect the same comparison across sister groups (Table 1).
An intriguing observation is that the ratio of forest-savanna elephant genetic divergence to Asian-mammoth divergence tFS/tAM is consistent with unity (90% credible interval 90%–138%), which is notable given that forest and savanna elephants are sometimes classified as the same species, whereas Asian elephants and mammoth are classified as different genera. To further explore this issue, we focused on regions of the genome where the genealogical tree is inconsistent with the species phylogeny, a phenomenon known as “incomplete lineage sorting” (ILS). Information about the rate of ILS can be gleaned from the rate at which alleles are observed that cluster taxa that are not most closely related according to the overall phylogeny. For example, in a four-taxon alignment of (S)avanna, (F)orest, (E)urasian, and mastodon, “SE” and “FE” alleles that cluster savanna-Eurasian or forest-Eurasian, to the exclusion of the other taxa, are likely to be at loci with ILS (in what follows, we use the term “Eurasian elephants” to refer to woolly mammoths and Asian elephants, while recognizing that the range of the lineage ancestral to each species included Africa as well). Similarly, in a four-taxon alignment of (A)sian, (M)ammoth, (L)oxodonta (forest plus savanna), and mastodon, “AL” or “ML” sites reveal probable ILS events. We find a higher rate of inferred ILS in forest and savanna elephants than in Asian elephants and mammoths: (FE+SE)/(AL+ML) = 3.1 (P = 4×10⁻⁸ for exceeding unity; Table 2), indicating that there are more lineages where savanna and forest elephants are unrelated back to the African-Eurasian speciation than is the case for Asian elephants and mammoths. This could reflect a history in which the savanna-forest population divergence time TFS is older than the Asian-mammoth divergence time TAM, a larger population size ancestral to the African than to the Eurasian elephants, or a long period of gene flow between two incipient taxa. (We use upper case “T” to indicate population divergence time and lower case “t” to indicate average genetic divergence time (t≥T).)
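The site-pattern counting behind this comparison can be sketched in Python. This is a minimal illustration, assuming each biallelic site is stored as a mapping from taxon label to 0 (the mastodon, or ancestral, allele) or 1 (the derived allele). The function names, the data layout, and the toy sites are hypothetical and are not the pipeline used in the study.

```python
def count_patterns(sites, sister_a, sister_b, third):
    """Count concordant and ILS-indicating sites in a four-taxon alignment.

    The fourth taxon (mastodon) defines the ancestral allele, coded 0.
    Concordant: derived allele shared only by the two sister taxa.
    Discordant: derived allele shared by one sister and the third taxon.
    """
    counts = {"concordant": 0, "discordant": 0}
    for site in sites:
        pattern = (site[sister_a], site[sister_b], site[third])
        if pattern == (1, 1, 0):
            counts["concordant"] += 1
        elif pattern in ((1, 0, 1), (0, 1, 1)):
            counts["discordant"] += 1
    return counts

# Toy data (invented for illustration only).
african_sites = [
    {"S": 1, "F": 0, "E": 1},  # "SE" site
    {"S": 0, "F": 1, "E": 1},  # "FE" site
    {"S": 1, "F": 1, "E": 0},  # concordant "SF" site
    {"S": 1, "F": 0, "E": 1},  # "SE" site
]
eurasian_sites = [
    {"A": 1, "M": 0, "L": 1},  # "AL" site
    {"A": 1, "M": 1, "L": 0},  # concordant "AM" site
    {"A": 1, "M": 1, "L": 0},  # concordant "AM" site
]

african = count_patterns(african_sites, "S", "F", "E")
eurasian = count_patterns(eurasian_sites, "A", "M", "L")

# Analogue of (FE+SE)/(AL+ML) on the toy data.
ils_ratio = african["discordant"] / eurasian["discordant"]
print(african, eurasian, ils_ratio)
```

A ratio above one on real data would correspond to the excess of discordant sites in the African lineages reported in Table 2; the significance test itself is not reproduced here.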
Fitting a Model of Population History to the Data
To further understand the history of the elephantids, we fit a population genetic model to the data (input file: Dataset S3) using the MCMCcoal (Markov Chain Monte Carlo coalescent) method of Yang and Rannala. We fit a model in which the populations split instantaneously at times TFS (forest-savanna), TAM (Asian-mammoth), TLox-Eur (African-Eurasian), and TElephantid-Mastodon, with constant population sizes ancestral to these speciation events of NFS, NAM, NLox-Eur, and NElephantid-Mastodon, and (after the final divergences) of NF, NS, NA, and NM (Figure 2). We recognize that elephantid population sizes likely varied within these time intervals, given recurrent glacial cycles, changes in geographic ranges documented in the fossil record, and mtDNA patterns suggesting ancient population substructure. Nevertheless, the constant population size assumption is useful for inferring average diversity and obtaining an initial picture of elephantid history. MCMCcoal makes the further simplifying assumptions that our short (average 106 bp) loci experienced no recombination and that they are unlinked (the latter assumption is justified by the fact that when we mapped the loci to scaffolds from the loxAfr3 genome sequence, all but one pair were at least 100 kilobases apart; Text S3). MCMCcoal then infers the joint distribution of the “T” and “N” parameters that is consistent with the data, as well as the associated credible intervals (Table 3; Text S4).
Demographic model that is fit by MCMCcoal, in which all population splits are instantaneous (without subsequent gene flow), and all population sizes are assumed to be constant over intervals. Here, TFS refers to forest-savanna elephant population divergence time, TAM refers to Asian elephant-mammoth population divergence time, TLox-Eur refers to African-Eurasian population divergence time, and TElephantid-Mastodon refers to elephantid-mastodon population divergence time, presented here in millions of years. The Ν quantities refer to constant diploid effective population sizes ancestral to each of these splits (in thousands). For obtaining estimates of years and population sizes, we assume that the elephantids have an average of 31 years per generation, based on estimates of 17–20 years for females , and 40–49 years for males ,. A lower or higher number of years per generation would produce a proportionate effect on the population size estimates. For each parameter, two sets of numbers are shown. The upper set shows the range consistent with the fossil record, calibrating to an assumed African-Eurasian population split of TLox-Eur = 4.2–9 Mya (justified in Text S5). For example for forest-savanna population divergence, this leads to TFS = 2.6–5.6 Mya given that MCMCcoal estimates TFS/TLox-Eur = 62%. The lower set of numbers (in parentheses) provides MCMCcoal's 90% credible interval for the parameters as a fraction of the best estimate (e.g. 76%–126% for TFS). In the main text, we conservatively quote a range that combines the uncertainty from the fossil record and from MCMCcoal (e.g. TFS = 1.9–7.1 Mya).
The MCMCcoal analysis infers that the initial divergence of forest and savanna elephant ancestors occurred at least a couple of million years ago. The first line of evidence for this is that forest-savanna elephant population divergence time is estimated to be comparable to that of Asian elephants and mammoths: TAM/TFS = 0.96 (0.69−1.36) (Table 4). Secondly, MCMCcoal infers that the ratio of forest-savanna to African-Eurasian elephant population divergence is at least 45%: TFS/TLox-Eur = 0.62 (0.45−0.79) (Table 4). Given that African-Eurasian population divergence (TLox-Eur) can be inferred from the fossil record to have occurred 4.2–9.0 Mya (Text S5), this allows us to conclude that forest-savanna divergence occurred at least 1.9 Mya (4.2 Mya × 0.45). We caution that because MCMCcoal fits a model of instantaneous population divergence, our results do not rule out some forest-savanna gene flow having occurred more recently, as indeed must have occurred based on the mtDNA haplogroup that is shared among some forest and savanna elephants. However, such gene flow would mean that the initial population divergence must have been even older to explain the patterns we observe.
We also used the MCMCcoal results to learn more about the timing of the divergences among the elephantids (Figure 2). To be conservative, we quote intervals that take into account the full range of uncertainty from both the fossil calibration of African-Eurasian population divergence (TLox-Eur = 4.2–9.0 Mya; Text S5), and the 90% credible intervals from MCMCcoal (TFS/TLox-Eur = 45%–79% and TAM/TLox-Eur = 46%–74%; Table 4). Thus, we conservatively estimate TFS = 1.9–7.1 Mya and TAM = 1.9–6.7 Mya. Our inference of TAM is somewhat less than the mtDNA estimate of genetic divergence of 5.8–7.8 Mya. However, this is expected, since genetic divergence time is guaranteed to be at least as old as population divergence but may be much older, especially as deep-rooting mtDNA lineages are empirically observed to occur in matrilocal elephantid species.
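The arithmetic behind these conservative intervals can be checked with a short Python sketch. It simply multiplies the lower fossil bound by the lower end of the credible interval and the upper fossil bound by the upper end; the function and variable names are illustrative assumptions, and all numbers are those quoted in the text.

```python
def combined_interval(calibration_mya, ratio_interval):
    """Widest range obtained by pairing lower with lower and upper with upper."""
    low = calibration_mya[0] * ratio_interval[0]
    high = calibration_mya[1] * ratio_interval[1]
    return low, high

T_LOX_EUR = (4.2, 9.0)  # fossil calibration of African-Eurasian split, in Mya

# 90% credible intervals for the ratios, from the text.
t_fs = combined_interval(T_LOX_EUR, (0.45, 0.79))  # forest-savanna
t_am = combined_interval(T_LOX_EUR, (0.46, 0.74))  # Asian-mammoth

# Range from the fossil calibration alone, using the point estimate 0.62.
t_fs_point = combined_interval(T_LOX_EUR, (0.62, 0.62))

for label, (low, high) in [("TFS", t_fs), ("TAM", t_am), ("TFS (point)", t_fs_point)]:
    print(f"{label}: {low:.1f} to {high:.1f} Mya")
```

To one decimal place this gives 1.9 to 7.1 Mya for TFS and 1.9 to 6.7 Mya for TAM, and 2.6 to 5.6 Mya for TFS when only the fossil uncertainty is considered, matching the values quoted in the text and in Figure 2.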
Our study of the extant elephantids provides support for the proposed classification of the Elephantidae by Shoshani and Tassy, which divides them into the tribe Elephantini (including Elephas, the Asian elephant and fossil relatives, and the extinct mammoths Mammuthus) and the tribe Loxodontini (consisting of Loxodonta: African forest and savanna elephants and extinct relatives). This classification is at odds with previous suggestions that the extinct mammoths may have been more closely related to African than to Asian elephants.
Our study also infers a strikingly deep population divergence time between forest and savanna elephant, supporting morphological and genetic studies that have classified forest and savanna elephants as distinct species. The finding of deep nuclear divergence is important in light of findings from mtDNA, which indicate that the F-haplogroup is shared between some forest and savanna elephants, implying a common maternal ancestor within the last half million years. The incongruent patterns between the nuclear genome and mtDNA (“cytonuclear dissociation”) have been hypothesized to be related to the matrilocal behavior of elephantids, whereby males disperse from core social groups (“herds”) but females do not. If forest elephant female herds experienced repeated waves of migration from dominant savanna bulls, displacing more and more of the nuclear gene pool in each wave, this could explain why today there are some savanna herds that have mtDNA that is characteristic of forest elephants but little or no trace of forest DNA in the nuclear genome. In the future, it may be possible to distinguish between models of a single ancient population split between forest and savanna elephants, or an even older split with longer drawn out gene flow, by applying methods like Isolation and Migration (IM) models to data sets including more individuals. Our present data do not permit such analysis, however, as IM requires multiple samples from each taxon to have statistical power, and we only have 1–2 samples from each taxon.
Our study also documents the highly variable population sizes across recent elephantid taxa and in particular indicates that the recent effective population size of forest elephants in the nuclear genome (NF) has been significantly larger than those of the other elephantids (NS, NA, and NM) (Table 5). This is not likely due to the “out of Africa” migration of the ancestors of mammoths and Asian elephants, as these events occurred several Mya, and any loss of diversity due to founder effects would have been expected to be offset by subsequent accumulation of new mutations in the populations. The high effective population size in forest elephants could reflect a history of separation of populations into distinct isolated tropical forest refugia during glacial cycles, which would have been a mechanism by which ancestral genetic diversity could have been preserved before the population subsequently remixed. A Pleistocene isolation followed by remixing would also be consistent with the patterns observed in Asian elephants, which carry two deep mtDNA clades and where there is intermediate nuclear diversity. Intriguingly, our estimate of recent forest effective population size is on the same order as the ancestral population sizes (NFS, NAM, and NLox-Eur) (Table 5), providing some support for the hypothesis that forest elephant population parameters today may be typical of the ancestral populations (a caveat, however, is that MCMCcoal may overestimate ancestral population sizes since unmodeled sources of variation across loci may inflate estimates of ancestral population size). An alternative hypothesis that seems plausible is that the large differences in intra-species genetic diversity across taxa could reflect differences in the variance of male reproductive success (more male competition in mammoth and savanna elephant than among forest elephants, with the Asian elephant being intermediate).
Finally, the results of this study are intriguing in light of fossil evidence that forest and savanna lineages of Loxodonta may have been geographically isolated until recently. The predominant elephant species in the fossil record of the African savannas for most of the Pliocene and Pleistocene belonged to the genus Elephas. Some authors have suggested that the geographic range of Loxodonta in the African savannas may have been circumscribed by Elephas, until the latter disappeared from Africa towards the Late Pleistocene. We hypothesize that the widespread distribution of Elephas in Africa may have created an isolation barrier that separated savanna and forest elephants, so that gene flow became common only much later, contributing to the patterns observed in mtDNA. Further insight into the dynamics of forest-savanna elephant interaction will be possible once more samples are analyzed from all the taxa, and high-quality whole genome sequences of forest and savanna elephants are available and can be compared with sequences of Asian elephants, mammoths, and mastodons.
For our sequencing of mastodon, we used the same DNA extract that was previously used to generate the complete mitochondrial genome of a mastodon. We sequenced the extract on a Roche 454 GS, resulting in 45 Mb of sequences that we deposited in the NCBI short read archive (accession: SRA010805). By comparing these reads to the African savanna elephant genome (loxAfr1) using MEGABLAST, we identified 1.76 Mb of mastodon sequences with a best hit to loxAfr1 that we then used in downstream analyses.
To re-sequence a subset of these loci in the living elephants and the woolly mammoth, we used Primer3 to design primers surrounding the longest mastodon-African elephant alignments. A two-step multiplex PCR approach was used to attempt to sequence 746 loci in 1 mammoth, 1 African savanna elephant, 1 African forest elephant, and 1–2 Asian elephants. After the simplex reactions for each sample, the PCR products were pooled in equimolar amounts for each sample and then sequenced on a Roche 454 GS, resulting in an average read coverage of 41× per nucleotide (Text S1). We carried out four rounds of PCR in an attempt to obtain data from as many loci as possible and to fill in data from loci that failed or gave too few sequences in previous rounds (Text S1).
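The choice of alignments for primer design can be sketched as a simple filter. The thresholds below (at least 90 bp, similarity above 87%, no known repeat) are taken from the Figure 1 legend; the `Alignment` record, its field names, and the toy values are hypothetical and only illustrate the selection logic, not the scripts actually used.

```python
from dataclasses import dataclass

@dataclass
class Alignment:
    read_id: str             # hypothetical identifier of the mastodon read
    length_bp: int           # alignment length against loxAfr1
    percent_identity: float  # similarity to loxAfr1, in percent
    overlaps_repeat: bool    # True if the region is a known repeat element

def eligible_for_primer_design(aln, min_length=90, min_identity=87.0):
    """Apply the length, similarity, and repeat filters from Figure 1."""
    return (
        aln.length_bp >= min_length
        and aln.percent_identity > min_identity
        and not aln.overlaps_repeat
    )

def rank_candidates(alignments):
    """Keep eligible alignments and list the longest first."""
    kept = [a for a in alignments if eligible_for_primer_design(a)]
    return sorted(kept, key=lambda a: a.length_bp, reverse=True)

# Toy records (invented for illustration only).
toy = [
    Alignment("read1", 120, 95.0, False),
    Alignment("read2", 85, 99.0, False),   # too short
    Alignment("read3", 150, 86.0, False),  # similarity too low
    Alignment("read4", 140, 96.5, True),   # known repeat
    Alignment("read5", 133, 92.0, False),
]
candidates = [a.read_id for a in rank_candidates(toy)]
print(candidates)
```

Sorting by length reflects the statement that primers were designed around the longest alignments; any further ranking criteria used by the authors are not specified here.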
To analyze the data, we sorted the sequences from each sample according to the PCR primers (746 primer pairs in total) and then aligned the reads to the reference genome (loxAfr1), disregarding sequences below 80% identity. Consensus sequences for each locus and each individual were called with the settings described by Stiller and colleagues, with a minimum of three sequences required in order to call a nucleotide and a maximum of three polymorphic positions allowed per locus (to filter out false-positive divergent sites due to paralogous sequences that occur in multiple loci in the genome). We finally generated multiple sequence alignments for each locus and called divergent sites when at least one allele per species was available. In the first experimental round we were not able to call consensus sequences for more than half of the loci, a problem that we found was correlated with primer pairs that had multiple BLAST matches to loxAfr1, suggesting alignment to genomic repeats. Primer pairs for subsequent experimental rounds were excluded if in silico PCR (http://genome.ucsc.edu/cgi-bin/hgPcr) suggested that they could anneal at too many loci in the savanna elephant genome.
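The two thresholds stated above (at least three sequences to call a nucleotide, at most three polymorphic positions per locus) can be illustrated with a minimal Python sketch. The rule for deciding when a position is heterozygous is not given in the text, so the minor-allele fraction used below is a purely hypothetical placeholder, as are the function names and the input layout.

```python
from collections import Counter

MIN_DEPTH = 3          # from the text: minimum sequences to call a nucleotide
MAX_POLYMORPHIC = 3    # from the text: maximum polymorphic positions per locus
MINOR_FRACTION = 0.2   # hypothetical threshold, NOT taken from the source

def call_position(bases):
    """Return the set of alleles called at one position, or None if depth is too low."""
    if len(bases) < MIN_DEPTH:
        return None
    counts = Counter(bases)
    depth = len(bases)
    return frozenset(b for b, n in counts.items() if n / depth >= MINOR_FRACTION)

def call_locus(columns):
    """Call every position of a locus; reject the locus if too many are polymorphic.

    `columns` is a list of per-position lists of read bases.
    """
    calls = [call_position(col) for col in columns]
    n_polymorphic = sum(1 for c in calls if c is not None and len(c) > 1)
    if n_polymorphic > MAX_POLYMORPHIC:
        return None  # likely paralogous reads mixed into one locus
    return calls

# Toy columns (invented for illustration only).
clean_locus = [list("AAAA"), list("CC"), list("GGGTTT")]
suspect_locus = [list("AAAGGG")] * 4

print(call_locus(clean_locus))
print(call_locus(suspect_locus))
```

In the toy example the second position of the clean locus is left uncalled because only two reads cover it, and the suspect locus is rejected because it has four polymorphic positions.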
Filtering of 22 Divergent Sites That Have a High Probability of Having Arisen Due to Recurrent Mutation
Of the 1,797 biallelic divergent sites that were identified, we removed 22 to produce Tables 1 and 2. The justification for removing these sites is that derived alleles were seen in both African and Eurasian elephants, which is unlikely to be observed in the absence of sequencing errors or recurrent mutation. For the MCMCcoal analysis we did not remove these divergent sites, since the method explicitly models recurrent mutation.
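One way to express this filter is sketched below in Python. The sketch rests on an interpretation that should be treated as an assumption: a site is flagged only if it is polymorphic among the elephantids and the derived allele appears in both an African and a Eurasian taxon, so that sites fixed for the derived allele in all elephantids (fixed differences from mastodon) are kept. The taxon labels and genotype encoding are hypothetical.

```python
AFRICAN = ("forest", "savanna")
EURASIAN = ("asian", "mammoth")

def is_suspect(site):
    """Flag a site as a likely recurrent mutation or sequencing error.

    `site` maps each taxon to a string of observed alleles, where "0" is the
    mastodon (ancestral) allele and "1" is the derived allele, e.g. "0", "1",
    or "01" for a heterozygote.
    """
    derived_in_african = any("1" in site[t] for t in AFRICAN)
    derived_in_eurasian = any("1" in site[t] for t in EURASIAN)
    # Assumption: fixed differences from mastodon are not flagged.
    ancestral_in_elephantids = any("0" in site[t] for t in AFRICAN + EURASIAN)
    return derived_in_african and derived_in_eurasian and ancestral_in_elephantids

def filter_sites(sites):
    """Return only the sites that are not flagged."""
    return [s for s in sites if not is_suspect(s)]

# Toy sites (invented for illustration only).
fixed_difference = {"forest": "1", "savanna": "1", "asian": "1", "mammoth": "1"}
african_only = {"forest": "1", "savanna": "1", "asian": "0", "mammoth": "0"}
shared_derived = {"forest": "01", "savanna": "0", "asian": "1", "mammoth": "0"}

kept = filter_sites([fixed_difference, african_only, shared_derived])
print(len(kept))
```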
To obtain standard errors, we omitted each of the 375 loci in turn and recomputed the statistic of interest. We then measured the variability of each statistic over all 375 dropped loci, weighting each locus by the number of divergent sites it contained in order to take account of the variable amount of data across loci. This variability can be converted into a standard error, under the assumption that the statistic is normally distributed, using the theory of the Weighted Jackknife.
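A minimal Python sketch of such a calculation follows. It uses a standard delete-m jackknife formulation with unequal block sizes; whether this matches the authors' exact implementation is an assumption, and the function name and inputs are illustrative. Loci with zero weight are skipped, since dropping them cannot change the statistic.

```python
import math

def weighted_jackknife(full_estimate, leave_one_out, weights):
    """Return (jackknife estimate, standard error).

    full_estimate : statistic computed on all loci
    leave_one_out : statistic recomputed with each locus dropped in turn
    weights       : number of divergent sites at each dropped locus
    """
    blocks = [(est, w) for est, w in zip(leave_one_out, weights) if w > 0]
    g = len(blocks)
    if g < 2:
        raise ValueError("need at least two loci with non-zero weight")
    n = sum(w for _, w in blocks)

    # Bias-corrected jackknife estimate.
    theta_j = g * full_estimate - sum((1 - w / n) * est for est, w in blocks)

    # Variance from weighted pseudo-values.
    variance = 0.0
    for est, w in blocks:
        h = n / w
        pseudo = h * full_estimate - (h - 1) * est
        variance += (pseudo - theta_j) ** 2 / (h - 1)
    variance /= g
    return theta_j, math.sqrt(variance)

# Sanity check on toy data: with equal weights and the mean as the statistic,
# the result reduces to the usual standard error of the mean.
values = [1.0, 2.0, 3.0, 4.0, 5.0]
full = sum(values) / len(values)
loo = [(sum(values) - v) / (len(values) - 1) for v in values]
estimate, se = weighted_jackknife(full, loo, [1] * len(values))
print(estimate, se)
```

In the study's setting, `leave_one_out` would hold 375 recomputed values of a statistic such as tFS/tSA, and `weights` the per-locus counts of divergent sites.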
Estimates of Genetic Diversity, Relative Rate Tests, and ILS
For our relative rate tests, we compute the difference in the number of divergent sites between two taxa since they split, normalized by the total number of divergent sites. The number of standard errors (computed from a Weighted Jackknife) by which this differs from zero represents a z score that should be normally distributed under the null hypothesis and thus can be converted into a p value for consistency of the data with equal substitution rates on either lineage.
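The conversion from a statistic and its standard error to a two-sided p value can be sketched in Python using only the standard library. The normalization by the sum of the two lineage counts is one reading of "normalized by the total number of divergent sites" and should be treated as an assumption; the counts and standard error in the example are invented for illustration.

```python
import math

def relative_rate_statistic(n_a, n_b):
    """Difference in lineage-specific divergent sites, as a fraction of their sum.

    Assumption: the normalizing total is n_a + n_b.
    """
    return (n_a - n_b) / (n_a + n_b)

def two_sided_p(statistic, standard_error):
    """Return (z score, two-sided p value) under a standard normal null."""
    z = statistic / standard_error
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Toy example: 60 versus 40 lineage-specific sites, with a jackknife
# standard error of 0.1 (all numbers invented).
stat = relative_rate_statistic(60, 40)
z, p = two_sided_p(stat, 0.1)
print(f"statistic = {stat:.2f}, z = {z:.2f}, p = {p:.3f}")
```

Here `math.erfc(|z| / sqrt(2))` equals twice the upper tail probability of the standard normal distribution, which is the two-sided p value.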
To prepare a data set for MCMCcoal, we used input files containing the alignments in PHYLIP format (Dataset S3), restricting analysis to the loci for which we had diploid data from at least one individual from each of the elephantids we resequenced (we did not use data from the loxAfr1 draft savanna genome, or from the second Asian elephant we sequenced at only a small fraction of loci). The diploid data for each taxon were used to create two sequences from each of the elephantids, allowing us to make inferences about effective population size in each taxon since its divergence from the others.
We ran MCMCcoal with the phylogeny ((((Forest1,Forest2), (Savanna1,Savanna2)), ((Asian1,Asian2), (Mammoth1,Mammoth2))) Mastodon). Since MCMCcoal is a Bayesian method, it requires specifying a prior distribution for each parameter; that is, a hypothesis about the range of values that are consistent with previously reported information (such as the fossil record). For the effective population sizes in each taxon (NF, NS, NA, NM, NFS, NAM, NLox-Eur, and NElephantid-Mastodon) we used prior distributions that had their 5th percentile point corresponding to the lowest diversity seen in present-day elephants (savanna) and their 95th percentile point corresponding to the highest diversity seen in elephantids (forest). For the mastodon-elephantid population divergence time TElephantid-Mastodon we used 24–30 Mya. For the African-Eurasian population divergence time TLox-Eur we used 4.2–9 Mya. For the Asian-mammoth population divergence time TAM we used 3.0–8.5 Mya. The taxonomic status of forest and savanna elephants is contentious. To allow us to test the hypotheses of both recent and ancient divergence while being minimally affected by the prior distribution, we used an uninformative prior distribution of TFS = 0.5–9 Mya. This prior distribution has substantial density at <1 million years, allowing us to test for recent divergence of forest and savanna elephants. A full justification for the prior distributions is given in Text S5.
MCMCcoal also requires an assumption about the mutation rate, which is poorly measured for the elephantids. We therefore ran MCMCcoal under varying assumptions for the mutation rate, to ensure that our key results were stable in the face of uncertainty about this parameter. For each of the three mutation rates that we tested, MCMCcoal was run three times, starting from different random number seeds, with 4,000 burn-in and 100,000 follow-on iterations. Estimates of all parameters that were important to our inferences were consistent across runs, suggesting that the inferences are stable despite the different random number seeds. (We did observe instability for the parameters corresponding to mastodon-elephantid divergence, but this was expected because of the high rate of mastodon errors and is not a problem for our analysis, as this divergence is not the focus of this study.) We computed the autocorrelation of each sampled parameter over MCMC iterations to assess the stickiness of the MCMC. Parameters appear to be effectively uncorrelated after a lag of 200 iterations. Given that we ran each chain for 100,000 iterations, we expect to have at least 500 independent points (100,000/200) from which to sample, which is sufficient to compute 90% credible intervals. The detailed parameter settings and results are presented in Text S4.
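The autocorrelation check and the resulting count of independent draws can be illustrated with a short sketch. It assumes that the sampled values of one parameter are available as a plain list; the function names and the 0.05 threshold for "effectively uncorrelated" are illustrative choices, not settings taken from the study.

```python
def autocorrelation(chain, lag):
    """Sample autocorrelation of a list of MCMC draws at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    variance = sum((x - mean) ** 2 for x in chain)
    if variance == 0:
        raise ValueError("chain is constant; autocorrelation is undefined")
    covariance = sum(
        (chain[i] - mean) * (chain[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


def decorrelation_lag(chain, threshold=0.05, max_lag=1000):
    """Smallest lag with |autocorrelation| below threshold, or None."""
    for lag in range(1, min(max_lag, len(chain) - 1) + 1):
        if abs(autocorrelation(chain, lag)) < threshold:
            return lag
    return None


def independent_draws(n_iterations, lag):
    """Approximate number of independent samples when thinning by lag."""
    return n_iterations // lag


if __name__ == "__main__":
    # Figures quoted in the text: 100,000 iterations, decorrelated at lag 200.
    print(independent_draws(100_000, 200))
```

Thinning by the decorrelation lag gives a conservative count; estimators that sum the autocorrelation over all lags are a common alternative.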
All primers used in this study.
(0.27 MB PDF)
Table with polymorphic positions.
(1.49 MB XLS)
Input file (PHYLIP) for MCMCcoal.
(0.10 MB PDF)
Mastodon shotgun results. (a) A histogram of read length (in nucleotides) of all putative mastodon sequences gathered in this study by shotgun sequencing. The longest sequence is 202 nucleotides long, and only the longer sequences (to the right of the black line) were used for primer design. (b) Percent identity of all mastodon-loxAfr1 alignments. The mean percent identity is 95%. Only sequences with an identity of more than 87% (to the right of the black line) were used for primer design.
(0.21 MB DOC)
Analysis of 454-sequence data to build multiple alignments. Sequences were sorted according to their barcode to identify the sample, and then the sequences (now per individual) were further sorted by the 5′-primer and aligned to the reference (loxAfr1) using a similarity threshold of 80%. Consensus sequences were called per individual and consensus sequences of the various individuals were merged into multiple sequence alignments including the mastodon shotgun sequence (red).
(0.14 MB DOC)
A Neighbor Joining tree built with the software MEGA4 supports the topology (((Savanna, Forest),(Asian, Mammoth)), Mastodon).
(0.04 MB DOC)
Samples used in this study.
(0.04 MB DOC)
Summary of loci that we attempted to amplify.
(0.03 MB DOC)
Target performance for different rounds of the experiment.
(0.11 MB DOC)
(0.07 MB DOC)
Error Rate Assessment.
(0.04 MB DOC)
Genomic distribution of loci.
(0.03 MB DOC)
MCMCcoal analysis to infer population parameters.
(0.11 MB DOC)
Justification for prior distributions for MCMCcoal.
(0.06 MB DOC)
We thank K. Prüfer and U. Stenzel for assistance in data analysis, P. Matheus and E. Willerslev for providing fossil samples, W. Sanders for sharing a preprint of his book on African proboscideans and for assistance in providing appropriate references, and R. Querner for help with figure design. We furthermore thank the Vertebrate Biology Group of the Broad Institute of MIT and Harvard and in particular F. Di Palma and K. Lindblad-Toh for sharing the savanna elephant genome data. For access to modern elephant samples, we thank S. J. O'Brien, R. Hanson, M. J. Malasky, and F. Hussain of the Laboratory of Genomic Diversity; A. Turkalo, M. Keele, and D. Olson; B. York and A. Baker at the Burnet Park Zoo, Syracuse, New York; M. Bush at the National Zoological Park, DC; M. Gadd and R. Ruggiero of the U.S. Fish and Wildlife Service; and the governments of the Central African Republic and Tanzania.
The author(s) have made the following declarations about their contributions: Conceived and designed the experiments: DR MM MH. Performed the experiments: NR MM. Analyzed the data: NR DR SM REG. Contributed reagents/materials/analysis tools: DR NJG AR MH. Wrote the paper: NR DR SM AR MH.