Conceived and designed the experiments: EB NJG. Performed the experiments: EB JSB. Analyzed the data: EB JSB. Contributed reagents/materials/analysis tools: NJG. Wrote the paper: EB.
Current address: School of Natural Sciences, University of California Merced, Merced, California, United States of America
Current address: Department of Anatomy and Structural Biology, Centre for Reproduction and Genomics, University of Otago, Dunedin, New Zealand
The authors have declared that no competing interests exist.
Microsatellites are popular genetic markers in molecular ecology, genetic mapping and forensics. Unfortunately, despite recent advances, the de novo isolation of polymorphic microsatellite loci often requires expensive and labor-intensive groundwork. Primers developed for a focal species are commonly tested in a related, non-focal species of interest for the amplification of orthologous polymorphic loci; when successful, this approach significantly reduces the cost and time of microsatellite development. However, the transferability of polymorphic microsatellite loci decreases rapidly with increasing evolutionary distance, and this approach has shown its limits. Whole genome sequences represent an under-exploited resource for developing cross-species primers for microsatellites. Here we describe a three-step method that combines a novel in silico pipeline, which we use to (1) identify conserved microsatellite loci from multiple genome alignments and (2) design degenerate primer pairs, with (3) a simple PCR protocol used to implement these primers across species. Using this approach we developed a set of primers for the mammalian clade. We found 126,306 human microsatellites conserved in aligned mammalian sequences, and isolated 5,596 loci using criteria based on wide conservation. From a random subset of ∼1000 dinucleotide repeats, we designed degenerate primer pairs for 19 loci, of which five produced polymorphic fragments in up to 18 mammalian species, including the distantly related marsupials and monotremes, groups that diverged from other mammals 120–160 million years ago. Using our method, many more cross-clade microsatellite loci can be harvested from the currently available genomic data, and this ability is set to improve considerably as further genomes are sequenced.
Microsatellites, also called simple sequence repeats, consist of short (1–6 bp), tandemly repeated DNA motifs dispersed throughout genomes. Microsatellite sequences mutate through motif insertions and deletions along the repeat array, often at rates several orders of magnitude higher than the average genomic mutation rate
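To make this definition concrete, the Python sketch below scans a sequence for perfect tandem repeats of 1–6 bp motifs using regular expressions. The function `find_microsatellites` and its parameters are hypothetical and for illustration only; the 14-bp minimum length is borrowed from the threshold applied to dinucleotide repeats in Methods, and interrupted (imperfect) repeats are deliberately not detected.

```python
import re
from typing import List, Tuple


def find_microsatellites(seq: str, min_len: int = 14,
                         motif_sizes: range = range(1, 7)) -> List[Tuple[int, int, str]]:
    """Return (start, end, motif) for perfect tandem repeats of 1-6 bp motifs.

    Hypothetical helper: a repeat is reported when its motif occurs at least
    twice in tandem and the whole array spans at least `min_len` bp.
    """
    seq = seq.upper()
    hits = []
    for k in motif_sizes:
        # ([ACGT]{k}) captures a motif; \1+ requires one or more further copies
        pattern = re.compile(r"([ACGT]{%d})\1+" % k)
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # skip compound motifs (e.g. "ACAC") that are really a shorter repeat
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            if m.end() - m.start() >= min_len:
                hits.append((m.start(), m.end(), motif))
    return sorted(hits)
```

For example, `find_microsatellites("GGT" + "AC" * 8 + "TTG")` reports a single 16-bp (AC)8 array starting at position 3.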
Despite a number of recognized advantages of microsatellites over other genetic markers, such as easy sample preparation and high information content
Seeking to obtain large amounts of genetic information with the least initial effort and cost, investigators commonly attempt to transfer known microsatellite markers between species, typically from previously examined focal species to related non-focal species (e.g.
With no prior focus on reducing the impact of these limitations, the traditional cross-species microsatellite transfer approach has had varying, generally disappointing, levels of success
Here, we present a novel and economical strategy that exploits our recent advances in building comprehensive datasets of microsatellites conserved across the mammalian clade
Our overall strategy is presented in
The University of California, Santa Cruz (UCSC) Genome Browser can be found at
Mammalian species were chosen to include nine sister species pairs (n = 18) representing three of the four eutherian superorders (Laurasiatheria, Euarchontoglires and Afrotheria), as well as the often neglected marsupials and monotremes. We collected DNA, blood or tissue samples from 20 presumably unrelated individuals per species (
Orthologous mammalian microsatellites were identified using the UCSC vertebrate 17-WA
An initial subset of ∼1,000 human dinucleotide microsatellites (length ≥14 bp) was randomly selected from a pool of microsatellites broadly conserved in the mammalian clade, i.e. present in at least five mammals, or in comparisons including at least human, opossum and either dog or mouse. Mammalian species included in this study shared a common ancestor 160 MYA, and thus the chances of finding conserved
To optimize the identification of cross-species microsatellites with flanking sequences conserved across the entire mammalian clade, including monotremes (platypus), we visually inspected each microsatellite locus in the 28-way conservation track
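The visual review looked for repeats flanked by ultra-conserved stretches suitable for primer binding. Purely as an illustration, and not the procedure used in this study, a hypothetical helper `conserved_windows` could pre-screen an alignment for gap-free windows containing few variable columns; the 20-bp window and the limit of two variable columns are arbitrary assumptions.

```python
from typing import List


def conserved_windows(alignment: List[str], window: int = 20,
                      max_variable: int = 2) -> List[int]:
    """Start positions of gap-free windows with at most `max_variable` variable columns.

    Hypothetical screening step; window size and threshold are illustrative.
    """
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to the same length")
    gapped, variable = [], []
    # per-column flags: does the column contain a gap, and is it variable?
    for col in zip(*(s.upper() for s in alignment)):
        gapped.append("-" in col)
        variable.append(len(set(col)) > 1)
    starts = []
    for i in range(length - window + 1):
        if any(gapped[i:i + window]):
            continue  # primers should not span alignment gaps
        if sum(variable[i:i + window]) <= max_variable:
            starts.append(i)
    return starts
```

Windows passing such a screen would still need the primer-level checks listed in the table below.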
Alignments were submitted to PrimaClade
Lexpected | Lprimer | Tm | ΔTm | %GC | Repeats: 2-6× | Repeats: 1× | ΔG 3′ HP | ΔG Int HP | ΔG 3′ SD | ΔG Int SD | ΔG 3′ CD | ΔG Int CD |
<350 | 18–22 | 58–62 | <1 | 45–60 | <3 | <6 | >−2.00 | >−3.00 | >−5.00 | >−6.00 | >−5.00 | >−6.00 |
Lexpected: expected length of PCR products (bp); Lprimer: primer length (bp); Tm: melting temperature (°C); ΔTm: Tm difference between both primers; %GC: G+C content; 2-6×: number of tandemly repeated non-mononucleotide motifs (2–6 bp); 1×: length of mononucleotide runs; ΔG: Gibbs free energy required to break the secondary structure (kcal/mol); 3′: 3′-end of primers; Int: Internal; HP: hairpin, SD: self-dimer, CD: cross-dimer.
Output from NetPrimer; criteria as recommended in the application's manual.
Lprimer: exceptionally up to 26 bp.
Output from PrimaClade.
%GC: a 30–62% range was tolerated for primers >22 bp.
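As a minimal sketch, the sequence-based criteria in the table above can be checked programmatically. The helpers `check_primer`, `max_mono_run` and `max_tandem_copies` are hypothetical: they take melting temperatures as inputs rather than computing them, ignore the secondary-structure ΔG thresholds (which came from NetPrimer in this study), and assume a non-degenerate primer when computing G+C content.

```python
from itertools import groupby
from typing import List


def max_mono_run(seq: str) -> int:
    """Length of the longest run of a single nucleotide (1x criterion)."""
    return max(len(list(run)) for _, run in groupby(seq))


def max_tandem_copies(seq: str) -> int:
    """Most tandem copies of any 2-6 bp non-mononucleotide motif (2-6x criterion)."""
    best = 1
    for k in range(2, 7):
        for i in range(len(seq) - k + 1):
            motif = seq[i:i + k]
            if len(set(motif)) == 1:
                continue  # mononucleotide runs are handled by max_mono_run
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            best = max(best, copies)
    return best


def check_primer(seq: str, tm: float, partner_tm: float) -> List[str]:
    """Return the criteria the primer fails; an empty list means it passes."""
    seq = seq.upper()
    n = len(seq)
    gc = 100.0 * (seq.count("G") + seq.count("C")) / n
    failures = []
    if not 18 <= n <= 26:  # 18-22 bp, exceptionally up to 26 bp
        failures.append("length")
    gc_lo, gc_hi = (30, 62) if n > 22 else (45, 60)
    if not gc_lo <= gc <= gc_hi:
        failures.append("GC")
    if not 58 <= tm <= 62:
        failures.append("Tm")
    if abs(tm - partner_tm) >= 1:
        failures.append("delta Tm")
    if max_mono_run(seq) >= 6:
        failures.append("mononucleotide run")
    if max_tandem_copies(seq) >= 3:
        failures.append("tandem repeat")
    return failures
```

Because the length-dependent G+C window is applied inside `check_primer`, a 24-bp primer is held to the relaxed 30–62% range while a 20-bp primer must fall within 45–60%.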
We followed the M13-tail PCR method of
A total of 126,306 human microsatellites were found to be conserved in at least one of the non-primate mammalian species, i.e. mouse, rat, rabbit, dog, cow, elephant, armadillo, tenrec and/or opossum (Information S4). An initial subset of ∼1,000 human dinucleotide microsatellites (length ≥14 bp) was randomly selected from a total pool of 5,596 microsatellites (including 2,756 dinucleotide repeats) that were broadly conserved across aligned genomes. Furthermore, 73 intervals of the 28-WA, each comprising a potential mammal-wide microsatellite locus, were selected for the presence of a polymorphic dinucleotide microsatellite flanked by stretches of ultra-conserved sequence potentially suitable for cross-species primer design (
Degenerate primer pairs were then successfully designed for 19 microsatellite loci. Of these 19 primer pairs, tested under a single optimized set of PCR conditions, nine yielded a scorable band pattern in all tested mammalian samples (Information S5). Amplification success did not differ significantly between highly and slightly degenerate primer pairs, nor was it affected by primer G+C content (Information S3).
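A degenerate primer encodes several oligonucleotides at once. The sketch below, assuming an ungapped primer site extracted from the alignment, writes such a primer with the standard IUPAC ambiguity codes and computes its degeneracy (the number of distinct oligonucleotides it represents). It is not PrimaClade's implementation; `degenerate_primer` and `degeneracy` are hypothetical names.

```python
from functools import reduce
from typing import List

# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases they cover
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("GC"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def degenerate_primer(site: List[str]) -> str:
    """IUPAC consensus of an ungapped primer site aligned across species.

    Columns containing gaps or non-ACGT characters raise KeyError.
    """
    return "".join(IUPAC[frozenset(col)] for col in zip(*(s.upper() for s in site)))


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    bases_per_code = {code: len(bases) for bases, code in IUPAC.items()}
    return reduce(lambda acc, code: acc * bases_per_code[code], primer, 1)
```

For three aligned sites `ACGTAC`, `ACGTGC` and `ATGTAC`, the consensus is `AYGTRC`, a 4-fold degenerate primer; under this measure, "highly" and "slightly" degenerate pairs differ in how large this product becomes.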
To test our set of mammal-wide microsatellite loci for length polymorphism at the population level, we amplified and genotyped each locus in each 20-sample set. Of the nine primer pairs developed for cross-species genotyping, five provided allele length data at the population level across most species.
Species | C2-1218* | C2-6868 | C2-1915* | C4-1514* | C6-1112 | C9-1918* | C14-9692 | C15-3531 | C17-4243* |
Human | 268–294 (9/18); 0.47/0.82 | 228 (1/20); 0/0 | 166–178 (5/17); 0.71/0.64 | 281–283 (2/20); 0.30/0.26 | 152–156 (2/19); 0.11/0.19 | 300–302 (2/14); 0.38/0.52 | 234–237 (3/20); 0.05/0.15 | 226–228 (2/17); 0.29/0.26 | 311 (1/20); 0/0 |
Chimpanzee | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
Mouse | 291–301 (10/20); 0.30/0.76 | 242–291 (16/17); 0.56/0.93 | 216–238 (10/14); 0.50/0.91 | 313–317 (4/19); 0.22/0.52 | 155–161 (3/18); 0.50/0.51 | 311 (1/20); 0/0 | 240–242 (2/19); 0.26/0.56 | 297–299 (2/16); 0.13/0.12 | 319–325 (5/19); 0.42/0.62 |
Rat | 274–280 (2/20); 0.45/0.36 | 236 (1/20); 0/0 | 176–180 (3/20); 0.35/0.31 | 274 (1/20); 0/0 | 158–162 (3/19); 0.47/0.55 | n/a | n/a | 237 (1/20); 0/0 | 326 (1/15); 0/0 |
Dog | 268–278 (9/20); 0.40/0.83 | 256–268 (4/18); 0.28/0.52 | 180–191 (5/19); 0.47/0.78 | 297–299 (2/17); 0.12/0.11 | 3 peaks (1/11) | n/a | 214 (1/20); 0/0 | n/a | 309 (1/20); 0/0 |
Cat | 265–276 (8/20); 0.75/0.82 | n/a | 176–188 (6/19); 0.63/0.72 | n/a | n/a | n/a | n/a | n/a | 312 (1/20); 0/0 |
Cow | 259–264 (2/18); 0.06/0.06 | 231 (1/20); 0/0 | 167–169 (2/20); 0.15/0.22 | 281 (1/20); 0/0 | 146 (1/20); 0/0 | 300–305 (2/20); 0.50/0.51 | 208 (1/20); 0/0 | 240–242 (2/20); 0.05/0.05 | 308 (1/20); 0/0 |
Sheep | 270–280 (8/19); 0.58/0.83 | 229–237 (4/14); 0.29/0.37 | 163–173 (4/15); 0.56/0.64 | 292 (1/20); 0/0 | 146 (1/20); 0/0 | 307–308 (2/20); 0/0.10 | 208–212 (3/18); 0.39/0.60 | n/a | 306 (1/20); 0/0 |
Dolphin | 264–278 (4/19); 0.47/0.61 | n/a | 160–176 (7/19); 0.74/0.81 | 291–295 (2/16); 0.13/0.12 | 148–150 (2/19); 0.17/0.25 | 313–319 (3/16); 0.50/0.59 | 214–215 (2/19); 0/0.27 | 226 (1/20); 0/0 | 303–304 (2/20); 0/0.39 |
Pilot whale | 265 (1/20); 0/0 | 243 (1/20); 0/0 | 161–174 (6/17); 0.82/0.79 | 292 (1/19); 0/0 | 148 (1/20); 0/0 | 313–317 (4/18); 0.39/0.70 | 216 (1/20); 0/0 | 223 (1/20); 0/0 | 307 (1/20); 0/0 |
Hedgehog | 260–272 (5/20); 0.65/0.70 | 225–230 (5/20); 0/0.10 | 268–172 (3/20); 0.40/0.56 | 321–325 (2/20); 0.45/0.48 | 148–154 (2/20); 0.25/0.30 | 345 (1/20); 0/0 | 151–157 (2/20); 0.25/0.30 | 213–227 (7/20); 0.65/0.81 | 303 (1/20); 0/0 |
Shrew | 309–329 (11/20); 0.80/0.88 | 254–256 (3/20); 0.50/0.45 | 221–223 (2/20); 0/0 | 281 (1/20); 0/0 | n/a | n/a | n/a | n/a | 309–313 (4/19); 0.26/0.25 |
Dugong | 269–273 (4/20); 0.45/0.57 | 225 (1/17); 0.05/0.05 | 176 (1/20); 0/0 | 274 (1/19); 0/0 | 138 (1/17); 0/0 | 289 (1/18); 0/0 | n/a | 222 (1/17); 0/0 | 294–298 (3/18); 0.50/0.41 |
Tenrec | n/a | n/a | n/a | 281 (1/20); 0/0 | n/a | n/a | n/a | n/a | 316–319 (4/15); 0.33/0.55 |
Tammar wallaby | 249–291 (9/16); 0.79/0.83 | n/a | 193–195 (2/16); 0.06/0.06 | 281 (1/16); 0/0 | 149 (1/16); 0/0 | n/a | n/a | 191–293 (14/15); 0.60/0.94 | 325–332 (5/10); 0.20/0.70 |
Quoll | 241–243 (2/9); 0.11/0.11/0.10 | 318–342 (6/8); 0.50/0.81/0.72 | n/a | 297 (1/20); 0/0/0 | 148 (1/20); 0/0/0 | n/a | 203 (1/14); 0/0/0 | n/a | 299 (1/15); 0/0/0 |
Platypus | 245–263 (2/15); 0/0.13/0.12 | 346–382 (7/13); 0.85/0.72/0.64 | 214–226 (4/15); 0.13/0.36/0.32 | n/a | 145 (1/11); 0/0/0 | n/a | 208 (1/18); 0/0/0 | n/a | 298 (1/15); 0/0/0 |
Echidna | 248–252 (4/15); 0.20/0.57 | 372–376 (4/13); 0/0.65 | n/a | 317 (1/14); 0/0 | 142 (1/12); 0/0 | 278 (1/20); 0/0 | 205–213 (5/14); 0.50/0.76 | 194–196 (2/12); 0.09/0.09 | 298 (1/17); 0/0 |
Each cell: allelic range in bp (number of alleles/number of individuals successfully genotyped); observed heterozygosity/expected heterozygosity.
* indicates sequenced loci.
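The heterozygosity values in the table can be illustrated with a short sketch. Observed heterozygosity (Ho) is the proportion of heterozygous individuals; expected heterozygosity is computed here as He = 1 − Σ pᵢ², where pᵢ is the frequency of allele i in the sample. The source does not state which He estimator was used (for instance, whether Nei's small-sample correction 2n/(2n − 1) was applied), so the uncorrected form is an assumption, and `heterozygosity` is a hypothetical helper.

```python
from collections import Counter
from typing import List, Tuple


def heterozygosity(genotypes: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Observed and expected heterozygosity for one locus in one sample.

    Each genotype is a pair of allele sizes in bp. He uses the uncorrected
    formula 1 - sum(p_i^2), an assumption for illustration only.
    """
    n = len(genotypes)
    # Ho: fraction of individuals carrying two different allele sizes
    ho = sum(a != b for a, b in genotypes) / n
    # allele frequencies across all 2n gene copies
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    he = 1.0 - sum((c / (2 * n)) ** 2 for c in counts.values())
    return ho, he
```

A monomorphic sample, such as twenty individuals all scored at 146 bp, returns 0/0 as in the table. When Ho falls well below He, the heterozygote deficit may reflect, for example, inbreeding or null alleles.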
It is interesting to weigh the extent of polymorphism at each locus against the sequence data available from the 28-way alignments. Indeed, intraspecies polymorphism is largely influenced by the length of pure repeat segments within the microsatellite sequence, with long pure microsatellite tracts tending to be more polymorphic than short and/or degenerate microsatellites
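The length of the longest pure tract can be measured directly from sequence. The hypothetical helper `longest_pure_tract` below counts the copies in the longest uninterrupted run of a given motif, trying every reading phase; it is a sketch for illustration, not the procedure used in this study.

```python
def longest_pure_tract(seq: str, motif: str) -> int:
    """Number of copies in the longest uninterrupted tandem run of `motif`."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    best = 0
    for phase in range(k):  # a run may start at any offset within the motif length
        run = 0
        for i in range(phase, len(seq) - k + 1, k):
            if seq[i:i + k] == motif:
                run += 1
                best = max(best, run)
            else:
                run = 0  # an interruption ends the pure tract
    return best
```

For example, `longest_pure_tract("ACACACACTCACACAC", "AC")` returns 4: the sequence carries seven AC copies in total, but a single TC interruption splits them into pure tracts of four and three copies.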
Although sequencing is not standard practice in most applications of microsatellite markers, we sought to examine in detail the relationship between DNA sequence and the nature and extent of polymorphism of our most successful cross-species microsatellite loci across the studied species. Sequence-level information is essential to determine (i) whether allele length variation is attributable to additions/deletions of motifs within the microsatellite sequence rather than to indels in the flanking sequences, (ii) the extent of size homoplasy, if any, among alleles (homoplastic alleles have identical length but different sequences), (iii) the relationship between microsatellite structure and polymorphism
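Point (ii) can be checked mechanically once alleles are sequenced. In this minimal sketch, the hypothetical function `homoplastic_sizes` groups allele sequences by length and keeps only those lengths shared by more than one distinct sequence; it assumes each input string is a full amplified fragment without alignment gaps.

```python
from collections import defaultdict
from typing import Dict, List, Set


def homoplastic_sizes(alleles: List[str]) -> Dict[int, Set[str]]:
    """Map fragment length -> distinct sequences, for lengths shared by >1 sequence."""
    by_length: Dict[int, Set[str]] = defaultdict(set)
    for seq in alleles:
        by_length[len(seq)].add(seq.upper())
    # identical length but different sequence = size homoplasy
    return {length: seqs for length, seqs in by_length.items() if len(seqs) > 1}
```

With the alleles `GTACACACACTT`, `GTACACATACTT` and `GTACACACTT`, the two 12-bp fragments, which differ by a point mutation inside the repeat, are reported as homoplastic, whereas length-based genotyping alone would score them as the same allele.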
We carried out cross-species direct PCR sequencing of the five most successfully genotyped microsatellite loci, namely C2-1218, C2-1915, C4-1514, C9-1918 and C17-4243 (
Ten of the 31 new allele variants showed a difference between the change in total length and the change in microsatellite length. These differences most likely result from short indels in the flanking regions; however, we cannot be certain in every case, because flanking sequence information is missing and genotyping errors cannot be completely ruled out. In the other 21 comparisons (68%), changes in total locus length were consistent with the addition or removal of repeats in the microsatellite sequence. Six cases of size homoplasy were observed (identical size, different sequence): only two originated from mutations in both the microsatellite and flanking sequences, while the other four originated from a point mutation within the microsatellite sequence. Finally, in all cases, the addition/removal of one or more motifs occurred in the longest pure tract(s) of dinucleotide repeats.
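The criterion used to attribute length changes can be written out explicitly. In this sketch (the functions `classify_change` and `fraction_repeat` are hypothetical), a variant is attributed to the repeat array when the change in total fragment length equals the change in microsatellite length; otherwise it is flagged as a probable flanking indel or genotyping error. The reported proportion follows as 21/31 ≈ 0.677, i.e. 68%.

```python
from typing import List, Tuple


def classify_change(total_ref: int, total_var: int,
                    repeat_ref: int, repeat_var: int) -> str:
    """Attribute a length change (all lengths in bp) to the repeat array or the flanks."""
    if total_var - total_ref == repeat_var - repeat_ref:
        return "repeat"
    return "flanking"  # probable flanking indel, or a genotyping error


def fraction_repeat(changes: List[Tuple[int, int, int, int]]) -> float:
    """Share of variants whose total length change is explained by the repeat."""
    labels = [classify_change(*change) for change in changes]
    return labels.count("repeat") / len(labels)
```

For instance, a 268 → 272 bp allele whose repeat grows from 30 to 34 bp is classed as "repeat", whereas a 268 → 270 bp allele with an unchanged 30-bp repeat is classed as "flanking".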
Microsatellites are currently one of the most popular types of genetic markers for molecular ecology, forensics and genome mapping studies. Their evolutionary dynamics have been extensively studied
Here we described a novel combination of
Focusing our analysis on the entire breadth of the Mammalia (eutherians, marsupials and monotremes) ensured a large evolutionary scope as well as a solid genomic framework where scores of conserved microsatellites have been identified
There was no particular relationship between PCR success and G+C content of PCR priming sites, genomic location, or degree of degeneracy in primer sequences (Information S3). Of 19 designed primer pairs, nine were successfully optimized for mammal-wide amplification, and five were suitable for genotyping and sequencing. A number of methodological choices were made to decrease costs, but these may have reduced success rates in genotyping and sequencing, e.g. the Chelex extraction method (impure DNA extract), M13-tailed genotyping (primer dimers, inconsistent fluorescent signal), the use of degenerate primers (low amplification efficiency), and direct PCR sequencing (low-quality reads). We would expect a substantial increase in success rate using clean extraction methods (extraction kit, phenol-chloroform protocol), standard fluorescent genotyping, non-degenerate primers and clone sequencing. In addition, we had little or no control over sampling and DNA quality for most of our samples, which may have had detrimental consequences for the overall quality of our results. For example, low polymorphism in rats and pilot whales could be explained by our samples originating from inbred populations
Overall, our cross-species primers still yielded good genotyping results for five of the nine fully optimized loci. Intraspecies polymorphism was strongly associated with the length and purity of repeat tracts, which emphasizes the importance of examining the sequence structure of microsatellites when selecting polymorphic genetic markers. Sequence information showed that most changes (68%) in total fragment length at the five loci were attributable to mutations in the microsatellite sequence rather than in the flanking sequences, suggesting that the cross-species primers designed for these loci are valuable candidates for use as universal genetic markers across the Mammalia, as has already been demonstrated for the under-studied short-beaked echidna
Our findings establish a new paradigm in that they demonstrate that, with the emergence of large numbers of genome sequences for a given taxonomic group, universal sets of microsatellite markers can be generated for that group using a simple protocol. Provided that such sets are fully characterized and tested for confounding influences in the different species of interest (e.g. linkage and deviations from Hardy-Weinberg equilibrium), and standardized for use in different laboratories, this creates the genuine possibility of developing large panels of microsatellites with cross-species transferability and known genomic context
We warmly thank the many individuals and institutes that have generously contributed to our collection of mammalian samples: J. Hickford, L. Moller, M. Oremus and S. Baker, I. Vargas-Jentzsch, M. Hale, G. Yannic and J. Hausser, D. Tautz, B. Robertson, A. Amanzadeh and F. Shokri, A. Stone, S. Goodman, A. MacMahon and D. Blair, J. Graves, M. Cardoso, C. Whittington, S. Nicol, and P. Rismiler. Names of affiliated institutes can be found in