Conserved Microsatellites in Ants Enable Population Genetic and Colony Pedigree Studies across a Wide Range of Species

Broadly applicable polymorphic genetic markers are essential tools for population genetics, and different types of markers have been developed for this purpose. Microsatellites have been employed as particularly polymorphic markers for over 20 years. However, PCR primers for microsatellite loci are often not useful outside the species for which they were designed. This implies that a new set of loci has to be identified and primers developed for every new study species. To overcome this constraint, we identified 45 conserved microsatellite loci based on the eight currently available ant genomes and designed primers for PCR amplification. Among these loci, we chose 24 for in-depth study in six species covering six different ant subfamilies. On average, 11.16 of these 24 loci were polymorphic and in Hardy-Weinberg equilibrium in any given species. The average number of alleles for these polymorphic loci within single populations of the different species was 4.59. This set of genetic markers will thus be useful for population genetic and colony pedigree studies across a wide range of ant species, supplementing the markers available for previously studied species and greatly facilitating the study of the many ant species lacking genetic markers. Our study shows that it is possible to develop microsatellite loci that are conserved over a broad range of taxa yet polymorphic within species. This should encourage researchers to develop similar tools for other large taxonomic groups.


Introduction
Microsatellites, also called short tandem repeats (STRs) or simple sequence repeats (SSRs), are sequential repeats of 1 to 6 base pair motifs that have been used as genetic markers for more than 20 years [1,2,3]. Often found in noncoding regions, they are common in the genomes of eukaryotes [4][5][6]. An important feature of these sequences is their high degree of length polymorphism within populations of single species, which has been attributed to DNA polymerase slippage during replication [7,8]. This can result in a large number of alleles per locus that differ from one another in the number of repeats, making them distinguishable by size alone. This high degree of polymorphism and the ease of genotyping make them particularly suitable for studies in population genetics and pedigree analyses [9,10]. For example, microsatellites have been used to measure population differentiation and hybridization [11,12], to investigate ploidy levels [13,14], and to reconstruct parentage and pedigrees in wild and domestic populations [15,16]. Microsatellites are comparatively cheap to genotype and can be used with low concentrations of DNA. Furthermore, they typically have more alleles per locus than single nucleotide polymorphisms (SNPs) and thus provide more information per locus [17]. Although they often have a high degree of polymorphism within species, some microsatellite loci can be conserved across species that diverged 100 million years ago or more [18][19][20][21][22][23][24].
More recently, next generation sequencing (NGS) techniques have risen in popularity, mainly because of the large number of marker loci they can generate at relatively low per locus cost. For example, restriction site-associated DNA (RAD) tags can generate thousands of markers and have proven instrumental for measuring gene flow between populations [25], as well as for reconstructing shallow phylogenies [26]. However, the data generated from these techniques can be complex and difficult to analyze. There are techniques to reduce the complexity of DNA libraries such as double digest RADseq (ddRAD) [27], 2b-RAD [28], or genotyping by sequencing (GBS) [29], but these still require expensive NGS platforms. On the other hand, for many studies a smaller number of markers is sufficient, and markers such as microsatellites can be more attractive.
Despite their utility, a significant impediment to the use of microsatellites is the cost and effort associated with identifying a set of loci and developing PCR primers. Although the same loci can sometimes be useful for studying closely related species, loci that are polymorphic in one species are often not informative in another, and primers quickly lose affinity as species become more divergent. This usually requires new microsatellite loci to be characterized for each studied species. Depending on the research question, studies typically require a set of five to ten or more independent microsatellite loci. Paying a commercial service to develop these markers can be costly, and developing markers independently can be labor intensive and time consuming.
Nevertheless, the utility of microsatellites in determining pedigree structures, relatedness and mating systems makes them particularly useful for social insect research because they can be used to address important questions related to inclusive fitness theory, including social organization (e.g. [30]), worker caste determination (e.g. [31]), and the evolution of supercolonies (e.g. [32]). Of the social insects, ants are a particularly speciose and ecologically diverse group being intensively studied. Current estimates place the ant family Formicidae at 115 to 158 million years of age [33][34][35], and close to 13,000 species have been described, according to the Hymenoptera Name Server (v. 1.5, accessed 14 April 2014). Eight ant genomes are currently available representing most major ant clades, allowing highly conserved regions to be identified over most of the family. To help overcome the constraints of narrowly applicable primers and to make microsatellites broadly available as population genetic markers, we aimed to develop a set of microsatellite markers that would be conserved across a wide range of species, yet polymorphic within species.

Results
To design a set of broadly applicable microsatellite primers we searched the eight currently available ant genomes for conserved microsatellite motifs with conserved flanking regions. The eight available ant genomes are from the red harvester ant Pogonomyrmex barbatus (subfamily Myrmicinae) [36], Jerdon's jumping ant Harpegnathos saltator (subfamily Ponerinae), the Florida carpenter ant Camponotus floridanus (subfamily Formicinae) [37], the leaf-cutting ants Atta cephalotes (subfamily Myrmicinae) [38] and Acromyrmex echinatior (subfamily Myrmicinae) [39], the Argentine ant Linepithema humile (subfamily Dolichoderinae) [40], the red imported fire ant Solenopsis invicta (subfamily Myrmicinae) [41], and the clonal raider ant Cerapachys biroi (subfamily Dorylinae) [42]. The available genomes represent five of the 21 recognized extant ant subfamilies, allowing us to select primer sequences that are conserved in a wide range of species across the ants (Figure 1). We identified 176 potential microsatellite loci with conserved flanking regions across all eight genomes, and among those selected 45 that had a repeat motif in most or all of the available genomes (Table S1 in File S1).
Figure 1. Phylogeny of the ants, showing the phylogenetic distribution of the species used in this study. The size of each triangle is proportional to the number of species in each group, and the approximate number of species is given in parentheses next to the group name. Boxes next to species names indicate whether that species' genome was used to design (green) or test (purple) the PCR primers.
Table 1. See Table 2 for details on the remaining 24 loci that were also tested using labeled PCR primers. "Yes" indicates clear amplification of a single product. "No" indicates no amplification of any product. "MP" indicates that there were multiple products from which the desired product could not be determined. doi:10.1371/journal.pone.0107334.t001
To demonstrate their usefulness in species other than those with available genomes, we tested these primers for amplification in six species from six different subfamilies, only one of which was also used for primer design (Solenopsis invicta, subfamily Myrmicinae) (Figure 1). The other five species in which the markers were tested were the bullet ant Paraponera clavata (subfamily Paraponerinae), the army ants Simopelta pentadentata (subfamily Ponerinae) and Dorylus molestus (subfamily Dorylinae), Lasius nearcticus (subfamily Formicinae), and Ectatomma ruidum (subfamily Ectatomminae). The success of PCR amplification varied by locus and species (Tables 1 & 2). From those 45 loci, we selected 24 that amplified well in all or most of the six species tested and also had at least ten consecutive repeats of their motif in the genomes of more than one of the species with available genome sequences (Table S1 in File S1). We genotyped those 24 loci across all six species using fluorescently labeled primers (Applied Biosystems). PCR amplification was successful for all 24 loci in L. nearcticus and D. molestus, for 23 loci in S. invicta, for 22 loci in P. clavata and E. ruidum, and for 21 loci in S. pentadentata (Table 2, Figure 2). To determine which of the microsatellite loci were polymorphic in any given species, we genotyped ten individuals from ten different colonies from the same population of each species for each locus. On average, 12.83 (±6.15 SD) of the 24 loci were polymorphic in a given species, and 11.16 (±5.27 SD) were polymorphic and in Hardy-Weinberg equilibrium (Table 2, Figure 2). Across those polymorphic loci in Hardy-Weinberg equilibrium, the average number of alleles per locus per species was 4.59 (±2.41 SD). The average observed heterozygosity was 0.534 (±0.22 SD), and the average expected heterozygosity was 0.61 (±0.22 SD). Most of the loci were monomorphic for multiple species. However, in all cases the monomorphic allele at a given locus was different for each species. We found no statistical linkage disequilibrium (at p < 0.00003 after Bonferroni correction) between any pair of loci in any species, but this is likely due to small sample sizes and reduced power due to the large number of tests performed. In fact, in all eight genomes there are scaffolds containing multiple loci, i.e. these loci occur on the same chromosome and are therefore physically linked (Table S2 in File S1).
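The size of the Bonferroni-corrected threshold can be illustrated with a short calculation. The sketch below assumes a nominal significance level of 0.05 and assumes that every pair of the 24 loci was tested in each of the six species; neither assumption is stated in the text, and the actual number of tests may have been smaller because monomorphic loci cannot be tested. The function name `bonferroni_threshold` is ours.

```python
from math import comb


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Assumptions (not stated in the text): alpha = 0.05, and all pairs of the
# 24 loci are tested for linkage disequilibrium in each of the six species.
n_loci = 24
n_species = 6
pairs_per_species = comb(n_loci, 2)      # number of locus pairs per species
n_tests = pairs_per_species * n_species  # total number of pairwise tests
threshold = bonferroni_threshold(0.05, n_tests)

print(f"{pairs_per_species} pairs per species, {n_tests} tests in total")
print(f"per-test threshold: {threshold:.6f}")
```

Under these assumptions there are 276 locus pairs per species and 1656 tests in total, giving a per-test threshold of about 0.00003, which is consistent with the value reported above.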

Discussion
To reduce the time and cost associated with developing microsatellite primers for a large number of different species, we designed a set of 45 primer pairs for potential use in a broad range of ant species spanning many millions of years of evolution. We tested 24 of these primer pairs in detail across six distantly related ant species from six different subfamilies. The number of useful polymorphic loci ranged from 5 to 20 for the six species we tested, although those loci were not always the same across species. Although we found no statistical linkage between any loci, some loci were located on the same scaffold in the genome assemblies of the reference species, and the location of the loci in the reference genomes should be considered when selecting primers from this set (Tables S1 & S2 in File S1). In assessing the utility of these markers in other species, it may be initially beneficial to test the entire set using inexpensive unlabeled primers. Then fluorescently labeled primers can be used for genotyping only those loci that amplify and yield clean PCR products. To further reduce costs, the primers described here could be used as unlabeled locus-specific primers in combination with universal labeled-tail primers [43].
Microsatellites have been an important tool for studies in population genetics for more than 20 years [1][2][3]. They are excellent markers for many types of studies including pedigree analyses and mating system studies, but their applicability has previously been limited by the narrow range of taxa in which each locus can be used. Researchers usually develop sets of primers specifically for their study species or a group of closely related species, and ants are no exception in this respect (e.g. [44][45][46][47][48][49][50][51][52][53][54][55][56][57][58]). For example, we found 32 publications of microsatellite primer notes for ants in the journal Molecular Ecology Resources, a leading outlet for the publication of population genetic markers. These primer notes represented 31 species and 28 genera. Looking only at those studies that described more than ten polymorphic loci per species, the number of alleles per locus ranged from 2 to 21 (Table 3). Species-specific primers often had more alleles per locus than we report here. The average number of alleles per locus across all species and loci from Table 3 is 7.58 (±4.57 SD) while the average for the loci described here is 4.59 (±2.41 SD). One possible explanation is that this reflects a tradeoff between sequence variability within species and sequence conservation across species. On the other hand, this trend is probably at least partly attributable to our small sample size of specimens per species. The number of alleles per locus will likely increase as more samples are genotyped, especially if these come from different populations. Many microsatellite primers are effective at amplification in congenerics, and some microsatellite primers have been successfully used across genera within the same ant subfamily (e.g. [59][60][61]). However, to our knowledge, none have successfully amplified polymorphic microsatellite loci across multiple subfamilies.
Here we characterize conserved microsatellite markers that are broadly useful across the ants and that will open opportunities for research on the many ant species lacking established genetic markers. These markers, like other microsatellites, will be especially useful for addressing questions in social insect research related to parentage, mating system and colony pedigree structure, i.e. questions for which it is preferable to maximize the number of samples genotyped while fewer markers are generally sufficient. The markers will also be useful in standard population genetic analyses, e.g. of population structure and gene flow. For questions that require a large number of markers such as genomic mapping, NGS data will generally be preferable. However, the loci presented here can readily be used to supplement NGS data.
There is demand for broadly applicable microsatellite primers outside the ants as well. Attempts to use microsatellite primers far outside of the species for which they were designed have had varying success. For example, primers designed for use in cattle have proven useful in other closely related mammals [19,20,62], and microsatellite primers designed for several different legumes have amplified polymorphic loci in the legume genus Glycyrrhiza [63]. Some primers designed for the paper wasp genus Polistes have also successfully amplified polymorphic loci in other Polistine wasps and even in the related subfamilies Vespinae and Stenogastrinae [18]. In marine turtles, primers have successfully amplified polymorphic microsatellites in species that diverged 300 MYA [23]. Additionally, a set of primers similar to those described here has been designed for birds using the genomes of the chicken, Gallus gallus, and the zebra finch, Taeniopygia guttata [64,65]. These conserved microsatellite loci also span a long evolutionary distance, as these species diverged approximately 100 to 120 MYA [66,67]. Our study in ants and those in birds [64,65] present sets of primers designed explicitly for use in a broad range of species spanning a long evolutionary distance rather than testing species-specific primers in other distantly related species. Together, they set a precedent for identifying similar sets of markers in other diverse groups of comparable ages. This suggests that, with the availability of genomic information across an ever-increasing range of taxa, conserved microsatellites will become available as powerful population genetic tools for a wide variety of organisms.

Specimen collection
All specimens of Ectatomma ruidum and Paraponera clavata were collected at the Organization for Tropical Studies field station in La Selva, Costa Rica. Simopelta pentadentata specimens were collected in Monteverde, Costa Rica. Dorylus molestus specimens were collected in Kakamega Forest, Kenya. Lasius
Table 3. Overview of number of alleles and expected and observed heterozygosity in eight studies of species-specific microsatellite primers in ants.

Bioinformatics
Seven available ant genomes were downloaded from Ant Genomes Portal (hymenopteragenome.org/ant_genome), and our lab has recently published the C. biroi genome [42]. The genome versions for each species were A. cephalotes v1.0, A. echinatior v2.0, C. floridanus v3.3, C. biroi v2.0, H. saltator v3.3, L. humile v1.0, P. barbatus v3.0, S. invicta v1.0. Microsatellites in the C. biroi genome were located using Tandem Repeats Finder ('TRF'; v. 4.04) [68], which utilizes Smith-Waterman style local alignment. Tandem repeats are reported only if they exceed a minimum alignment score, specified as 50 (Minscore = 50). Alignment mismatches were assigned a weight of five (Mismatch = 5). Additionally, the size of the repeat pattern was limited to five bases (Maxperiod = 5). The microsatellite indices returned were used to generate a masked BLAST query for each microsatellite, extended to include 200-bp flanking regions. The query sequence was used to search all eight sequenced ant genomes, including C. biroi, using BLAST (v. 2.2.26+) [69]. The results were filtered to remove matches with less than 60% identity. Microsatellite flanking regions that generated unique BLAST hits in all eight genomes were aligned using MUSCLE [70]. To confirm that these conserved flanking regions indeed contained microsatellite sequences, TRF was used to search for microsatellites in all database genomes at the indices returned by BLAST for each hit (settings as stated above). Primer3 software (v. 2.3.4; http://primer3.sourceforge.net/releases.php) [71] generated primers from the consensus sequence in each flanking region. A maximum of four unknown bases was allowed in any primer set (PRIMER_MAX_NS_ACCEPTED = 4). All unspecified parameters used the default or recommended settings. Custom Python scripts were used to parse TRF and Primer3 outputs, prepare files for BLAST and Primer3, and filter the BLAST results. These scripts are available upon request from the corresponding author.
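The BLAST filtering step can be sketched as follows. This is a minimal illustration and not the authors' script: the `Hit` record, the function name `conserved_loci`, and the toy genome labels are hypothetical, and the sketch assumes that BLAST output has already been parsed into one record per hit. It discards hits below 60% identity and then keeps only loci with exactly one remaining hit in every genome.

```python
from collections import Counter, defaultdict
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    """One parsed BLAST hit (hypothetical record layout)."""
    locus: str       # identifier of the microsatellite query
    genome: str      # genome in which the hit was found
    identity: float  # percent identity of the alignment


def conserved_loci(hits: Iterable[Hit], genomes: set,
                   min_identity: float = 60.0) -> list:
    """Return loci with exactly one hit >= min_identity in every genome."""
    counts = defaultdict(Counter)
    for hit in hits:
        # Remove matches below the identity cutoff before counting.
        if hit.identity >= min_identity and hit.genome in genomes:
            counts[hit.locus][hit.genome] += 1
    # A locus is kept only if its hit is unique in each genome.
    return sorted(
        locus for locus, per_genome in counts.items()
        if all(per_genome[g] == 1 for g in genomes)
    )


# Toy example with two placeholder genomes instead of eight.
demo_genomes = {"genome_1", "genome_2"}
demo_hits = [
    Hit("locA", "genome_1", 85.0), Hit("locA", "genome_2", 72.0),
    Hit("locB", "genome_1", 90.0), Hit("locB", "genome_1", 88.0),
    Hit("locB", "genome_2", 95.0),
    Hit("locC", "genome_1", 99.0), Hit("locC", "genome_2", 55.0),
]
demo_result = conserved_loci(demo_hits, demo_genomes)
print(demo_result)
```

In the toy data, locB is dropped because it has two hits in one genome, and locC is dropped because its only hit in the second genome falls below the identity cutoff, so only locA remains.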
Initially, 176 loci were identified across all genomes with the described bioinformatics pipeline, from which we chose 45 loci for further study. These 45 loci were chosen subjectively based on the number of perfect repeats in different species and the presence of a microsatellite motif in as many ant genomes as possible.
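Counting perfect repeats in a candidate region can be approximated with a regular expression. The sketch below is a simplified stand-in, not Tandem Repeats Finder: it detects only perfect (uninterrupted) repeats, whereas TRF uses alignment scoring and tolerates mismatches. The function name `perfect_repeats` and the example sequence are hypothetical; the defaults mirror the maximum motif length of five bases and the ten-repeat criterion mentioned in the text.

```python
import re


def perfect_repeats(seq: str, min_repeats: int = 10, max_period: int = 5):
    """Find perfect tandem repeats.

    Returns a list of (start, motif, repeat_count) tuples for every run in
    which a motif of 1..max_period bases occurs at least min_repeats times
    in a row. The lazy quantifier makes the regex try the shortest motif
    first, so "AGAGAG..." is reported with motif "AG" rather than "AGAG".
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_period, min_repeats - 1)
    )
    results = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        results.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return results


# Hypothetical sequence: an (AG)12 run and a shorter (CAT)4 run.
example = "TTGACC" + "AG" * 12 + "CCTGA" + "CAT" * 4 + "GG"
print(perfect_repeats(example))                 # only runs of >= 10 repeats
print(perfect_repeats(example, min_repeats=4))  # also reports the CAT run
```

With the default settings only the (AG)12 run is reported; lowering `min_repeats` to 4 also returns the (CAT)4 run.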
DNA extraction, PCR amplification and genotyping
DNA was extracted by first homogenizing the tissue in a Qiagen TissueLyser II and then heating the sample at 96 °C for 15 minutes in 200 µl of 10% Chelex in TE solution. The samples were then centrifuged at 9100 rpm for three minutes, and the supernatant containing the DNA was removed and used as the template for PCR amplification.
The PCR cocktail (10 µl total volume) for all reactions contained 1 µl PCR Gold Buffer (10x), 0.5 µl MgCl2 (25 mM), 0.5 µl dNTPs (10 mM total, 2.5 mM each), 0.1 µl of each forward and reverse primer (10 µM), 0.1 µl AmpliTaq Gold (5 U/µl), 1 µl DNA template and 6.7 µl H2O. PCR reactions were run on an Eppendorf Mastercycler Pro S under the following conditions: 10 min at 95 °C followed by 40 cycles of 15 s at 94 °C, 30 s at 55 °C and 30 s at 72 °C, and a final extension of 10 min at 72 °C. PCR products were sent to a commercial facility (Genewiz, Inc.) for genotyping. Analysis of chromatograms was performed using PeakScanner (Applied Biosystems). Calculations of observed and expected heterozygosity, as well as tests for linkage disequilibrium and deviations from Hardy-Weinberg equilibrium were performed using F-STAT (v2.9.3.2) [72].
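For readers unfamiliar with the heterozygosity statistics, the following sketch shows how observed and expected heterozygosity can be computed for a single locus from diploid genotypes. It is an illustration only and does not reproduce the software used in this study: it uses the simple expectation 1 − Σp², where p is the frequency of each allele, without the sample-size correction that population genetics packages may apply. The function name and the genotype data (allele sizes in base pairs for ten individuals) are hypothetical.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one diploid locus.

    genotypes: list of (allele_1, allele_2) tuples, one per individual.
    Expected heterozygosity is 1 - sum(p_i ** 2) over allele frequencies
    p_i, i.e. the uncorrected Hardy-Weinberg expectation.
    """
    if not genotypes:
        raise ValueError("at least one genotype is required")
    n = len(genotypes)
    # Observed: fraction of individuals carrying two different alleles.
    observed = sum(a != b for a, b in genotypes) / n
    # Expected: derived from allele frequencies across all 2n allele copies.
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n
    expected = 1.0 - sum((c / total) ** 2 for c in allele_counts.values())
    return observed, expected


# Hypothetical genotypes: ten individuals, three alleles (sizes in bp).
example_genotypes = [
    (180, 184), (180, 180), (184, 188), (180, 184), (188, 188),
    (180, 188), (184, 184), (180, 184), (180, 180), (184, 188),
]
h_obs, h_exp = heterozygosity(example_genotypes)
print(f"observed = {h_obs:.3f}, expected = {h_exp:.3f}")
```

In this example six of the ten individuals are heterozygous, and the allele frequencies are 0.40, 0.35 and 0.25, so the observed heterozygosity is 0.600 and the expected heterozygosity is 0.655.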

Supporting Information
File S1 Contains Tables S1 and S2 described below. Table S1. Details of microsatellite loci in eight ant genomes. Numbers in parentheses behind the size of the targeted fragment indicate that there are a number of unknown ("N") bases inserted into the available genome sequence. In some cases, these can be larger stretches of "N" bases in the published genome assembly. This number is included in the total size of the targeted fragment. In most cases, this implies that the given size of the targeted fragment is probably imprecise. The column "Size of targeted fragment in base pairs" thus gives "Total base pairs (number of N bases among the total base pairs)". Some loci have multiple motifs listed. All motifs are in the same region and are included in the size of the targeted fragment. Table S2. Physical linkage of microsatellite loci. X indicates where two loci are on the same scaffold in that species. Order of species left to right in every box is P. barbatus, H. saltator, At. cephalotes, Ce. biroi, L. humile, Ca. floridanus, S. invicta, Ac. echinatior. (XLSX)