Characterization of Prdm9 in Equids and Sterility in Mules

Prdm9 (Meisetz) is the first speciation gene discovered in vertebrates that confers reproductive isolation. This locus encodes a meiosis-specific histone H3 methyltransferase that specifies meiotic recombination hotspots during gametogenesis. Allelic differences in Prdm9, characterized by a variable number of zinc finger (ZF) domains, have been associated with hybrid sterility in male house mice via spermatogenic failure at the pachytene stage. The mule, a classic example of hybrid sterility in mammals, exhibits a similar breakdown of spermatogenesis, making Prdm9 an interesting candidate gene to evaluate in equine hybrids. In this study, we characterized the Prdm9 gene in all species of equids by analyzing sequence variation of the ZF domains and estimating positive selection. We also evaluated the role of Prdm9 in hybrid sterility by assessing allelic differences of ZF domains in equine hybrids. We found remarkable variation in the sequence and number of ZF domains among equid species, ranging from five domains in the Tibetan kiang and Asiatic wild ass to 14 in the Grevy's zebra. Positive selection was detected in all species at amino acid sites known to be associated with the DNA-binding specificity of ZF domains in mice and humans. Equine hybrids, in particular a quartet pedigree that includes a fertile mule, showed a mosaic of ZF domain sequences and numbers, suggesting that Prdm9 variation does not by itself appear to contribute to equine hybrid sterility.


Introduction
A fascinating unresolved question in evolutionary biology concerns the mechanisms by which new species form and the processes that maintain species as separate entities. Species maintain their integrity through a set of premating and postmating reproductive barriers. Among them, hybrid sterility is one of the earliest postzygotic isolating barriers to arise in species crosses, generally occurring in the heterogametic sex (XY or ZW), a pattern referred to as Haldane's rule [1]. Postzygotic reproductive barriers are thought to arise through the acquisition of genetic incompatibilities in diverging populations, as proposed by the Dobzhansky-Muller model [2], in which negative epistasis between alleles or loci produces sterility or inviability of hybrids [3].
Few genes have been associated with hybrid sterility in model species [4,5]. In Drosophila, examples include the Odysseus-site homeobox [6], JYAlpha [7] and Overdrive [8] genes. More recently, Prdm9 (Meisetz) has been identified as the first speciation gene conferring reproductive isolation in vertebrates [9]. This gene encodes a meiosis-specific histone H3 methyltransferase that is expressed only in germ cells entering meiotic prophase [10,11]. Prdm9 contains KRAB and SET domains that are highly conserved among metazoans [12], followed by a tandem array of Cys2His2 zinc finger (ZF) domains near the carboxy-terminal region [13]. The ZF domains form the DNA-binding region and confer the protein's binding specificity in a modular manner. The structure of ZF domains is well established: each consists of 21 residues, in which conserved amino acids coordinate and position a highly variable nucleotide-contact region, plus a conserved seven-amino-acid linker that joins adjacent zinc fingers [14]. Prdm9 ZF domains exhibit rapid evolution and positive selection in a variety of organisms, including humans, mice, cattle and salmon [12,15]. In other species, however, this gene is absent (birds, lizards, snakes) or has acquired disrupting mutations, as in dogs, where it has become a pseudogene [16].
In vertebrates, Prdm9 is responsible for activating and specifying genome-wide meiotic recombination hotspots [17][18][19][20][21]. Prdm9-null mice display arrest of gametogenesis in meiotic prophase I and impaired double-strand break repair [10]. Allelic differences in the number of ZF domains in Prdm9 have been associated with hybrid sterility in house mouse crosses due to failure of spermatogenesis. For instance, an allele of Prdm9 encoding 13 ZF domains (PRDM9^B6) in Mus musculus domesticus (Mmd) causes sterility in F1 hybrid males when combined with another Mmd allele containing 14 ZF domains (PRDM9^C3H) [9]. Furthermore, in humans, rare dominant non-synonymous mutations in Prdm9 are associated with azoospermia, which also suggests the presence of allelic incompatibilities [22].
Mules, the hybrid offspring of a female horse and a male donkey, also exhibit a similar breakdown of spermatogenesis at the pachytene stage of meiosis [23,24], making Prdm9 an interesting candidate gene for evaluation in interspecific equine crosses. Extant equids belong to a recently evolved group of mammals that diverged from a common ancestor about four million years ago [25,26]. Because of this relatively recent divergence, many viable but typically infertile equid hybrid combinations can be produced, not only through human-mediated reproduction but also naturally [27,28]. Differences in the number and structure of the haploid chromosome sets have been argued to underlie the inability of chromosome pairs to synapse in meiosis, producing sterility in equine hybrids [29]. However, reported instances of female mules and other equine hybrids with odd chromosome numbers giving birth to perfectly viable offspring [30][31][32][33] suggest that mechanisms other than chromosomal differences contribute to hybrid sterility in equids.
In this study, we investigate the role of the Prdm9 gene in equine hybrid sterility by assessing allelic differences of ZF domains in hybrids, including a fertile mule pedigree. For that purpose, we characterized Prdm9 in all species of equids by determining patterns of sequence variation of ZF domains, and estimating positive selection. While this approach identified radical alterations in the number of ZF domains and rapid evolution among species, Prdm9 allelic variation does not seem by itself to produce sterility in equids.

Ethics Statement
All samples were collected from postmortem animals or opportunistically during medical examinations, under IACUC protocol number 12-023. This study was approved by the Zoological Society of San Diego Institutional Animal Care and Use Committee (NIH Assurance A3675-01).

Equid Prdm9 Sequences
Well-annotated Prdm9 sequences from human and mouse were obtained from the NCBI, Ensembl and UCSC browsers. These sequences were used to query the domestic horse genome (Sep. 2007 Broad/equCab2) and predict the Prdm9 gene using TBLASTn. Best hits showed sequence identities higher than 75% for all comparisons (horse-mouse, horse-human). The horse DNA sequence was extracted and used to design primers in conserved regions of the alignment of mouse, human and horse sequences. Specifically, primers were designed to amplify the final exon of Prdm9 in equids, which contains the ZF domains: Prdm9_I5F 5′-CAGGCAGCCTTGTCAACATCTACCCT-3′, Prdm9_I6Rb 5′-CGTTGGAGCTGGAGTATGGAGT-3′, and Prdm9_IF 5′-GAGGCTTCAATGACAGGGCAAGTCTTAT-3′.
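As a rough sanity check on the primers listed above, the short Python sketch below computes length, GC content and a "basic" melting-temperature estimate, Tm = 64.9 + 41 × (nGC − 16.4) / N. This formula is a generic approximation assumed here for illustration; it ignores salt, Mg2+ and primer concentration and is not how the primers in this study were designed. The dictionary PRIMERS and the helper functions are hypothetical names.

```python
def gc_count(primer: str) -> int:
    """Number of G and C bases in a primer (case-insensitive)."""
    seq = primer.upper()
    return seq.count("G") + seq.count("C")

def basic_tm(primer: str) -> float:
    """Rough Tm in deg C from Tm = 64.9 + 41 * (nGC - 16.4) / N (no salt correction)."""
    return 64.9 + 41.0 * (gc_count(primer) - 16.4) / len(primer)

# Primer sequences as given in the text, written 5' -> 3'
PRIMERS = {
    "Prdm9_I5F": "CAGGCAGCCTTGTCAACATCTACCCT",
    "Prdm9_I6Rb": "CGTTGGAGCTGGAGTATGGAGT",
    "Prdm9_IF": "GAGGCTTCAATGACAGGGCAAGTCTTAT",
}

for name, seq in PRIMERS.items():
    gc_pct = 100.0 * gc_count(seq) / len(seq)
    print(f"{name}: {len(seq)} nt, GC {gc_pct:.1f}%, basic Tm ~{basic_tm(seq):.1f} C")
```

Such estimates are only a first screen; nearest-neighbour models with the actual buffer conditions give more realistic values.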
Polymerase chain reaction (PCR) amplifications were performed in a 20 µl volume using Eppendorf Mastercycler Gradient thermal cyclers. Each reaction included 30 ng of template DNA, 10 µl of 1X Taq buffer with 1.5 mM MgCl2 (Applied Biosystems), 0.3 µl of 10 mM deoxynucleoside triphosphates, 0.6 µl of each primer, and 0.15 units of Taq DNA polymerase (Applied Biosystems). The PCR cycling conditions were 95 °C for 6 min, followed by 34 cycles of denaturation at 94 °C for 1 min, annealing at 59-60 °C for 1 min and extension at 72 °C for 1 min, with a final extension at 72 °C for 10 min.
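The reaction recipe above can be checked with simple dilution arithmetic (C1 × V1 = C2 × V2) and by summing the cycling holds. The sketch below is illustrative only: it assumes the 10 mM dNTP stock concentration applies as labelled (the text does not state whether it is per nucleotide or total) and ignores ramp times between temperatures. The function names are hypothetical.

```python
from typing import Sequence

def final_conc(stock: float, added_ul: float, total_ul: float) -> float:
    """Final concentration after dilution (C1 * V1 = C2 * V2), in the stock's units."""
    if total_ul <= 0:
        raise ValueError("total volume must be positive")
    return stock * added_ul / total_ul

def hold_minutes(initial: float, cycles: int, steps: Sequence[float], final: float) -> float:
    """Total hold time of a PCR program in minutes, ignoring ramp times."""
    return initial + cycles * sum(steps) + final

REACTION_UL = 20.0  # total reaction volume from the protocol

dntp_mM = final_conc(10.0, 0.3, REACTION_UL)  # 0.3 ul of a 10 mM dNTP stock
template_ng_per_ul = 30.0 / REACTION_UL       # 30 ng of template DNA
# 95 C for 6 min; 34 x (94 C, 59-60 C, 72 C, 1 min each); 72 C for 10 min
total_min = hold_minutes(6.0, 34, [1.0, 1.0, 1.0], 10.0)
print(f"dNTPs {dntp_mM:.2f} mM, template {template_ng_per_ul:.1f} ng/ul, {total_min:.0f} min")
```

For the values given, the sketch yields a final dNTP concentration of 0.15 mM, 1.5 ng/µl of template, and 118 min of hold time.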
All PCR products were gel purified using the QIAquick gel purification kit (Qiagen) and cloned using the TOPO TA cloning kit (Invitrogen) following the manufacturer's instructions. Cloning was necessary because two PCR products of similar size were detected on 2% agarose gels in some equid species and hybrids. M13 forward and reverse primers, plus the Prdm9 primers, were used to sequence a minimum of five positive clones per species and hybrid. DNA sequences were edited and aligned with Sequencher 3.1.1 (Gene Codes, Ann Arbor, MI) and Geneious v1.2.1 [34], and then adjusted by eye, preserving the ZF domain reading frame. All equid sequences were translated and checked for encoding multiple in-frame ZF domains. Equid Prdm9 sequences were submitted to GenBank (accession numbers KC209783-KC209813).
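Translating each clone and scanning for in-frame stop codons, as described above, can be sketched with the standard genetic code. This is a minimal illustration, not the Sequencher or Geneious workflow actually used; the helper names translate and first_stop are hypothetical, and the example DNA string is made up.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order, '*' marks stop codons
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna: str, frame: int = 0) -> str:
    """Translate DNA in frame 0, 1 or 2; a trailing partial codon is dropped, unknown codons become 'X'."""
    seq = dna.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3))

def first_stop(protein: str) -> int:
    """0-based position of the first stop in a translation, or -1 if there is none."""
    return protein.find("*")

# Made-up example: Met-Cys-His, then a premature TAA stop
protein = translate("ATGTGTCACTAAGGC")
print(protein, first_stop(protein))
```

Here the translation is "MCH*G", with the first stop at index 3, which is the kind of premature termination that would rule out a full-length in-frame ZF array.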

Equid Prdm9 cDNA
Total RNA was extracted from Asiatic wild ass (onager) testis using TRIzol reagent. Reverse transcription was primed with oligo(dT) to select poly(A) RNA (mRNA), following the SuperScript III First-Strand Synthesis System (Invitrogen). Target cDNA was amplified with specific primers that were also used for sequencing: Prdm9_E6F_cDNA 5′-AGCTAGAGATCCATCCATGTC-3′ and Prdm9_I6R_cDNA 5′-GTCCTCTTGGGGCTGAGACGTGAT-3′. Onager cDNA was sequenced to confirm the expression of Prdm9 in equid testes and to validate the Prdm9 sequences obtained from genomic DNA. Sequences were edited and aligned as described above.

Prdm9 Ortholog Identification and Character Mapping
Orthology of equid Prdm9 sequences was initially verified using TBLASTn. In all cases, the best hits (E-value < 10^-6) corresponded to Prdm9 sequences from other species (e.g., Mus, Homo, Peromyscus, Apodemus), with average sequence identity higher than 75%. Known Prdm9 paralogs such as Prdm7 never appeared as best hits. The highest-scoring Prdm9 sequences from GenBank were then used in a reciprocal best BLAST search against the equid sequences to validate putative orthologs. To verify sequence identity, equid Prdm9 sequences were aligned against the domestic horse genome using BLAT (UCSC browser).
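The reciprocal best-hit criterion used above to validate orthologs reduces to a simple rule: sequences a and b are kept as putative orthologs only if b is a's top-scoring hit and a is b's top-scoring hit. The sketch below assumes BLAST results have already been parsed into (query, subject, bit score) tuples; the identifiers and scores in the example are invented for illustration.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, str, float]  # (query id, subject id, bit score)

def best_hits(hits: List[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (the first one wins on ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(forward: List[Hit], reverse: List[Hit]) -> List[Tuple[str, str]]:
    """Keep (a, b) only if b is a's best hit and a is b's best hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)

# Invented example: equid queries vs. reference proteins, then the reverse search
forward = [("seqA", "mouse_Prdm9", 250.0), ("seqA", "human_PRDM7", 180.0),
           ("seqB", "human_PRDM7", 120.0)]
reverse = [("mouse_Prdm9", "seqA", 248.0), ("human_PRDM7", "seqA", 175.0),
           ("human_PRDM7", "seqB", 119.0)]
print(reciprocal_best_hits(forward, reverse))
```

In this toy case seqB is dropped: its best hit, human_PRDM7, scores seqA higher than seqB in the reverse search, so the pair is not reciprocal.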
Presence/absence of ZF domains was mapped on a Bayesian phylogenetic tree of the family Equidae [35] using MacClade 4.08 [36] to examine patterns of ZF domain evolution among equid lineages. All characters were assumed to be unordered and equally weighted, and calculations considered only unambiguous changes.
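Counting changes for an unordered, equally weighted binary character on a fixed tree corresponds to Fitch parsimony. The sketch below computes the minimum number of presence/absence changes for one ZF domain on a toy five-taxon topology. The tree and the states are hypothetical and do not reproduce the Bayesian tree of [35]; MacClade also reconstructs where changes occur along branches, which this sketch does not attempt.

```python
from typing import Dict, FrozenSet, Tuple, Union

# A rooted binary tree: either a leaf name or a (left, right) pair
Tree = Union[str, Tuple["Tree", "Tree"]]

def fitch(tree: Tree, states: Dict[str, int]) -> Tuple[FrozenSet[int], int]:
    """Fitch parsimony for one unordered character: (root state set, minimum changes)."""
    if isinstance(tree, str):
        return frozenset([states[tree]]), 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # no shared state: take the union and count one change at this node
    return left_set | right_set, left_cost + right_cost + 1

# Hypothetical topology and presence (1) / absence (0) of one ZF domain
TREE = (("horse", "przewalski"), (("donkey", "kiang"), "grevy"))
caballine_only = {"horse": 1, "przewalski": 1, "donkey": 0, "kiang": 0, "grevy": 0}
print(fitch(TREE, caballine_only)[1])
```

A domain restricted to the caballine clade, like domains A, D, I and K in the results, costs a single change on such a tree, whereas a scattered distribution requires more.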

dN/dS Analyses
Because concerted evolution among ZF domains can be extensive, pairwise estimates of the non-synonymous to synonymous rate ratio (dN/dS) for Prdm9 sequences from different species may be misleading [15]. Positive selection analyses were therefore performed only on the ZF domains, each comprising 28 amino acids, including two cysteine and two histidine residues. ZF domains were compared within and among equid species. Codons were aligned according to a ClustalW [37] protein alignment (default parameters). Phylogenetic trees for each alignment were constructed using maximum parsimony in PAUP* version 4.0b10 [38] and maximum likelihood in PhyML 3.0 [39]. Topologies were accepted if no major discordance was observed between methods and nodes were supported by more than 80% of 1000 bootstrap replicates [40]. The codeml program from PAML [41] was used to test for significant differences in likelihood between the nearly neutral model (M7) and the positive selection model with unconstrained ω (M8), and between M8 and the model with ω constrained to 1.0 (M8a). P-value thresholds were corrected for multiple tests (Bonferroni correction). Amino acid sites under positive selection and their posterior probabilities were inferred using the Bayes Empirical Bayes approach under the unconstrained model M8.
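The likelihood ratio tests described above compare twice the log-likelihood difference between nested codeml models with a chi-square distribution. The sketch below uses the closed-form chi-square tail probabilities for one and two degrees of freedom (two is the usual choice for M7 versus M8; one is commonly used for M8a versus M8) and a Bonferroni per-test threshold. The log-likelihood values in the example are invented, and the helper names are hypothetical.

```python
import math
from typing import Tuple

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability, using closed forms for df = 1 and df = 2."""
    if x < 0:
        raise ValueError("test statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise NotImplementedError("this sketch only handles df = 1 or 2")

def lrt(lnl_null: float, lnl_alt: float, df: int) -> Tuple[float, float]:
    """Likelihood ratio statistic 2 * (lnL_alt - lnL_null) and its chi-square P-value."""
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)  # negative values are treated as 0
    return stat, chi2_sf(stat, df)

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that holds the family-wise error rate at alpha."""
    return alpha / n_tests

# Invented log-likelihoods for a null (e.g. M7) and an alternative (e.g. M8) model
stat, p = lrt(-1000.0, -996.0, df=2)
print(f"2*dlnL = {stat:.2f}, P = {p:.4f}, per-test alpha for 8 tests = {bonferroni_threshold(0.05, 8)}")
```

With eight tests, for instance, the Bonferroni per-test threshold is 0.05 / 8 = 0.00625; the number of tests applies to whichever family of comparisons is being corrected.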

Equid Prdm9 Sequences and Positive Selection
We sequenced the final exon of Prdm9, which contains the ZF domains, in 14 equid individuals. These sequences ranged from 1074 bp to 1830 bp depending on the species. When blasted against the domestic horse genome, equid sequences matched an unannotated region on chromosome 3 (position 6378542-36379363) with 96% average identity. In particular, the hits obtained from the NCBI, UCSC and Ensembl browsers contained premature stop codons after the third ZF domain, in contrast to the domestic horse sequence we generated (Figure 1), which encoded a larger number of ZF domains. The UCSC and NCBI hits were identical to a partial Ensembl sequence annotated as Prdm7, a paralog of Prdm9 found in other vertebrate species. This result suggests that equids may have at least two Prdm genes that differ in sequence and number of ZF domains: a shorter copy annotated as Prdm7 and a second, not yet annotated copy corresponding to Prdm9.
In equid Prdm9 sequences, we identified 13 different ZF domains characterized by amino acid variation at positions −5, −2, −1, 3 and 6 (Figure 2). In all species, the Prdm9 final exon comprised a mixture of ZF domains, some of which appeared only on particular lineages. For instance, ZF domains A, D, I and K were identified only in caballine horses, the lineage consisting of the domestic horse and Przewalski's horse (Figure 3). Equids were highly variable in the number of ZF domains, ranging from five in the Tibetan kiang (E. kiang) to 14 in the Grevy's zebra (Figure 3). Intraspecific variation in the number of ZF domains was also observed in species such as the Przewalski's horse (8 to 11), onager (7 to 8) and mountain zebra (6 to 7). Heterogeneity between chromosome pairs was found in the African wild ass, domestic ass, onager and mountain zebra, whose homologs differed by one or two ZF domains, and in the Burchell's zebra, whose homologs differed by up to seven. Moreover, repeated ZF domains were found in all individuals examined, suggesting that Prdm9 may have undergone concerted evolution in equids; that is, some ZF domains have identical sequences, and variation within species is lower than variation between species. In particular, Grevy's zebra showed the highest number of identical repeated domains, with five copies of ZF domain H and six copies of ZF domain C on both chromosomes.
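Finding ZF domains in a translated exon and tallying identical repeats, as in the counts above, can be sketched with a simplified C2H2 pattern, C-x(2,4)-C-x(12)-H-x(3,5)-H. The pattern, the toy fingers FINGER_A and FINGER_B, and the linker string are assumptions for illustration, not equid sequences; real 28-residue Prdm9 units would also include the linker.

```python
import re
from collections import Counter
from typing import List

# Simplified C2H2 zinc-finger signature: C-x(2,4)-C-x(12)-H-x(3,5)-H
ZF_PATTERN = re.compile(r"C.{2,4}C.{12}H.{3,5}H")

def find_zf_domains(protein: str) -> List[str]:
    """Return the non-overlapping C2H2 matches in a protein, left to right."""
    return [m.group(0) for m in ZF_PATTERN.finditer(protein.upper())]

def repeat_summary(domains: List[str]) -> Counter:
    """Count how often each identical ZF sequence occurs in the array."""
    return Counter(domains)

# Hypothetical toy array: two identical fingers (A) and one different finger (B)
FINGER_A = "CGKCGKSFSQSSNLIRHQRTH"
FINGER_B = "CEECGKAFRWSSNLVRHQRIH"
LINKER = "TGEKP"
toy_exon = "MS" + FINGER_A + LINKER + FINGER_A + LINKER + FINGER_B + LINKER

domains = find_zf_domains(toy_exon)
print(len(domains), repeat_summary(domains).most_common(1))
```

The toy array yields three fingers, two of which are identical copies of FINGER_A, mirroring on a small scale the repeated H and C domains reported for the Grevy's zebra.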
Positive selection was evaluated in the ZF domains of each species separately and of all species combined. The −2ΔlnL values between models M8 and M8a ranged from 7.35 (P = 0.0258) in the Asiatic wild ass to 15.05 (P = 0.0005) in the mountain zebra, indicating a signature of positive selection in every species analyzed independently (P < 0.05; Table S2). The Bayes Empirical Bayes dN/dS approach revealed positively selected sites at amino acid positions −5, −2, −1, 3 and 6 (Table 1). The analysis of all species combined also detected positive selection in ZF domains after correction for multiple comparisons (P < 0.0063). A smaller number of amino acid positions, sites −1 and 6, showed a signature of positive selection, with dN/dS values at least eight times the neutral rate (Figure 4).

Prdm9 Sequence Variation in Equine Hybrids
We also sequenced the last Prdm9 exon of seven equine hybrids. The ZF domains identified in hybrids were similar to those of the equid species, although some, including ZF domains E and J, were not detected. Four domains were observed only in the Przewalski's horse × domestic horse hybrid and the fertile mule (Figure 2). All hybrids showed a chimeric composition and a variable number of ZF domains between chromosome pairs. Prdm9 differed between hybrid chromosomes by a single domain in the Asiatic wild ass (kulan) × Tibetan kiang (5 and 6 ZF) and Asiatic wild ass (kulan) × African wild ass (6 and 7) hybrids, and by three domains (8 and 11) in the Grevy's zebra × domestic horse and Przewalski's horse × domestic horse hybrids (Figure 5A).
The reproductive status of the equine hybrids examined is unknown, except for one fertile female mule whose pedigree was available. We examined a quartet composed of the fertile mule (dam, 2n = 63), a donkey (sire, 2n = 62) and two F1 offspring (males, 2n = 63), and used this pedigree to assess the role of Prdm9 allelic differences in equid hybrid sterility. Both parents carried the same ZF domain counts, with one chromosome containing 10 domains and the other eight (Figure 5B). Sequences of the two chromosomes were similar in the donkey but chimeric in the dam mule, showing characteristics of both domestic ass and horse. The parental chromosomes containing 8 ZF domains segregated to the F1 mules, so that each F1 carried two chromosomes with an equal number of ZF domains. The F1 mules exhibited a chimeric ZF domain composition, as observed in the dam mule, and sequence variation at amino acid sites −2, −1, 3 and 6. These results suggest that neither the odd chromosome number (2n = 63) nor allelic differences in the number of Prdm9 ZF domains produce sterility in the female mule.
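The segregation described above can be made explicit with a Punnett-square calculation, labelling each allele only by its number of ZF domains (10 or 8). This is a simplification assumed for illustration: the dam's and sire's 8-ZF chromosomes differ in sequence, and the sketch assumes Mendelian segregation occurring independently in each offspring. The function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from typing import Tuple

def offspring_genotypes(dam: Tuple[int, int], sire: Tuple[int, int]) -> Counter:
    """Probability of each unordered offspring genotype under Mendelian segregation."""
    probs: Counter = Counter()
    weight = Fraction(1, len(dam) * len(sire))
    for maternal, paternal in product(dam, sire):
        probs[tuple(sorted((maternal, paternal)))] += weight
    return probs

# Alleles labelled only by ZF-domain count: each parent carries one 10-ZF and one 8-ZF chromosome
probs = offspring_genotypes((10, 8), (10, 8))
both_f1_8_8 = probs[(8, 8)] ** 2  # two offspring, assumed independent
print(dict(probs), both_f1_8_8)
```

Under these assumptions each F1 offspring had a 1/4 chance of inheriting two 8-ZF chromosomes, and the chance that both did is 1/16.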

Genetic Variation and Positive Selection
In this study, we generated partial sequences, validated as Prdm9 orthologs, from 14 individuals representing all equid species and from seven equine hybrids. Equid sequences diverged from the best hits in the horse genome in a way that suggests some copies of Prdm genes have not yet been annotated in the horse genome. Duplication of Prdm genes is not unusual and has been described in other vertebrates, with at least 17 Prdm family members known in humans [42]. In primates, Prdm7 is a known paralog of Prdm9 that has undergone major structural rearrangements, reducing the number of encoded zinc fingers and modifying gene splicing [43]. Missing Prdm gene annotations in the horse genome may result from limitations of current genome assembly methods in resolving duplicated genomic regions or satellite DNA [44].
Prdm9 ZF domains in equids varied dramatically in number and amino acid composition. This high variation extends findings in other organisms demonstrating the rapid evolution of these domains [14,15]. Prdm9 ZF domains have been suggested to evolve rapidly because of the instability inherent in the minisatellite structure of the ZF array [16]. Additionally, Oliver et al. [15] have speculated that Prdm9 may function by binding to repetitive DNA sequences found at pericentric and centromeric regions, which are composed of rapidly evolving repetitive motifs [45]. In vitro, PRDM9 has been shown to bind DNA at recombination hotspots via its ZF domains, and genetic manipulation of the ZF domains changes the localization of hotspots [11,18].
Multiple studies have suggested that Prdm9 ZF domains have undergone strong positive selection, although this pattern may not extend to all metazoans [14,15]. In rodents and primates, positive selection at specific DNA-binding positions (−1, 3 and 6) has led to divergent evolution [15]. In equids, positive selection was restricted to amino acid positions −1 and 6, which correspond to sites responsible for determining the DNA-binding specificity of Prdm9. Structural studies have shown that the amino acids at positions −1, 3 and 6 of the ZF α-helix interact with bases 3, 2 and 1, respectively, of the primary DNA strand [46]. Selection acting on residues −1 and 6 of the equid ZF domains may therefore be altering the DNA-binding preferences encoded by Prdm9 among these species.
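Reading off the DNA-contact residues discussed above requires a numbering convention. The sketch below assumes the standard C2H2 helix numbering, in which the first zinc-coordinating histidine is position +7 and there is no position 0, so position −1 lies immediately before position 1. The test finger is illustrative, modelled on the first finger of Zif268 (recognition helix RSDELTR), not an equid Prdm9 domain; the function name is hypothetical.

```python
import re

# Simplified C2H2 pattern; group 1 spans the 12 residues before the first conserved His
ZF_PATTERN = re.compile(r"C.{2,4}C(.{12})H.{3,5}H")

def helix_residues(finger: str, positions=(-1, 3, 6)) -> dict:
    """Return {helix position: residue}, numbering the first conserved His as +7 (no position 0)."""
    seq = finger.upper()
    match = ZF_PATTERN.search(seq)
    if match is None:
        raise ValueError("no C2H2 zinc finger found")
    his = match.end(1)  # index of the first zinc-coordinating histidine (+7)
    residues = {}
    for p in positions:
        if p == 0:
            raise ValueError("helix numbering skips position 0")
        # negative positions sit one residue closer than p - 7 would give,
        # because the scale jumps from -1 straight to +1
        offset = p - 7 if p > 0 else p - 6
        residues[p] = seq[his + offset]
    return residues

# Illustrative finger modelled on Zif268 finger 1 (helix RSDELTR); not an equid sequence
EXAMPLE_FINGER = "PYACPVESCDRRFSRSDELTRHIRIHTGQK"
print(helix_residues(EXAMPLE_FINGER))
```

Applied to each equid ZF domain, a helper like this would tabulate the residues at positions −1 and 6 whose variation the selection analysis highlights.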

Hybrid Sterility in Equids
Allelic incompatibilities in the Prdm9 gene, as described in the house mouse [9,47], represent a likely genetic mechanism for hybrid sterility in equids. However, in the fertile female mule, variation in the number of ZF domains does not appear to produce allelic incompatibilities within Prdm9 and, consequently, sterility. This result is not unexpected, considering that the fertility of hybrid house mouse females is never compromised by allelic differences in Prdm9, because sterility is male-biased (affecting the heterogametic sex), consistent with Haldane's rule [9].
No clear conclusion on the role of Prdm9 in equine hybrid sterility can therefore be drawn from our results, other than that variation in Prdm9 ZF domains does not seem by itself to contribute to sterility. This interpretation is supported by the only equid male in this study known to be fertile, a domestic ass, which also shows variation in the number of ZF domains. The Prdm9 gene may still act as an important speciation gene in equids through genetic incompatibilities with other loci (epistatic effects) [47]. In particular, Prdm9 is known to interact with the X chromosome (DXSr62 region) in the house mouse, and both loci are known to be necessary to produce F1 sterility [48]. Minor QTLs on chromosomes 13 and 14 also play a role in hybrid sterility and are sufficient to activate genetic incompatibilities in male mouse hybrids [48]. Hybrid sterility in the house mouse thus appears to be oligogenic, underscoring the importance of considering Prdm9 genetic interactions.
Undoubtedly, Prdm9 is not the sole gene associated with hybrid sterility in vertebrates, as this gene is absent in many species such as Xenopus, Anolis and Gallus [12] or has become a pseudogene, as confirmed in dogs [16]. Nonetheless, our findings of rapid evolution of Prdm9 ZF domains, together with the known function of this gene in specifying meiotic recombination hotspots, support an important role for it in gametogenesis in equids. Further studies including additional equid hybrids and investigating genetic interactions of Prdm9 with other loci will help clarify the role of this gene in establishing reproductive isolation barriers in this threatened group of mammals.

Figure 3. Lineage-specific ZF domains are mapped on the phylogeny of equids according to [33]. Domestic horse individuals showed the same Prdm9 sequence for the last exon, thus only one sequence is depicted. The onager Prdm9 sequence obtained from cDNA is designated by an asterisk. doi:10.1371/journal.pone.0061746.g003

Supporting Information