rahu is a mutant allele of Dnmt3c, encoding a DNA methyltransferase homolog required for meiosis and transposon repression in the mouse male germline

Transcriptional silencing by heritable cytosine-5 methylation is an ancient strategy to repress transposable elements. It was previously thought that mammals possess four DNA methyltransferase paralogs—Dnmt1, Dnmt3a, Dnmt3b and Dnmt3l—that establish and maintain cytosine-5 methylation. Here we identify a fifth paralog, Dnmt3c, that is essential for retrotransposon methylation and repression in the mouse male germline. From a phenotype-based forward genetics screen, we isolated a mutant mouse called ‘rahu’, which displays severe defects in double-strand-break repair and homologous chromosome synapsis during male meiosis, resulting in sterility. rahu is an allele of a transcription unit (Gm14490, renamed Dnmt3c) that was previously mis-annotated as a Dnmt3-family pseudogene. Dnmt3c encodes a cytosine methyltransferase homolog, and Dnmt3crahu mutants harbor a non-synonymous mutation of a conserved residue within one of its cytosine methyltransferase motifs, similar to a mutation in human DNMT3B observed in patients with immunodeficiency, centromeric instability and facial anomalies syndrome. The rahu mutation lies at a potential dimerization interface and near the potential DNA binding interface, suggesting that it compromises protein-protein and/or protein-DNA interactions required for normal DNMT3C function. Dnmt3crahu mutant males fail to establish normal methylation within LINE and LTR retrotransposon sequences in the germline and accumulate higher levels of transposon-derived transcripts and proteins, particularly from distinct L1 and ERVK retrotransposon families. Phylogenetic analysis indicates that Dnmt3c arose during rodent evolution by tandem duplication of Dnmt3b, after the divergence of the Dipodoidea and Muroidea superfamilies. These findings provide insight into the evolutionary dynamics and functional specialization of the transposon suppression machinery critical for mammalian sexual reproduction and epigenetic regulation.

Introduction Transposable elements have been described as 'dark energy' that acts both as a creative force, by giving rise to new genes and regulatory elements, and as a threat, by disrupting genome architecture [1]. Transposons occupy roughly 46% and 38% of the human and mouse genomes, respectively, and retrotransposons are the predominant class, including some that continue to be retrotransposition-competent [2][3][4]. Genomes have co-evolved with their retrotransposons and thus have multiple defense mechanisms to restrain transposon activity [4]. This restraint is of utmost importance in the germline, where retrotransposon activity not only facilitates vertical transmission, but also threatens genome integrity and germ cell viability.
The primary means of retrotransposon suppression is transcriptional silencing via cytosine-5 methylation [1,5]. In the mouse male germline, after a developmentally programmed methyl-cytosine erasure, retrotransposon methylation is re-established in prospermatogonia prior to birth [6][7][8][9]. Shortly after birth, spermatogenesis initiates and spermatogonia enter a meiotic cell cycle, which encompasses an extended prophase. During meiotic prophase, the spermatocyte genome experiences developmentally programmed DNA double-strand breaks (DSBs) that are subsequently repaired by homologous recombination. Contemporaneously, homologous chromosomes align and build a proteinaceous structure called the synaptonemal complex (SC), which holds recombining chromosomes together (synapsis). Meiosis is completed by two successive divisions, forming haploid spermatids that enter the final developmental phase of spermiogenesis, culminating with the production of sperm [10]. Failure to methylate retrotransposons leads to abnormal retrotransposon expression, spermatogenic arrest in meiotic prophase, and sterility [11,12].
Here, we identify a new, fourth member of the Dnmt3 family in mice. Using a forward genetics approach, we isolated a male-sterile mutant that we named rahu, for "recombinationaffected with hypogonadism from under-populated testes" (Rahu is a harbinger of misfortune and darkness in Vedic mythology). rahu mapped to a missense mutation in the predicted Dnmt3 pseudogene Gm14490. Re-named Dnmt3c, this gene is required for the methylation and repression of retrotransposons in the male germline. Dnmt3c was also independently discovered by Bourc'his and colleagues [30]; importantly, our results validate and expand their findings on the developmental role of Dnmt3c. We propose that Dnmt3c encodes a de novo DNA methyltransferase that diversified from DNMT3B during evolution of the Muroidea and that became functionally specialized for germline defense against retrotransposon activity.

Isolation of the rahu meiotic mutants
We performed a random mutagenesis screen to recover mutants with autosomal recessive defects in meiosis. Male mice of the C57BL/6J (B6) strain were mutagenized with the alkylating agent Nethyl-N-nitrosourea (ENU) [31][32][33], then a three-generation breeding scheme was carried out to "homozygose" recessive mutations, including outcrossing to FVB/NJ (FVB) mice to facilitate downstream genetic mapping (Fig 1A) [32,34]. First, ENU-mutagenized B6 males were crossed to wild-type FVB females to generate founder males (F1) that are potential carriers of a mutation of interest. Then, each F1 male was crossed to wild-type FVB females to generate second-generation females (G2); if the F1 male was heterozygous for a mutation of interest, half of his daughters should be carriers. Finally, G2 daughters were crossed back to their F1 father to generate thirdgeneration males (G3), of which one eighth should be homozygous for the mutation.
We screened juvenile (15)(16)(17)(18)(19) days post-partum (dpp)) G3 males for meiotic defects, focusing on recombination or its interdependent process, chromosome synapsis. To this end, we immunostained squashed spermatocyte nuclei for well-established meiotic markers whose deviations from wild-type patterns are diagnostic of specific defects: SYCP3, a component of axial elements and of lateral elements of the SC [10,35,36], and phosphorylated histone H2AX (γH2AX), which forms in response to meiotic DSBs [37]. As axial elements begin to form during the leptotene stage of meiotic prophase in wild type, SYCP3 staining appears as dots or short lines; axial elements elongate during zygonema and the first stretches of tripartite SC appear; SC juxtaposes autosomes all along their lengths in pachynema; and the SC begins to disassemble in diplonema (Fig 1B). Most DSBs are formed in leptonema and zygonema, resulting in florid staining for γH2AX at these stages, but this signal disappears from autosomes over the course of zygonema and pachynema as chromosomes synapse and DSBs are repaired by recombination (Fig 1B). γH2AX also accumulates in a recombination-independent manner in the sex body, a heterochromatic domain encompassing the sex chromosomes [37][38][39][40] (clearly visible in pachynema and diplonema in Fig 1B). Mutants with defects in recombination and/or SC formation deviate from these wild-type patterns in diagnostic ways [10,36]. To streamline the screening process, we first used the SYCP3 staining pattern to classify meiotic prophase spermatocytes as either early prophase-like (leptonema or early zygonema) or later prophase-like (late zygonema, pachynema or diplonema). The γH2AX staining pattern was then evaluated to determine whether DSBs were formed and repaired properly and whether sex body formation appeared normal.
From F1 founder lines screened in this manner, we isolated a mutant line with SYCP3 and γH2AX patterns symptomatic of defects in meiotic DSB repair and/or synapsis (Fig 1B and  1C). Four of 28 G3 males from this line displayed a mutant phenotype in which spermatocytes with SYCP3 staining characteristic of early prophase I were abundant but later stages were absent or greatly depleted, indicating a block to meiotic progression (Fig 1B and 1C and S1 Table). Early prophase-like spermatocytes showed nucleus-wide γH2AX staining similar to wild type, consistent with formation of meiotic DSBs (Fig 1B). However, the mutants also had elevated numbers of abnormal spermatocytes displaying nucleus-wide γH2AX along with longer tracks of SYCP3 staining that were consistent with varying degrees of synapsis (Fig 1B and  1C). This pattern is a hallmark of recombination-and synapsis-defective mutants such as Msh5 -/and Sycp1 -/- [36,37,[41][42][43].
rahu mutant males fail to complete meiosis, leading to male-specific infertility Adult male rahu mutants were sterile, with none of the four animals we tested siring progeny. The mutant males displayed pronounced hypogonadism, with a 67% reduction in testes-tobody-weight ratio compared to littermates (mean ratios were 0.20% for rahu mutants and 0.62% for wild-type and heterozygous animals; p<0.01, one-sided Student's t-test; Fig 2A). In histological analysis of testis sections, seminiferous tubules from adult rahu mutants contained only early spermatogenic cells and completely lacked post-meiotic germ cells (Fig 2B). The most developmentally advanced cell types visible appeared apoptotic (Fig 2B), and increased apoptosis was confirmed by TUNEL staining (Fig 2C). The patterns suggest that spermatocyte arrest and apoptosis occur during mid-pachynema, possibly at stage IV of the seminiferous epithelial cycle, which is typical of mutants unable to properly synapse their chromosomes, repair meiotic DSBs, and/or transcriptionally silence their sex chromosomes [43][44][45][46]. Stage IV arrest is also observed in Dnmt3l mutants, which lack germline DNA methylation [17,28].
In crosses with rahu heterozygous males, rahu homozygous females were fertile with an average litter size of 8.3 (19 litters from 10 dams), similar to rahu heterozygous females with an average litter size of 8.1 (29 litters from 10 dams). The rahu allele segregated in an expected Mendelian ratio: 26% wild type, 48% heterozygotes, and 26% mutant homozygotes from heterozygous dams (n = 119); 49% heterozygotes and 51% mutant homozygotes from homozygous mutant dams (n = 101). Homozygous mutants survived to adulthood and no morphological abnormalities have been apparent. We conclude that rahu does not severely impair female meiosis, embryonic development, or adult somatic functions.
rahu is an allele of the novel gene Dnmt3c We mapped the likely rahu causative mutation using a genetic polymorphism-based positional cloning strategy [34,47]. Because the mutagenized B6 mice were outcrossed to FVB mice ( Fig  1A), ENU-induced mutations should be linked to DNA sequence polymorphisms from the B6 background, so G3 males homozygous for recessive phenotype-causing mutations should also be homozygous for linked B6 polymorphisms. Furthermore, mutant mice from the same line should be homozygous for at least some of the same linked B6 polymorphisms, whereas related but phenotypically normal mice should not. We therefore searched for genomic regions of B6 single-nucleotide polymorphism (SNP) homozygosity that are shared between mutant rahu mice and not shared with phenotypically normal mice from the rahu line.
We first coarsely mapped such regions by hybridization of genomic DNA from seven rahu mutants to a mouse SNP genotyping microarray. This yielded a single cluster of SNPs spanning 33.58 Mbp flanked by heterozygous SNPs gnf02.126.027 (Chr2:127,800,747) and rs3664408 (Chr2:161,380,222) (Fig 3A). Next, whole-exome sequencing of mutants identified seven un-annotated homozygous DNA sequence variants within the 33.58-Mbp mapped region (Fig 3A). To determine which was the likely causal mutation, we manually genotyped sequence variants within the 33.58-Mbp region in both mutant and phenotypically normal mice, targeting strain polymorphisms as well as the presumptive ENU-induced lesions themselves (Fig 3A). Presumptive ENU-induced lesions that were homozygous in phenotypically normal mice or that were heterozygous in meiosis-deficient mutants could be excluded as candidates. We applied this strategy to~100 additional G3 and G4 mice, which allowed us to winnow the phenotype-causing mutation to within a 17.43-Mbp region flanked by the sequence change in the Rrbp1 gene (Chr2:143,947,738) and SNP rs3664408 (Chr2:161,380,222) (Fig 3A). This smaller region contained only one novel sequence variant, within a gene model named Gm14490 (NCBI Gene ID: 668932 and Ensembl Gene ID: ENSMUSG00000082079).
The rahu lesion in Gm14490 is an A to G nucleotide transition at position Chr2:153,727,342 (Fig 3B). Surprisingly, however, Gm14490 was annotated as a pseudogene. It was first identified as a paralog of Dnmt3b, but the absence of expressed sequence tags and the presence of stop codons in the available gene build led previous researchers to conclude that Gm14490 was a pseudogene [48]. However, more recent gene builds predicted a gene structure, encompassing introns with an open reading frame, that is matched by testis transcripts (Fig 3B-3D). Mouse ENCODE RNA-sequencing data from adult testis revealed splice junctions within 5 bp of the boundaries for all exons except exon 5, suggesting that the predominant Gm14490 transcript isoform(s) in adult testis does not contain exon 5 (Fig 3B and 3C). In the adult mouse, Gm14490 is expressed in testis, with little or no expression in most somatic tissues other than brain (Fig 3D and 3E). Gm14490 is predicted to yield a 2,218-nucleotide transcript (Ensembl Transcript ID: ENSMUST00000119996) containing 19 exons. The rahu point mutation is located in exon 18 (Fig 3B). The available expression data and the manifestation of a phenotype led us to surmise that Gm14490 is not a pseudogene.
We conclude that rahu is allelic to the Gm14490 frameshift mutations and that rahu is likely to be a null or near-null for function of the gene. We further conclude that Gm14490 is not a pseudogene and that it is essential during meiotic prophase in spermatocytes. On the basis of sequence homology to Dnmt3b and functional data shown below, we refer to Gm14490 henceforth as Dnmt3c, also in keeping with a recent independent study [30]. 58-Mbp region of B6 SNP homozygosity shared between mutants is highlighted in yellow. A detailed view of variants within this region is shown on the right for two informative rahu mutants (c, f) and two informative phenotypically wild-type mice (i, j). The reference SNP ID numbers (rs ID) for known variants and the gene names of previously unannotated novel variants (i.e., presumptive ENU-induced lesions; asterisks) are listed. Phenotypes were assayed as shown in Fig  Dnmt3c encodes a likely DNA cytosine methyltransferase that is closely related to DNMT3B Dnmt3c is predicted to encode a 739-aa protein with 77% overall similarity to the DNA cytosine methyltransferase DNMT3B (EMBOSS Water tool, Smith-Waterman algorithm [49][50][51]). It contains a cysteine-rich ATRX-DNMT3-DNMT3L (ADD) domain with 98.3% sequence similarity to that of DNMT3B, and a 96.8% similar DNA cytosine methyltransferase domain (Fig 4A and S2 Table). DNMT3C contains matches to both the sequence and arrangement of the six highly conserved cytosine methyltransferase domain motifs (I, IV, VI, VIII, IX and X) that are characteristic of active DNA methyltransferases (Fig 4A and S2 Table) [13,52]. Among the mammalian DNMT3 family members, DNMT3C shares most similarity with DNMT3B (Fig 4B), although it lacks a clear match to the Pro-Trp-Trp-Pro (PWWP) domain in DNMT3B (Fig 4A) [53]. The rahu mutation causes a glutamic acid to glycine substitution at position 692 within motif IX (Fig 4A and 4C). These data suggest that Dnmt3c encodes a novel DNA methyltransferase and that the Dnmt3c rahu mutant phenotype is a consequence of perturbing its methylation function in the male germline.
In a crystal structure of the carboxy-terminal domains of DNMT3A and DNMT3L, these peptides form a tetrameric complex with two DNMT3A and DNMT3L dimers, which further dimerize through DNMT3A-DNMT3A interaction [19]. Mutating residues at the DNMT3A-DNMT3A interface abolishes activity [54]. Homology-based modeling of DNMT3C places E692 near the potential dimerization interface (Fig 4D and 4E), as well as near the inferred DNA recognition region [19]. Thus, the E692G mutation in DNMT3C may interfere with protein-protein interactions and/or protein-DNA interactions that are required for normal DNMT3C activity.
Dnmt3c is required for DNA methylation and transposon repression in the mouse male germline A fundamental role of DNA methylation in the germline is to silence retrotransposons [27], so we examined the expression of long interspersed nuclear element-1 (L1) and intracisternal A particle (IAP) retrotransposon families, which are known to be active in the germline [5,[55][56][57]. Because the mutants undergo spermatogenic arrest, we examined animals at 14 dpp. At this age, testes predominantly contain spermatogonial stem cells and early meiotic cells, with the most advanced stage being pachynema [58]. Dnmt3c rahu mutants displayed increased expression of both L1 (~9-fold) and IAP (~3-fold) transcripts as assessed by quantitative RT-PCR (Fig 5A) (mean fold change of 8.5 using L1 ORFs primers and 8.7 using L1 ORF2 primers; mean fold change of 3.6 using IAP Gag primers and 2.1 using IAP 3 0 LTR primers). We confirmed these findings by immunostaining testis sections for proteins encoded by L1 and IAP retrotransposons. In seminiferous tubules of heterozygotes, L1-encoded ORF1p was detectable at low levels and IAP-encoded Gag was barely detectable. However, both proteins accumulated to substantially higher levels in Dnmt3c rahu mutant testes (Fig 5B). Tubules of mutant juveniles contained L1 ORF1p in spermatocytes and IAP Gag in spermatogonia. Tubules from mutant adults contained prominent immunofluorescence signal for both proteins in spermatocytes. Thus, disruption of Dnmt3c derepresses expression of L1 and IAP retrotransposons in the male germline.
Next we assayed transposon DNA methylation directly by methylation-sensitive digestion and Southern blotting. Genomic DNA from juvenile Dnmt3c rahu mutants and wild-type littermates was digested with HpaII, for which DNA cleavage is blocked by CpG methylation within its recognition site, or with its methylation-insensitive isoschizomer MspI as a control. Digested DNA was separated on agarose gels and hybridized on Southern blots to a probe derived from the 5 0 UTR of L1_MdA (A family of L1) sequences. DNA purified from somatic tissue (tail) was highly resistant to cleavage by HpaII, whether purified from homozyogous mutants or a wild-type littermate (Fig 5C). This indicates that Dnmt3c is dispensable for DNA methylation in the soma, at least for L1Md_A elements. Testis DNA was also almost completely resistant to cleavage with HpaII when purified from wild type, but substantially less so when purified from Dnmt3c rahu homozygous mutants (Fig 5C). We conclude that Dnmt3c is needed to establish normal levels of DNA methylation within L1Md_A elements, specifically in the germline. Taken together, these findings support a germline-specific function for Dnmt3c in retrotransposon DNA methylation and transcriptional repression.

Dnmt3c represses transcription of distinct L1 and ERVK retrotransposon families
To more thoroughly assess the contribution of Dnmt3c in repressing retrotransposons, we performed RNA-seq on whole-testis samples from the same 14-dpp-old Dnmt3c rahu mutant and heterozygous littermates analyzed by RT-PCR. The expression levels of distinct LINE and LTR families were up-regulated in Dnmt3c rahu mutants, but SINE elements appeared to be changed little, if at all (Fig 6A). Because of variable expression between littermates of the same genotype, we analyzed the fold change of median expression values between mutants and heterozygotes (Fig 6B). Retrotransposons belonging to the L1 and ERVK superfamilies showed the strongest derepression. Specifically, L1 families L1Md_Gf, L1Md_T, L1_MdA and L1_Mm were up-regulated 12.1-, 9.6-, 6.4-and 2-fold, respectively. ERVK families IAPEZ-int, IAPLTR1_ Mm, MMERVK10C-int, RLTR10C and IAPA_MM-int were up-regulated 10.1-to 4.5-fold. Two ERVK elements, IAPLTR3 and IAPLTR3-int were down-regulated in Dnmt3c rahu mutants (by 2.8-and 4.1-fold respectively).
The most affected L1 families in Dnmt3c rahu mutants are the same as those derepressed in Dnmt3l mutants, namely, young L1 families of the A, T and Gf subtypes [29,59]. Comparison of the expression fold change in Dnmt3l -/knockouts with Dnmt3c rahu mutants showed a striking overlap in effects for both the L1 and the LTR families (Fig 6C). This overlap is not an indirect effect of down-regulated Dnmt3l expression in Dnmt3c rahu mutants, as RNA-seq coverage for neither Dnmt3l nor Dnmt3c was significantly changed in Dnmt3c rahu mutants (Dnmt3l fold change 1.2, Dnmt3c fold change 1.0).

Dnmt3c is required for methylation of L1, ERVK, and ERV1 retrotransposon families
To determine the global contribution of Dnmt3c to retrotransposon methylation, we performed whole-genome bisulfite sequencing (WGBS) on whole-testis samples from six 12-dppold Dnmt3c rahu mutant and wild-type littermates. CpGs within genes, CpG islands and SINE elements were mostly non-differentially methylated in Dnmt3c rahu mutants (Fig 7A). An increase in the proportion of differentially methylated CpGs, specifically hypomethylated CpGs, was observed within LINE and LTR elements (16.04% and 6.62% hypomethylated CpGs, respectively) (Fig 7A). Meta-plots averaging over specific element types showed that LINE elements were on average hypomethylated near their 5 0 ends, while LTR elements were hypomethylated on average across their entire bodies (Fig 7B).
Dnmt3c arose by tandem duplication of Dnmt3b after divergence of the Dipodoidea and Muroidea rodent families Dnmt3c is located directly adjacent to Dnmt3b in the mouse genome, and the two genes share 50.4% DNA sequence identity (EMBOSS Water tool, Smith-Waterman algorithm [49][50][51]). Dot-plot analysis of the genomic region encompassing both genes showed that the sequence identity shared between Dnmt3c and Dnmt3b extends over long stretches (up to 436 bp of 100% sequence identity and 2,805 bp of >95% sequence identity; appearing as horizontal lines in Fig 8A). Sequence similarity begins~3,300 bp upstream of the annotated start of Dnmt3c, which matches the intronic region between Dnmt3b exons two and three. Similarity extends 100 bp beyond the last annotated exon of Dnmt3c and matches the 3 0 UTR of Dnmt3b, suggesting that these regions encode functional elements that have constrained sequence divergence (Fig 8B). Dnmt3c and Dnmt3b are remarkably similar with respect to their exon organization (Fig 8B), with stretches of sequence identity encompassing all Dnmt3c exons, except exons two and five. The sequence similarity in Dnmt3c is most extensive near its 3 0 end (exons eight to nineteen), which encodes the ADD domain and methyltransferase motifs. As expected from the absence of the PWWP domain in DNMT3C, two of the three exons encoding this domain in Dnmt3b do not have obvious matches in the corresponding part of the genomic sequence of Dnmt3c (Fig 8B). Although Dnmt3c and Dnmt3b share extensive nucleotide identity, they are functionally specialized: whereas Dnmt3b establishes methylation both in somatic cells during embryogenesis and in the germline [7,15,26], Dnmt3c appears to function solely in the germline. Also, mouse and human DNMT3B protein sequences cluster separately from mouse DNMT3C (Fig 4B), suggesting that Dnmt3c and Dnmt3b have been evolving independently within the mouse lineage. These properties are consistent with a local gene duplication event followed by functional diversification.
Dot-plot comparisons of the mouse genomic region containing Dnmt3c and Dnmt3b with homologous regions in rat and human showed that the duplication is also present in rat (appearing as a continuous central diagonal plus two off-center partial diagonals indicated by the red arrows in Fig 8C) but is absent in human (appearing as two offset diagonals marked by red arrowheads in Fig 8C). To determine the ancestry of this duplication event, we analyzed species from related rodent families (Fig 8C and 8D) [60] (UCSC Genome Browser). Dot-plot comparisons showed clear evidence of the Dnmt3c and Dnmt3b duplicate pair in Upper Galilee Mountains blind mole rat, Chinese hamster, and prairie vole, but not in lesser Egyptian jerboa or more distantly related rodents (Fig 8C and 8D). These findings indicate that the tandem duplication of Dnmt3b occurred between 55 and 45 million years ago, after the divergence of (same mice analyzed in Fig 5A). Q-value is the Benjamini-Hochberg-adjusted p-value from DESeq2. Retrotransposons with expression fold change of >2 (up or down) and q < 0.01 are depicted as large, colored circles. (B) Heatmap showing the z-score of differentially expressed retrotransposon families (with expression fold change >2 and q < 0.01) in individual Dnmt3c rahu mutants (x, y, z) and their heterozygous littermates (x 0 , y 0 , z 0 ). Labels on rows indicate the retrotransposon family, followed by superfamily, followed by class, and then in parentheses the log 2 fold change of median expression in mutant versus heterozygote. Rows with greater than two-fold change in median expression (up or down) are in bold. The log 2 fold changes are also provided in the bar graph at left (greater than two-fold change shown as black bars). (C) Correlation between differentially expressed retrotransposon families in 14-dpp Dnmt3c rahu mutants and 20-dpp Dnmt3l mutants. The regression line is shown and r is the Pearson correlation coefficient.
https://doi.org/10.1371/journal.pgen.1006964.g006 rahu is a mutant allele of Dnmt3c  Fig 7. Retrotransposons belonging to the L1, ERVK, and ERV1 families are hypomethylated in Dnmt3c rahu mutants. (A) Proportions of differentially methylated CpGs within genomic elements, as determined by WGBS. WGBS was performed on whole-testis DNA samples from six 12-dpp animals (two Dnmt3c rahu mutant and two wild-type mice from one litter, and one Dnmt3c rahu mutant and one wild-type mice from a second litter). CpGs with >25% differential methylation (up or down) and with methylKit Sliding Linear Model-adjusted pvalue < 0.01 were considered differentially methylated. "Remaining intergenic" refers to genomic regions that do not overlap with LINE, LTR, SINE, genic, or CpG island annotations. (B) Meta-element plot showing the average difference in methylation levels between individual Dnmt3c rahu mutants and their wild-type littermates (mice u, v, u 0 , v 0 from one litter; w, w 0 from a second litter), across LINE, LTR, and SINE retrotransposons (including 5,000 bp of flanking sequence on each side) with minimum 95% coverage of the consensus sequence. (C) Heatmap showing the mean methylation levels of significantly changed retrotransposon families in individual Dnmt3c rahu mutants and their wild-type littermates (p < 0.01, two-sided Student's t-test). Labels on rows indicate the retrotransposon family and differentially expressed retrotransposon families (families with greater than two-fold increase in median expression in Dnmt3c rahu mutants; rows in bold in Fig 6B) are in bold. Three differentially expressed retrotransposon families with changed methylation levels using a less stringent cutoff (p < 0.05, twosided Student's t-test) are included and are indicated with an asterisk. The median methylation difference is provided in the bar graph at left. Each black dot on the plot represents 100% identity within a 20-bp window, and blue-tinted rectangles highlight these regions. Each orange dot represents 100% identity within a 13-bp window, and orange-tinted rectangles highlight such regions when they are exonic or lie along the diagonal axis. Gene models annotated with exons encoding conserved domains are shown schematically along the axes. (C) Dot-plot comparisons of the M. musculus 156,377-bp region shown in (A) with its homologous region in other rodents (Rattus norvegicus, Norway rat; Cricetulus griseus, Chinese hamster; Microtus ochrogaster, prairie vole; Nannospalax galili, Upper Galilee Mountains blind mole rat; Jaculus jaculus, lesser Egyptian jerboa; Dipodomys ordii, Ord's kangaroo rat; Castor canadensis, American beaver; Ictidomys tridecemlineatus, thirteen-lined ground squirrel; Cavia porcellus, domestic guinea the Dipodoidea and Muroidea rodent superfamilies, but before the divergence of the Cricetidae and Muridae families [61].

Discussion
This study illustrates the utility of forward genetic screens in identifying novel meiotic genes in mouse and reports the identification of Dnmt3c, a new gene essential for mouse spermatogenesis and genome regulation. We find that Dnmt3c functions in the methylation and subsequent repression of retrotransposons in the male germline, and that it arose by tandem duplication of the Dnmt3b gene. While this work was in progress, analogous findings were reported independently [30], and results from both studies agree well. Uniquely, we have isolated a novel point-mutated allele of Dnmt3c (rahu) that harbors a non-synonymous mutation of a conserved residue, and performed expression and epigenetic profiling of this mutant.
The presence of the six highly conserved cytosine methyltransferase motifs in DNMT3C and the methylation defect observed in Dnmt3c rahu mutants, which harbor a mutation within one motif (E692G), suggest that DNMT3C is enzymatically active. Indeed, DNMT3C methylates cytosines in vitro, and expression of Dnmt3c in Dnmt1 Dnmt3a Dnmt3b triple knockout embryonic stem cells leads to a gain in DNA methylation [30]. We found that homology-based modeling of the DNMT3C methyltransferase domain using the DNMT3A crystal structure places the E692 amino acid at a potential dimerization interface, as well as near the potential DNA binding interface [19]. Thus, the rahu mutation may compromise protein-protein interactions and/or protein-DNA interactions of DNMT3C. Intriguingly, a mutation affecting the corresponding region in DNMT3B is observed in patients suffering from immunodeficiency, centromeric instability and facial anomalies (ICF) syndrome, an autosomal recessive disease caused by mutations in DNMT3B [15,62]. The patient-derived mutation results in a threeamino-acid (serine-threonine-proline) insertion immediately downstream of the corresponding glutamic acid that is mutated in Dnmt3c rahu mutants (DNMT3B Uniprot VAR_011502). Expression of recombinant mutant DNMT3B harboring this insertion in cell lines indicated that the mutation does not severely compromise protein stability. Rather, nuclear localization patterns were abnormal, suggesting that these residues may function in targeting DNMT3B to specific genomic regions [63]. We speculate that the E692G substitution in Dnmt3c rahu mutants may similarly interfere with DNMT3C localization to target sequences.
In mouse, Dnmt3c and Dnmt3l mutants have similar meiotic phenotypes, including male infertility due to spermatocyte arrest apparently occurring at epithelial stage IV ( [17,18,27,28,30] and this study). Both genes are required for methylating and silencing retrotransposons in the male germline, and, consistent with previous findings, comparison of RNA-seq data from Dnmt3c rahu mutants with published Dnmt3l -/data suggests considerable overlap in the transposon families that they target [27,29,30]. Dnmt3l is also required for methylation at imprinted loci [7,17,18,26], but we did not observe embryonic defects characteristic of loss of methylation at maternally imprinted loci: Dnmt3c mutant females produced healthy litters of expected size, and their offspring survived to adulthood without discernible abnormalities, consistent with the findings of Barau et al. Among paternally imprinted loci, Barau et al. observed hypomethylation in Dnmt3c mutants only at the Rasgfr1 imprinting control region. pig), a lagomorph (Oryctolagus cuniculus, rabbit), and human (Homo sapiens). Each dot on the plot represents 100% identity within a 15-bp window. Yellow-tinted rectangles highlight M. musculus Dnmt3b and Dnmt3c, as well as Dnmt3b in rat and human. Gene models are shown for M. musculus, R. norvegicus, and H. sapiens. The putative Dnmt3c gene location in R. norvegicus is depicted by the gray dashed line above the dot plot. Segments of contiguous inter-species sequence identity between Dnmt3b and Dnmt3c appear as off-center partial diagonals (arrows) for those species that harbor the Dnmt3b and Dnmt3c pair, or as two offset diagonals (arrowheads) for species that lack the duplication. (D) Cladogram showing the evolutionary relationship of species analyzed (UCSC Genome Browser; [60] They also showed that Dnmt3l mutants were globally hypomethylated, including at intragenic and intergenic regions. In comparison, hypomethylation in Dnmt3c mutants is primarily restricted to L1, ERVK, and ERV1 retrotransposons ( [30] and this study). Taken together, these results suggest that Dnmt3c provides a more specialized contribution than Dnmt3l to the germline methylation landscape.
The majority of differentially methylated regions in Dnmt3c mutants overlap with transposons ( [30] and this study), so it is likely that the SC and γH2AX defects in Dnmt3c rahu mutants are indirect effects of transposon hypomethylation. Loss of transposon methylation in Dnmt3l mutants leads to abnormal levels of meiotic DSBs within transposon sequences, which in turn are thought to lead to deleterious non-allelic recombination events, culminating in meiotic arrest [29]. Similarly, transposon hypomethylation in Dnmt3c rahu mutants may result in an abnormal meiotic DSB landscape. A non-exclusive alternative is that the Dnmt3c rahu meiotic recombination defect may be linked to retrotransposon derepression, for example, via accumulation of DNA damage induced by a transposon-encoded endonuclease activity. This idea is supported by the presence of SPO11-independent DSBs in mice lacking Maelstrom, a piRNA pathway component required for transposon methylation and repression [64,65]. Maelstrom mutant mice that also lack SPO11, whose catalytic activity is required for meiotic DSBs, show extensive immunofluorescence staining for DSB markers, consistent with damage that is mechanistically distinct from that induced during developmentally programmed meiotic recombination events. Yet another possibility is that transposon hypomethylation in Dnmt3c rahu mutants perturbs the expression of neighboring meiotic genes.
Comparative genomic analyses suggest that Dnmt3c arose in the Muroidea phylogenetic lineage by duplication of, and subsequent divergence from, Dnmt3b ( [30] and this study). It is conceivable that Dnmt3c neo-functionalized in response to selective pressure imposed by an increase in the retrotransposon load within the genome. An alternative hypothesis is that in organisms that lack Dnmt3c, its function is performed by a Dnmt3b isoform or by a yet-to-be discovered DNMT3 paralog. Given that the duplication is specific to muroid rodents and that Dnmt3c was previously mis-annotated as a pseudogene, its discovery exemplifies the power of forward genetic approaches. Moreover, the rapid evolution of meiotic proteins and the diversity of meiotic strategies adopted across different taxa necessitate organism-specific approaches. With advances in genomics facilitating the molecular characterization of phenotype-causing lesions identified in forward genetic screens, this approach will continue to be fruitful in furthering our understanding of gametogenesis.

Generation of Dnmt3c rahu animals
All experiments conformed to regulatory standards and were approved by the Memorial Sloan Kettering Cancer Center (MSKCC) Institutional Animal Care and Use Committee. Male mice of the C57BL/6J background were mutagenized by three weekly injections of 100 μg ENU/g of body weight, then outcrossed to FVB/NJ females. Wild-type mice of both inbred strains were purchased from The Jackson Laboratory. The mutagenesis and three-generation breeding scheme to generate homozygous mutant offspring were conducted as described elsewhere [31,34] (Fig 1A). To minimize the chance of repeated recovery of the same ENU-induced mutation, no more than ten F1 founder males were screened from each mutagenized male. Each F1 founder male was used to generate !six G2 females and~24 G3 males.
For screening, testes from G3 males were dissected, snap-frozen in liquid nitrogen, and stored at -80˚. Males were screened for meiotic defects at !15 dpp, by which age spermatocytes in the relevant stages of meiotic prophase I are abundant [58]. An upper age limit of 19 dpp was imposed to avoid the need for weaning. For a given line, spermatocyte squash preparation and immunostaining were carried out only after testes had been obtained from~24 G3 males. This side-by-side analysis facilitated comparisons of phenotypes between mice. One testis per mouse was used to generate squash preparations of spermatocyte nuclei and immunostained with anti-SYCP3 and anti-γH2AX antibodies as described below. The second testis was reserved for later DNA extraction if needed. Based on the extent of axial element and SC formation, SYCP3-positive spermatocyte nuclei were classified as either early prophase-like (equivalent to leptonema or early zygonema in wild type) or late prophase-like (late zygonema, pachynema, or diplonema). The γH2AX staining pattern was then evaluated. For each immunostained squash preparation, we aimed to evaluate~20 early prophase-like cells and~50 late prophase-like cells, if present. Priority for subsequent mapping and further analysis was given to lines that yielded at least two G3 males with similar spermatogenesis-defective phenotypes, derived from two or more G2 females.
Genotyping of Dnmt3c rahu animals was done by PCR amplification using rahu F and rahu R primers (S4 Table), followed by digestion of the amplified product with Hpy188I (NEB). The rahu mutation (A to G) creates a novel Hpy188I restriction site.

Generation of targeted Dnmt3c mutations
Endonuclease-mediated alleles were generated at the MSKCC Mouse Genetics Core Facility using CRISPR/Cas9. A guide RNA (target sequence 5 0 -CATCTGTGAGGTCAATGATG) was designed to target predicted exon 4 of Gm14490 (NCBI Gene ID: 668932 and Ensembl Gene ID: ENSMUSG00000082079) and used for editing as described [66]. Using the T7 promoter in the pU6T7 plasmid, the gRNA was synthesized by in vitro transcription and polyadenylated, then 100 ng/μl of gRNA and 50 ng/μl of Cas9 mRNA were co-injected into the pronuclei of CBA × B6 F2 hybrid zygotes using conventional techniques [67]. Founder mice were tested for presence of mutated alleles by PCR amplification of exon 4 using Gm14490 F1 and R1 primers (S4 Table), followed by T7 endonuclease I (NEB) digestion.
Mis-targeting of CRISPR/Cas9 to Dnmt3b was considered unlikely as the gRNA has 10 (50%) mismatches relative to the homologous region of Dnmt3b (in exon 7 of Ensembl transcript ENSMUST00000109774 (alignment of Gm14490 exon 4 and Dnmt3b exon 7 using EMBOSS Water tool, Smith-Waterman algorithm [49][50][51]). Also, this Dnmt3b region lacks the protospacer adjacent motif (PAM). Nonetheless, we screened mice directly by PCR (Dnmt3b F1 and R1 primers; S4 Table) and T7 endonuclease I assay at the relevant region in Dnmt3b to rule out presence of induced mutations. Animals that were positive for Gm14490 mutation and negative for Dnmt3b mutation were selected for further analysis.
Dnmt3c em founder males mosaic for frame-shift mutations were bred to mutant Dnmt3c rahu females to generate compound heterozygotes carrying both the Dnmt3c rahu allele and a Dnmt3c em allele. Dnmt3c em -carrying founder mice were also interbred to generate homozygotes or compound heterozygotes carrying two distinct Dnmt3c em alleles.
Genotyping of Dnmt3c em animals was done by PCR amplification of the targeted region followed by Sanger sequencing. PCR amplification was done with either Gm14490 F1 and R2 primers followed by sequencing with Seq1, or with Gm14490 F2 and R2 primers followed by sequencing with Seq2 (S4 Table). Sequence traces were analyzed to determine the mutation spectrum as described above. Dnmt3c em animals were also genotyped for the homologous region in Dnmt3b by PCR amplification with Dnmt3b F2 and R1 primers followed by sequencing with Seq1 (S4 Table).

Genetic mapping and exome sequencing
All genome coordinates are from mouse genome assembly GRCm38/mm10 unless indicated otherwise. For genetic mapping, the screen breeding scheme (Fig 1A) was expanded: additional G2 females were generated and crossed to their F1 sire, and were identified as mutation carriers if they birthed G3 males displaying the Dnmt3c rahu phenotype. Breeding of G2 carriers to the F1 founder was continued to accrue additional homozygous mutants.
The Dnmt3c rahu phenotype was coarsely mapped by microarray-based genome-wide SNP genotyping using the Illumina Mouse Medium Density Linkage Panel. To obtain genomic DNA, testes or tail biopsies were incubated in 200 μl of DirectPCR lysis reagent (Viagen) containing 1 μl of proteinase K (>600 mAU/ml, Qiagen) for 24 hr at 55˚. DNA was subsequently RNase A-treated, phenol:chloroform-extracted, and ethanol-precipitated. Microarray analysis was performed at the Genetic Analysis Facility, The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON, Canada. For bioinformatics analysis, 720 SNPs out of 1449 SNPs total on the linkage panel were selected based on the following criteria: autosomal location, allelic variation between B6 and FVB backgrounds, and heterozygosity in the F1 founder. For fine-mapping by manual genotyping of variants, genotyping was done by PCR amplification followed either by Sanger sequencing or by digestion with an appropriate restriction enzyme.
For whole-exome sequencing, DNA from three phenotypically mutant G3 mice was prepared as for microarray analysis and pooled into a single sample. Because the mutant mice should share the phenotype-causing mutation(s), we expected this pooling approach to boost the reliability of mutation detection. Whole-exome sequencing was performed at the MSKCC Integrated Genomics Operation. Exome capture was performed using SureSelectXT kit (Agilent Technologies) and SureSelect Mouse All Exon baits (Agilent Technologies). An average of 100 million 75-bp paired reads were generated. Read adapters were trimmed using FAS-TX-Toolkit version 0.0.13 (http://hannonlab.cshl.edu/fastx_toolkit/) and read pairs were recreated after trimming using a custom Python script. Reads were aligned to mouse genome assembly GRCm38/mm10 using Burrows Wheeler Aligner-MEM software version 0.7.5a [71] with default settings, and duplicate reads were removed using Picard tools version 1.104 (https://broadinstitute.github.io/picard/). A minimum mapping quality filter of 30 was applied using SAMtools version 0.1.19 [72]. Genome Analysis Toolkit version 2.8-1-g932cd3a (Broad Institute; [73][74][75]) was used to locally realign reads with RealignerTargetCreator and IndelRealigner, to recalibrate base quality scores using BaseRecalibrator, and to identify variants using UnifiedGenotyper with the following settings: mbq 17; dcov 500; stand_call_conf 30; stand_ emit_conf 30. Variants were annotated using ANNOVAR software [76]. To obtain a list of potential phenotype-causing lesions, variants were filtered further to only include those that 1) had a minimum sequencing depth of six reads, 2) were called as homozygous, and 3) were not known archived SNPs (i.e., they lacked a reference SNP ID number). The positions of variants within the 33.5-Mbp mapped region that we identified using this strategy are as follows:

Histology
For histology, testes isolated from adult or juvenile mice were immersed overnight in 4% paraformaldehyde (PFA) at 4˚with gentle agitation, followed by two 5-min washes in water at room temperature. Fixed testes were stored in 70% ethanol for up to 5 days. Testes were embedded in paraffin, then 5-μm-thick sections were cut and mounted on slides.
The periodic acid Schiff (PAS) staining, immunohistochemical TUNEL assay, and immunofluorescent staining were performed by the MSKCC Molecular Cytology core facility. Slides were stained with PAS and counterstained with hematoxylin using the Autostainer XL (Leica Microsystems) automated stainer. The TUNEL assay was performed using the Discovery XT processor (Ventana Medical Systems). Slides were manually deparaffinized in xylene, rehydrated in a series of alcohol dilutions (100%, 95% and 70%) and tap water, placed in Discovery XT, treated with proteinase K (20 μg/ml in 1× phosphate-buffered saline (PBS)) for 12 min, and then incubated with terminal deoxynucleotidyl transferase (Roche) and biotin-dUTP (Roche) labeling mix for 1 hr. The detection was performed with DAB detection kit (Ventana Medical Systems) according to manufacturer's instructions. Slides were counterstained with hematoxylin and mounted with coverslips with Permount (Fisher Scientific). The immunofluorescent staining was performed using Discovery XT. Slides were deparaffinized with EZPrep buffer (Ventana Medical Systems) and antigen retrieval was performed with CC1 buffer (Ventana Medical Systems). Slides were blocked for 30 min with Background Buster solution (Innovex), followed by avidin-biotin blocking (Ventana Medical Systems) for 8 min. Slides were incubated with primary antibody for 5 hr, followed by 60 min incubation with biotinylated goat anti-rabbit (1:200, Vector Labs). The detection was performed with Streptavidin-HRP D (part of DABMap kit, Ventana Medical Systems), followed by incubation with Tyramide Alexa Fluor 488 (Invitrogen) prepared according to manufacturer's instructions. After staining, slides were counterstained with 5 μg/ml 4 0 ,6-diamidino-2-phenylindole (DAPI) (Sigma) for 10 min and mounted with coverslips with Mowiol.
PAS-stained and TUNEL slides were digitized using Pannoramic Flash 250 (3DHistech) with 2× lens. Images were produced and analyzed using the Pannoramic Viewer software. Immunofluorescence images were produced using a LSM 880 (Zeiss) with 40× lens.

Cytology
Spermatocyte squashes were prepared as described [78], with few modifications. Isolated testes with the tunica albuginea removed were minced and suspended in 2% PFA in 1× PBS. Fixation was allowed for approximately 10 sec and spermatocytes were spotted onto slides. A coverslip was pressed down onto the spermatocytes to squash them, and the preparation was snap-frozen in liquid nitrogen. Slides were removed from liquid nitrogen and the coverslip was pried off, followed by three 5-min washes in 1× PBS with gentle agitation. Washed slides were rinsed in water, air dried and stored at -80˚. For immunofluorescence, slides were thawed in 1× PBS for 5 min with gentle agitation and stained as described [79]. Slides were incubated with blocking buffer (1× PBS with 0.2% gelatin from cold-water fish skin, 0.05% TWEEN-20, 0.2% BSA) with gentle agitation for 30 min. Slides were stained with primary antibodies overnight at 4˚, washed three times for 5 min in blocking buffer with gentle agitation, incubated with appropriate Alexa Fluor secondary antibodies (1:100; Invitrogen) for 30 min at room temperature, then washed three times for 5 min in blocking buffer. All antibodies were diluted in blocking buffer. Slides were rinsed in water and cover slips were mounted using mounting medium containing DAPI (Vectashield). Stained slides were stored at 4˚for up to 5 days. Immunostained slides were imaged on a Marianas Workstation (Intelligent Imaging Innovations; Zeiss Axio Observer inverted epifluorescent microscope with a complementary metal-oxide semiconductor camera) using a 63× oil-immersion objective.

Domain annotation
DNMT3C protein sequence was obtained by translating the 2,218-bp Ensembl transcript ENSMUST00000119996 (release 87) cDNA sequence. An additional two nucleotides (AG) were added to the predicted transcript end to create a stop codon, resulting in a 2,220-bp transcript. DNMT3B protein sequence was obtained by translating the 4,320-bp Dnmt3b-001 Ensembl transcript ENSMUST00000109774.8 (release 87) coding sequence; this translation has 100% sequence identity to M. musculus DNMT3B with accession number O88509. ADD and PWWP domains were predicted using the NCBI Conserved Domain Database Search tool (accession numbers cd11728 and cd05835, respectively) [80]. To determine the cytosine methyltransferase motif locations in DNMT3C and DNMT3B, the following sequences were aligned by Clustal Omega alignment method using MegAlign Pro software (DNA STAR, Lasergene): M. musculus DNMT3C sequence determined as described above, M. musculus DNMT3B (accession number O88509), H. sapiens DNMT3B (accession number Q9UBC3), M. musculus DNMT3A (accession number O88508), H. sapiens DNMT3A (accession number Q9Y6K1), M. musculus DNMT3L (accession number Q9CWR8), H. sapiens DNMT3L (accession number Q9UJW3), M. musculus DNMT1 (accession number P13864), H. sapiens DNMT1 (accession number P26358), and H. parahaemolyticus HhaI (accession number P05102). Then the six highly conserved motifs (I, IV, VI, VIII, IX, X) were annotated as defined for HhaI [52]. The cytosine methyltransferase domain was annotated as the start of Motif I to the end of Motif X. The start and end locations for the domains and motifs are listed in S2 Table. Clustal Omega alignments from MegAlign Pro were used to produce a tree using BioNJ algorithm [81], and the figure was prepared using FigTree version 1.4.3 (http://tree.bio.ed.ac.uk/software/figtree). The model of DNMT3C was generated with Phyre2 protein structure prediction tool. The protein was modeled from template 2QRV (carboxy-terminal domain of DNMT3A, 77% identity), with a confidence score of 100%.

Methylation-sensitive Southern blotting
To extract genomic DNA, one half of a single testis or equivalent mg of tail tissue was incubated in 200 μl of DirectPCR lysis reagent (Viagen) containing 1 μl of proteinase K solution (>600 mAU/ml, Qiagen) for 24 hr at 55˚. DNA was subsequently RNase A-treated, phenol: chloroform-extracted twice, and ethanol-precipitated.~1 μg of DNA was digested for 4 hr at 37˚with the methylation-sensitive HpaII restriction enzyme (NEB) or its methylation-insensitive isoschizomer MspI (NEB).~250 ng of digested DNA was electrophoresed on a 0.9% agarose gel and transferred as described [82] to an Amersham Hybond-XL membrane (GE Healthcare). The L1 5 0 UTR probe has been described elsewhere [64] and corresponds to nucleotides 515-1628 of the L1 sequence (GenBank accession number M13002). The probe was random-priming labeled with [α 32 P]-dCTP using High Prime premixed solution (Roche). Hybridizations were carried out overnight at 65˚.

Quantitative RT-PCR and RNA sequencing
For RNA expression analysis, six littermates (three homozygous mutant and three heterozygous mice born from a cross between a Dnmt3c rahu/+ male and a Dnmt3c rahu/rahu female) aged 14 dpp were analyzed. Procedures involving commercial kits were performed according to manufacturers' instructions. Total RNA from one half of a single testis, after removing the tunica albuginea, was extracted using the RNeasy Plus Mini Kit containing a genomic DNA eliminator column (Qiagen). The RNase-Free DNase Set (Qiagen) was used to perform an oncolumn DNase digestion during RNA purification, as well as an in-solution DNase digestion after RNA purification, followed by RNA cleanup.
For RT-PCR, 1-3 μg of RNA was used with random hexamer primers to synthesize cDNA using the SuperScript III First-Strand Synthesis System (Invitrogen). cDNA was diluted fivefold or more for PCRs. Quantitative PCR was carried out using LightCycler 480 SYBR Green I Master (Roche) for detecting products on a LightCycler 480 II Real-Time PCR instrument (Roche). All reactions were done at 60˚annealing temperature, with an extension time of 20 sec for L1 ORF2, IAP 3 0 LTR and IAP Gag primers, and 40 sec for L1 ORFs primers (S4 Table). All reactions were done in triplicate and accompanied by control reactions using cDNA synthesized without reverse transcriptase (-RT controls). The success of reactions was confirmed by analysis of amplification curves, melting curves, and electrophoresis of representative amplification products on an agarose gel.
LightCycler 480 Software was used to quantify products by absolute quantification analysis using the second derivative maximum method. All crossing point (Cp) values were normalized to the mean Cp value obtained for triplicate Actb reactions, to get relative values. The mean of relative values for triplicate reactions was used to obtain mean relative values. The mean relative value represents the relative amount of product in any given mouse. To obtain fold change values, the mean relative value for each mouse was normalized to the mean of that obtained for three heterozygous mice.
RNA sequencing (RNA-seq) was performed at the Integrated Genomics Operation of MSKCC. 1 μg of total RNA underwent ribosomal depletion and library preparation using the TruSeq Stranded Total RNA LT kit (Illumina) with 6 cycles of post-ligation PCR. Sequencing was performed using Illumina HiSeq 2500 (2×100-bp paired-end reads) using the TruSeq SBS Kit v3 (Illumina). On average, 61 million paired reads were generated per sample. Resulting RNAseq fastq files were aligned using STAR version 2.4.0f1 [83] to the mouse genome (GRCm38/mm10) using Gencode M11 transcriptome assembly [84] for junction points. Coding genes and transposable elements were quantified using TEtoolkit [85] to annotate both uniquely and ambiguously mapped reads. Gencode annotation and Repbase database [86] for repetitive sequences and transposable elements were used during the quantification. Differentially expressed genes were calculated using DESeq2 [87] on the counts. For plotting, counts of transposable elements were normalized to all of the annotated reads including coding genes as counts per million (CPM). Dnmt3l RNA-seq data are published (GEO GSE57747 [29]). Expression values for 20-dpp-old Dnmt3l mutant and wild type, provided by the authors, were used to calculate retrotransposon expression fold change and matched to our RNAseq data by transposable element name.

Whole-genome bisulfite sequencing
WGBS was performed with six animals from two litters (two homozygous mutants and two wild-type mice from one litter, and one homozygous mutant and one wild-type mouse from a second litter) aged 12 dpp. Genomic DNA was extracted from whole testis by incubating a single testis in 200 μl of DirectPCR lysis reagent (Viagen) containing 1 μl of proteinase K solution (>600 mAU/ml, Qiagen) for 24 hr at 55˚. DNA was subsequently RNase A-treated, phenol: chloroform-extracted, and ethanol-precipitated.
Whole-genome bisulfite libraries were prepared and sequenced at the New York Genome Center (NYGC) using a tagmentation-based protocol developed by NYGC and J. Greally. In brief, 100 ng of genomic DNA was fragmented using tagmentation by Tn5 transposase and purified by Silane bead cleanup (Dynabeads MyOne Silane, Thermo Fisher Scientific). End filling was performed using dATP, dGTP, dTTP and methylated dCTP to protect added cytosines from bisulfite treatment. End-repaired DNA was purified by SPRI bead cleanup (Beckman), and subjected to bisulfite treatment and cleanup using EZ DNA Methylation-Gold MagPrep kit (Zymo Research). Illumina sequencing adapters (standard i5 adapter and custom i7 adapter) were added using PCR amplification. Finally, size selection of 300-bp to 800-bp fragments was performed using SPRI bead cleanup (Beckman). Libraries were sequenced on the Illumina X10 platform (v3 chemistry) using 2×150 cycles with standard Illumina read 1 primer and custom read 2 and i7 index primers to generate >375 million paired reads. Control library generated from Kineococcus radiotolerans was spiked at 20% to enhance library complexity.

Dot-plot and sequence analysis
Dot plots were made using custom Perl code available at http://pagelab.wi.mit.edu/materialrequest.html (David Page lab, Whitehead Institute, Cambridge, Massachusetts). A summary of sequences used is provided in S5 Table. All sequences were screened and masked for speciesspecific repetitive sequences prior to analysis using RepeatMasker software [93] with default settings. Two nucleotides (AG) were added to the annotated end of the Gm14490 transcript to create a stop codon. Beaver, guinea pig, and rabbit dot plots were vertically reflected to maintain a consistent gene order of Commd7 to Mapre1 from left to right. Gene models were created based on Ensembl (version 87) exon annotations. To make the species cladogram, artificial sequences were aligned to resemble published data [60] (UCSC Genome Browser; http://genome.ucsc.edu) and the figure was prepared using FigTree version 1.4.3.

Data availability
RNA-seq data and WGBS data are available at Gene Expression Omnibus (GEO) with the accession number: GSE100960.
Supporting information S1