A Multi-Megabase Copy Number Gain Causes Maternal Transmission Ratio Distortion on Mouse Chromosome 2

Significant departures from expected Mendelian inheritance ratios (transmission ratio distortion, TRD) are frequently observed in both experimental crosses and natural populations. TRD on mouse Chromosome (Chr) 2 has been reported in multiple experimental crosses, including the Collaborative Cross (CC). Among the eight CC founder inbred strains, we found that Chr 2 TRD was exclusive to females that were heterozygous for the WSB/EiJ allele within a 9.3 Mb region (Chr 2 76.9 – 86.2 Mb). A copy number gain of a 127 kb-long DNA segment (designated as responder to drive, R2d) emerged as the strongest candidate for the causative allele. We mapped R2d sequences to two loci within the candidate interval. R2d1 is located near the proximal boundary, and contains a single copy of R2d in all strains tested. R2d2 maps to a 900 kb interval, and the number of R2d copies varies from zero in classical strains (including the mouse reference genome) to more than 30 in wild-derived strains. Using real-time PCR assays for the copy number, we identified a mutation (R2d2WSBdel1) that eliminates the majority of the R2d2WSB copies without apparent alterations of the surrounding WSB/EiJ haplotype. In a three-generation pedigree segregating for R2d2WSBdel1, the mutation is transmitted to the progeny and Mendelian segregation is restored in females heterozygous for R2d2WSBdel1, thus providing direct evidence that the copy number gain is causal for maternal TRD. We found that transmission ratios in R2d2WSB heterozygous females vary between Mendelian segregation and complete distortion depending on the genetic background, and that TRD is under genetic control of unlinked distorter loci. Although the R2d2WSB transmission ratio was inversely correlated with average litter size, several independent lines of evidence support the contention that female meiotic drive is the cause of the distortion. We discuss the implications and potential applications of this novel meiotic drive system.

Author Summary
One of the strongest expectations in genetics is that chromosomes segregate randomly during meiosis. However, genetic loci that exhibit transmission ratio distortion (TRD) are sometimes observed in offspring of F1 hybrids. Meiotic drive is a type of non-Mendelian inheritance in which a "selfish" genetic element exploits asymmetric female meiotic cell division to promote its preferential inclusion in ova. We previously reported TRD on Chr 2 in the CC, a mouse recombinant inbred panel with contributions from three Mus musculus subspecies. Here we show that maternal TRD consistent with a novel meiotic drive system is caused by a copy number gain. This mutation is similar in size and structure to other known meiotic drive responders, such as the knobs of maize. A deletion of most of

Introduction
Mendel's Laws provide the theoretical foundation of transmission genetics and explain many of the inheritance patterns of biological traits in sexually reproducing organisms. The Laws state that each gamete receives a random collection of alleles (exactly one per pair of homologous loci) and that gametes unite at random. However, reports of exceptions to Mendelian inheritance date back almost to the rediscovery of Mendel's Laws, and have been instrumental in elucidating the mechanisms of genetic inheritance [1][2][3][4]. Transmission ratio distortion (TRD) is defined as a significant and reproducible violation of the inheritance ratios expected under Mendel's Laws [1,5-7].
Most observations of TRD are due to selection acting upon the products of meiosis (gamete selection) or fertilization (differential pre- or post-natal survival) [5][6][7][8]. The latter is a relatively common occurrence in experimental crosses in many types of organisms including plants and animals [8,9], and is routinely used to classify the essentiality of genes and alleles [9][10][11][12][13][14]. However, a small but increasing number of observations of TRD can be ascribed to the differential

Extreme TRD in Chr 2 is present in the DO population
To test whether TRD of the WSB/EiJ allele in Chr 2 is present in the DO, we analyzed 1,175 animals from DO generation 8 (G8) that were genotyped using two related genotyping arrays (MUGA or MegaMUGA, see Materials and Methods). We sampled the genotypes of each individual at 1 Mb intervals along Chr 2 and then computed the overall frequencies of the eight founder alleles at each position. The WSB/EiJ allele was over-represented relative to the other seven founder alleles across a roughly 100 Mb region in the middle of Chr 2 (S2 Fig.). However, there was a striking difference in the level of distortion observed in the CC and the DO, with the WSB/EiJ allele frequency reaching a maximum of 0.22 in the CC compared to 0.55 in the DO. This result indicates that the additional outcrossing in the DO is associated with higher levels of TRD. We conclude that TRD favoring the WSB/EiJ allele is a general feature of crosses in the CC genetic background; however, the level of TRD may vary widely depending on the number of generations of outbreeding.
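As a rough illustration of the frequency calculation described above, the following Python sketch tallies founder alleles at a single grid position. The function name, the tuple encoding of diplotypes and the toy genotypes are assumptions made for illustration; they are not taken from the study's pipeline. Under equal contribution, each of the eight founders would be expected at a frequency of 1/8 = 0.125.

```python
from collections import Counter

# The eight CC/DO founder strains named in the text.
FOUNDERS = ["A/J", "C57BL/6J", "129S1/SvImJ", "NOD/ShiLtJ",
            "NZO/HlLtJ", "CAST/EiJ", "PWK/PhJ", "WSB/EiJ"]


def founder_frequencies(diplotypes):
    """Return founder allele frequencies at one genomic position.

    `diplotypes` is a list of (allele_1, allele_2) tuples, one per animal
    (a hypothetical encoding chosen for this sketch).
    """
    counts = Counter(allele for pair in diplotypes for allele in pair)
    total = sum(counts.values())
    return {founder: counts.get(founder, 0) / total for founder in FOUNDERS}


# Toy data (hypothetical): four animals at one 1 Mb grid point.
toy = [("WSB/EiJ", "A/J"), ("WSB/EiJ", "WSB/EiJ"),
       ("CAST/EiJ", "WSB/EiJ"), ("PWK/PhJ", "C57BL/6J")]
freqs = founder_frequencies(toy)
expected = 1 / len(FOUNDERS)  # equal founder contribution
print(freqs["WSB/EiJ"], expected)
```

Repeating the tally at each sampled position along Chr 2 would give a frequency profile per founder that can be compared with the 0.125 expectation.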

TRD is exclusive to heterozygous females
To determine the parental origin of the TRD, we analyzed 5,499 offspring from 18 experimental crosses in which exactly one parent was heterozygous for the WSB/EiJ allele in an interval spanning the region of maximum distortion on Chr 2 (75-90 Mb) [33-35,37,38]. In all cases the heterozygous parent was an F1 hybrid derived either from an intercross between the WSB/EiJ inbred strain and one of eight other inbred strains (the seven founder strains of the CC or PWD/PhJ), or from two CC strains, of which one was homozygous for the WSB/EiJ allele on Chr 2 and the other was homozygous for a non-WSB/EiJ allele. F1 hybrids were mated to either C57BL/6J or FVB/NJ mice, and their progenies were euthanized at birth and genotyped using genetic markers located in the region of maximum distortion. For each cross, we computed the TR of the WSB/EiJ allele and the non-WSB/EiJ allele using the aggregate genotypes across all litters from parents with identical genotypes (Table 1).
TRs in six paternally segregating crosses (rows 1-6 in Table 1) were as expected under the null hypothesis of Mendelian segregation (range 0.482-0.524, p ≥ 0.37). In contrast, the mean TR in maternally segregating crosses (rows 7-18 in Table 1) was 0.666 and deviated significantly from the null hypothesis (p = 3.4 × 10⁻⁸⁹). We conclude that, in the genetic backgrounds tested, TRD in favor of the WSB/EiJ allele on Chr 2 is restricted to the progeny of heterozygous dams.
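One way to test a transmission ratio against the Mendelian expectation of 0.5 is an exact binomial test. The sketch below is a minimal version using only the Python standard library. The offspring counts are hypothetical, and the choice of test is an assumption, since this section does not state which test produced the reported p-values.

```python
from math import comb


def transmission_ratio(n_wsb, n_other):
    """Fraction of progeny that inherited the WSB/EiJ allele."""
    return n_wsb / (n_wsb + n_other)


def upper_tail_p(n_wsb, n_other):
    """One-sided exact binomial p-value for H0: TR = 0.5 versus TR > 0.5.

    With p = 0.5 every outcome has probability C(n, k) / 2**n, so the tail
    can be summed with exact integers before a single division.
    """
    n = n_wsb + n_other
    tail = sum(comb(n, i) for i in range(n_wsb, n + 1))
    return tail / 2**n


def two_sided_p(n_wsb, n_other):
    """Two-sided p-value; doubling the smaller tail is valid because the
    null distribution is symmetric at p = 0.5."""
    n = n_wsb + n_other
    lower = sum(comb(n, i) for i in range(0, n_wsb + 1)) / 2**n
    upper = upper_tail_p(n_wsb, n_other)
    return min(1.0, 2 * min(lower, upper))


# Hypothetical cross: 133 progeny with the WSB/EiJ allele, 67 without.
print(transmission_ratio(133, 67))
print(upper_tail_p(133, 67))
print(two_sided_p(133, 67))
```

Working with integer binomial coefficients avoids floating-point underflow in the individual terms, which matters when the sample size reaches the thousands of offspring analyzed here.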
We also conclude that the grandparental origin of the WSB/EiJ allele has no influence on TRD because the TR levels were not significantly different between three pairs of reciprocal F1 dams (compare crosses 7 and 8, 9 and 10 and 17 and 18 in Table 1; p = 0.53, 0.11 and 0.59, respectively).

TRD maps to a 9.3 Mb interval in the middle of mouse Chr 2
To define the boundaries of the locus subject to TRD, we screened 61 CC lines and 378 DO mice that had been genotyped with MegaMUGA for recombinations involving the WSB/EiJ haplotype in the 75-90 Mb interval of Chr 2. We identified five DO females (DO-600, DO-681, DO-732, DO-832 and DO-OCA45) and two CC strains (CC039/Unc and CC042/GeniUnc) that each had at least one informative recombination (Fig. 1). Next, we mated four of the DO females (all except DO-OCA45, which was already heterozygous) and the two CC strains to one of two additional CC lines (CC001/Unc and CC005/TauUnc) that had no contribution from WSB/EiJ on Chr 2, to obtain heterozygous G1 hybrid females. Each hybrid female was genotyped with MegaMUGA and mated to FVB/NJ males (total of 35 crosses; S1 Table).
We found that dams carrying eight of the ten recombinant chromosomes exhibited significant TRD in the Chr 2 interval (TR range 0.69-1.0, p ≤ 2.1 × 10⁻⁵; Fig. 1 A), but dams carrying two other recombinant chromosomes did not (TR = 0.48 and 0.37, p ≥ 0.72; Fig. 1 B). These results are consistent with our conclusion that heterozygosity on Chr 2 is required but not sufficient for TRD; therefore, dams with Mendelian transmission ratios were not used for mapping the locus subject to TRD. Dams with TRD in favor of the WSB/EiJ allele were all heterozygous for a 9.3 Mb interval (the candidate interval; boxed in Fig. 1 A). The proximal boundary of the candidate interval is defined by the recombination found in the CC strain CC039/Unc (i.e., the most distal SNP inconsistent with a WSB/EiJ haplotype). The distal boundary of the candidate  Table). We used the normalized per-base read depth from whole-genome sequence alignments generated by the Sanger Mouse Genomes Project [31,32,37,39] and the HR8 selection line to estimate the number of copies of R2d in 18 inbred strains (see Materials and Methods). Similar to C57BL/6J, 15 of the 18 strains, including 5 additional CC founder strains (A/J, 129S1/SvImJ, NOD/ShiLtJ, NZO/HlLtJ and PWK/PhJ) were copy number one (i.e., a single haploid copy), and CAST/EiJ was copy number two. In contrast, WSB/EiJ had an estimated copy number of 34, and SPRET/EiJ had an estimated copy number of 36, resulting in ~4.4 Mb of additional DNA in those strains (Fig. 2 A). We sequenced 10 individuals from the HR8 selection line (for which Chr 2 TRD was also observed when mated to C57BL/6J [31,32,40]) to a total depth of 125x and aligned the reads to the reference genome. All 10 individuals had evidence of a copy number gain with the same boundaries as in WSB/EiJ and SPRET/EiJ (Fig. 2 A; mean copy number 24.5 ± 1.4, equating to ~3 Mb of additional DNA).

Figure 1. R2d maps to a 9.3 Mb candidate interval. CC and DO mice were crossed to generate G1 dams, which were then crossed to FVB/NJ sires to determine the TR in their progeny. Each G1 dam carries a chromosome that is recombinant for the WSB/EiJ haplotype (shown under the heading cis) and a non-WSB/EiJ chromosome (the haplotype on the homologue is shown at far right under the heading trans). Dams with the same diplotype in the central region of Chr 2 were grouped together to define ten unique diplotypes. The aggregate numbers of WSB/EiJ and non-WSB/EiJ alleles transmitted by dams of each diplotype are shown for dams A) with TRD and B) without TRD. Significance of TR deviation from the Mendelian expectation of 0.5 was computed using a one-sided binomial exact test (p-value). The contributions from the eight founders of the CC and DO are shown in different colors. Thick purple bars indicate the extent of WSB/EiJ contributions, and thin bars indicate the extent of contributions from all other strains. The black box indicates the boundaries of the R2d candidate interval as determined by the region that is WSB/EiJ in all dams with TRD.
We used two additional methods to assay the copy number of R2d. First, we identified sets of probes on two different genotyping arrays for which the sum hybridization intensity was highly correlated with the copy numbers estimated from sequencing read depth (34 probes in MDA and 3 probes in MegaMUGA; S3 and S4 Tables, respectively). Second, we used real-time quantitative PCR to estimate the R2d copy number (Fig. 2 B) using TaqMan assays internal to exons of the single protein-coding gene within R2d, Cwc22 (Fig. 2 C). Using that gene as a proxy for the copy number gain, we found that the copy number estimates from all three methods were highly concordant for the 28 sequenced strains/individuals.
Using the TaqMan assay, we also found that the M16i inbred strain has a high number of copies of R2d (Fig. 2 B). We conclude that a large increase (> 20-fold) in R2d copy number is found exclusively in strains with TRD (WSB/EiJ, SPRET/EiJ, HR8 and M16i) and that TRD consistently favors the transmission of the allele with the copy number gain.
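The read-depth estimate used in this section can be pictured with a short Python sketch. It assumes that copy number is taken as the ratio of typical depth inside R2d to typical depth across the rest of the genome, scaled by the single copy present in the reference. The use of the median, the function name and the depth values are all assumptions for illustration, not the authors' normalization procedure.

```python
from statistics import median


def copy_number_from_depth(region_depths, genome_depths, reference_copies=1):
    """Estimate haploid copy number in an inbred strain.

    Reads from every copy of a duplicated segment pile up on the single
    reference copy, so depth over the segment scales with copy number.
    """
    return reference_copies * median(region_depths) / median(genome_depths)


# Hypothetical per-base depths: ~30x genome-wide, ~1020x over the segment.
genome = [29, 30, 31]
segment = [1000, 1020, 1040]
print(copy_number_from_depth(segment, genome))
```

Because inbred strains are homozygous, the ratio can be read directly as a haploid copy number; in a heterozygous animal the same ratio would reflect the average of the two alleles.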

The copy number gain maps ~6 Mb distal to R2d1
Many structural variants identified from whole-genome sequencing reads have uncertain genomic positions due to the challenge of mapping large variants that are absent from the reference genome. To determine the position of the copy number gain associated with R2d, we mapped the WSB/EiJ and CAST/EiJ alleles using segregating populations that have been genotyped at medium (MegaMUGA) or high (Mouse Diversity Array, MDA) density [26,40]. In the CC founder strains, probes located in R2d have hybridization intensities correlated with the number of copies estimated from aligned read depth and TaqMan CNV assays (Fig. 2 A, B). The MDA provides robust discrimination between the reference (one copy), CAST/EiJ (two copies) and WSB/EiJ alleles (34 copies; Fig. 3 A). MegaMUGA is able to identify mice carrying the WSB/EiJ allele with little ambiguity (Fig. 3 B). Using the sum intensities of the informative probes as a quantitative trait, we mapped the WSB/EiJ and CAST/EiJ copy number gains in two independent populations and platforms. A genome scan identified a single, broad, highly significant peak on Chr 2 in each population, and those peaks overlap with each other and with the initial candidate interval for TRD (Fig. 3 C-E). We conclude that the copy number gain is closely linked to R2d1. This location is consistent with the large copy number gain being the causative allele. Note that both genome scans (Fig. 3 C, D) demonstrate that all the extra R2d copies found in WSB/EiJ are located in this interval because no other significant peak is observed in either scan. QTL mapping using TaqMan readout as the phenotype confirmed this result (Fig. 3 D, E).
Analysis of individual mice with recombinant chromosomes in the candidate interval revealed that the copy number gain maps to a 900 kb interval (the R2d2 locus; Chr 2 83,631,096-84,541,308; Fig. 2; Fig. 3 A, B). Specifically, the CAST/EiJ copy number gain (R2d2CAST; one additional copy of R2d) is located distal to the transition from the CAST/EiJ to the NZO/HILtJ haplotypes found in mice OR3172m10 and OR3172f9 because both mice have low hybridization intensity consistent with a single copy, hence they lack R2d2CAST (Fig. 3 A; S4A Fig.). Similarly, the WSB/EiJ copy number gain (R2d2WSB; 33 additional copies of R2d) is located proximal to the transition from the WSB/EiJ to the CAST/EiJ haplotype found on DO mouse DP2-446, because it had high hybridization intensity consistent with the presence of R2d2WSB (Fig. 3 B; S4B Fig.). These results demonstrate that R2d2 is not located immediately adjacent to R2d1 but approximately 6 Mb distal to it. The distal location of the copy number gain is confirmed by the analysis of the sum intensity of the three MegaMUGA probes that track R2d in two backcrosses involving the SPRET/EiJ inbred strain [26,41] (S4C Fig.).

Loss of R2d copies at R2d2 restores Mendelian transmission of Chr 2
We used the TaqMan assay to confirm R2d copy number in all heterozygous females tested for TRD (S1 Table; S5 Fig.). We identified a dam (DO-G13-44) that was homozygous for the WSB/EiJ haplotype across the entire candidate interval but produced offspring that were segregating for the copy number gain (Fig. 4 A). This was confirmed by estimating R2d copy number in each of 27 G3 females and 16 G4 progeny that were heterozygous for a WSB/EiJ haplotype (Fig. 4 B; S5 Fig.). We determined the TR in 825 progeny of G3 dams mated to FVB/NJ sires. The TRs among the 27 G3 dams were significantly different (p = 4.9 × 10⁻¹²). In the progeny of the 15 G3 dams with high copy number there was significant TRD in favor of the WSB/EiJ allele (TR = 0.78, p = 2 × 10⁻³⁰; Fig. 4 C). In contrast, we found no TRD in the 12 G3 dams that inherited the low-copy allele (TR = 0.53, p = 0.234). A genome scan for TRD as a binary trait demonstrated that presence or absence of TRD in this pedigree maps uniquely to the candidate interval (Fig. 4 D, E).
We were also able to estimate that G3 dams with the low-copy allele had a copy number of 11. We conclude that the loss of ~22 copies of R2d was sufficient to rescue Mendelian transmission, thus demonstrating that the copy number gain is causative of TRD.

Meiotic drive is the most likely cause of maternal TRD at R2d2
The results presented above demonstrate that TRD at R2d2 is only observed in the progeny of heterozygous dams. This restricts the plausible causes of TRD to meiotic drive, genotype-dependent embryonic lethality (including genotype-dependent competition between embryos) or a combination of both. To identify the cause of TRD, we first determined whether TR levels (S6 Fig.; S1 Table) were correlated with litter size in 127 DO dams (these 56 DO-G13 and 71 DO-G16 females are a random sample from an outbred population). We observed a strong inverse correlation between average litter size and TR at R2d2 (r = -0.65, p = 7.2 × 10⁻⁸ and r = -0.40, p = 5 × 10⁻⁴ in the DO-G13 and DO-G16 dams, respectively; Fig. 5 A, B). We conclude that the presence and the strength of TRD are significantly associated with reduced litter sizes and thus with some type of embryonic lethality. We determined the relationship between TRD and litter size under the assumption of TRD caused exclusively by embryonic lethality [40,41] (S7 Fig.). Under this scenario, in both the DO-G13 and DO-G16 samples the observed average litter size is significantly greater than predicted based on TR (p = 0.021 and 6.0 × 10⁻⁵ for DO-G13 and DO-G16 dams, respectively; S7 Fig.). We conclude that embryonic death alone could only account for a fraction of the "missing" progeny inheriting a non-WSB/EiJ (R2d2NotWSB) allele. We determined directly the levels of embryonic lethality in DO-G13 dams at mid-gestation (see Materials and Methods).
We observed that dams with TRD had slightly, but not significantly, higher numbers of resorbed embryos present in utero than did dams with Mendelian segregation (1.3 ± 1.5 and 1.1 ± 1.2 resorbed embryos, respectively, p = 0.66; N = 29 and 19 dams, respectively; S8 Fig.). We conclude that embryonic lethality alone is insufficient to explain TRD at R2d2.
Although embryonic lethality can change the proportion of progeny inheriting alternative alleles at R2d2, only meiotic drive can lead to an increase in the absolute number of progeny inheriting the R2d2WSB allele per litter in dams with TRD compared to dams with Mendelian segregation. To test whether meiotic drive was responsible for TRD, we determined the average absolute number of offspring per litter that inherited the R2d2WSB and R2d2NotWSB alleles in the progenies of the DO-G13 and DO-G16 dams with either TRD or Mendelian segregation. In dams with Mendelian segregation, the average numbers of offspring per litter that inherited either allele were not different (3.80 R2d2WSB versus 3.96 R2d2NotWSB, p = 0.73 in DO-G13 dams; 4.13 R2d2WSB versus 4.03 R2d2NotWSB, p = 0.29 in DO-G16 dams; Fig. 5 A, B). In contrast, in the progenies of dams with TRD the average number of offspring per litter that inherited the R2d2WSB allele (4.51 and 4.89 in the DO-G13 and DO-G16 dams, respectively) was significantly greater than the absolute number of either allele in the offspring of dams without distortion (p = 0.006 and 0.049 for the R2d2WSB and R2d2NotWSB alleles in DO-G13; p = 0.005 and 4 × 10⁻⁴ for the R2d2WSB and R2d2NotWSB alleles in DO-G16; Fig. 5 A, B). The same result holds true for live embryos at mid-gestation: the average numbers of offspring that inherited R2d2WSB and R2d2NotWSB alleles were 5.0 ± 2.2 and 1.6 ± 1.8 for dams with TRD versus 4.3 ± 1.6 and 3.4 ± 1.8 for dams without TRD. Based on the consistent and significant excess average absolute number of R2d2WSB alleles in the litters of dams with TRD, we conclude again that meiotic drive is required to explain TRD at R2d2.

cross. Female DO-G13-44, mother of the G3 dams phenotyped for TR, is segregating for a copy-number variant at R2d2. G3 dams inheriting the maternal WSB/EiJ haplotype associated with the high-copy allele (R2d2WSB) are colored black; those inheriting the WSB/EiJ haplotype associated with the low-copy allele (R2d2WSBdel1) are colored red. Genotype at marker chr2:85.65Mbp is denoted -/- (homozygous non-WSB), +/- (heterozygous WSB/EiJ) or +/+ (homozygous WSB/EiJ). ΔCt, normalized cycle threshold by TaqMan qPCR assay; TR, transmission ratio, denoted as count of progeny inheriting a WSB/EiJ allele: count of progeny not inheriting a WSB allele; the paternal haplotype at chr2:83.
Further support for meiotic drive was provided by the analysis of the DO-G13-44 pedigree (Fig. 5 C) and crosses between (NZO/HILtJxWSB/EiJ)F1 dams and FVB/NJ sires (cross 15 in Table 1; Fig. 5 D). The average litter size of DO-G13-44 G3 dams inheriting the mutant R2d2WSB allele (R2d2WSBdel1) was larger than in dams inheriting the standard R2d2WSB allele (9.4 ± 2.9 and 6.8 ± 1.6, respectively), but the observed average litter size in dams with TRD is significantly greater than predicted based on TR (p = 0.02; S7 Fig.). Similarly, in the (NZO/HILtJxWSB/EiJ)F1 crosses the average litter size (7.7 ± 2.4; Fig. 5 D) was comparable to DO-G13 and DO-G16 dams without TRD, and was greater than predicted based on TR (p = 0.09; Fig. 5). There was little direct evidence of embryonic lethality at mid-gestation (1.8 ± 1.6 and 0.4 ± 0.5 resorbed embryos, respectively; S8 Fig.). Furthermore, DO-G13-44 G3 dams with different R2d2 alleles differed in the average absolute number of offspring per litter inheriting the R2d2WSB allele (in dams with TRD) compared to the R2d2WSBdel1 allele (in dams with Mendelian segregation; 5.3 ± 2.0 and 4.64 ± 2.4, respectively, p = 0.07; Fig. 5 C). Similar results are observed when comparing the absolute number of offspring per litter that inherited the R2d2WSB allele in the (NZO/HILtJxWSB/EiJ)F1 crosses to the DO dams without TRD (5.1 ± 1.0 and 4.1 ± 1.1, respectively, p = 0.03; Fig. 5 D). In summary, all data from four independent experimental populations were consistent with an explanation of Chr 2 TRD that requires the joint presence of meiotic drive and low-level embryonic lethality.

A large copy number variant causes maternal TRD and reduces the average litter size in heterozygous dams
After demonstrating that TRD occurs only through the germline of F1 female mice, we were faced with two major obstacles in our efforts to map the causative locus. First, although heterozygosity for the WSB/EiJ allele is required, it is not sufficient for meiotic drive (Table 1; S1 Table). Therefore, we initially mapped the responder by determining the minimum region of overlap for the WSB/EiJ haplotype only in dams with TRD (Fig. 1). This yielded a 9.3 Mb candidate interval. Second, the candidate interval spans a recombination-cold region [37,40,42], and the frequency of recombination is three-fold lower than expected in the CC (Fig. 2 D). Although this likely contributes to the overall deficit in recombinant chromosomes (none observed versus an expected 23 in the 378 DO females and 4 in 61 CC lines), the complete lack of recombinants involving the WSB/EiJ haplotype is striking, and, for the purposes of this study, a major impediment to the precise mapping of the responder.
Within the candidate interval, a single variant (R2d2) stands out as the most likely cause of TRD. R2d2 consists of one or more copies of a 127 kb sequence (R2d). High copy number (≥ 24) is present in all four strains with reported TRD and low copy number (≤ 2) is present in all eight strains without TRD (Fig. 2 A, C). The expansion in copy number leads to an increase of at least 3 Mb in DNA content within the allele favored by maternal TRD. Among CC founders, only WSB/EiJ has a high copy number allele.
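The DNA content figures follow from multiplying copy number by the 127 kb unit length. The Python sketch below makes that arithmetic explicit. Whether the copy already present in the reference should be counted is not stated in the text, so the `count_reference_copy` switch is an assumption that lets both conventions be compared, and the function name is hypothetical.

```python
R2D_UNIT_KB = 127  # length of one R2d unit, as given in the text


def r2d_dna_mb(copies, count_reference_copy=True):
    """DNA content (Mb) contributed by `copies` tandem R2d units.

    Set count_reference_copy=False to count only the copies gained
    beyond the single copy present in the reference genome.
    """
    n = copies if count_reference_copy else copies - 1
    return n * R2D_UNIT_KB / 1000


for copies in (24, 34, 36):
    print(copies, r2d_dna_mb(copies), r2d_dna_mb(copies, count_reference_copy=False))
```

Under either convention, 24 or more copies correspond to roughly 3 Mb, and 34 to 36 copies to a little over 4 Mb, in line with the approximate values quoted in the text.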
As the reference genome is based on a single classical inbred strain, C57BL/6J, copy number gains in other strains or wild mice may be located in a different physical location. Fortunately, the presence of a third allele in CAST/EiJ (which exhibited a twofold enrichment of sequencing reads) combined with the fact that recombinations involving the CAST/EiJ haplotype are not suppressed within the 9.3 Mb candidate interval, enabled us to map the physical location of R2d2 to a 900 kb region located 6 Mb distal to R2d1, the locus where the sequencing reads mapped in the reference genome (Fig. 3). Importantly, the mapping of R2d2 was enabled by the availability of deep sequence data for each of the strains used in our experiments [25-27,37,42,43 and this study] and by combining the results of experiments completed 20 years apart [25-27,43,44].
We determined the number and spatial distribution of SNPs in the 9.3 Mb candidate interval that partition the ten inbred strains with whole genome sequence in a pattern consistent with the TRD phenotype (three strains with TRD: WSB/EiJ, SPRET/EiJ and HR8; and seven strains without TRD: A/J, C57BL/6J, 129S1/SvImJ, NOD/ShiLtJ, NZO/HILtJ, CAST/EiJ and PWK/PhJ). Compared to a genome-wide mean of 1 consistent SNP every ~3.2 kb, within the 900 kb region where we mapped R2d2 there was a mean of 1 consistent SNP every 883 bp (p < 1.0 × 10⁻⁴, one-sided Student's t-test; Fig. 2 E). This reduction in diversity is not due to undercalling of SNPs in the R2d2 candidate interval (Fig. 2 F). The fact that consistent SNPs are rare in most of the genome but are common within the 900 kb region in which R2d2 maps supports the hypothesis that R2d2 is the causative allele for TRD.
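To put the two SNP densities on a common scale, the short Python sketch below converts each spacing into an expected count over a 900 kb window and reports the fold difference. It uses only the spacings quoted above; the helper name is hypothetical and the calculation is a back-of-the-envelope illustration, not a re-analysis.

```python
def expected_snps(region_bp, bp_per_snp):
    """Expected SNP count in a region given a mean spacing between SNPs."""
    return region_bp / bp_per_snp


REGION_BP = 900_000          # size of the R2d2 interval
GENOME_WIDE_SPACING = 3_200  # ~1 consistent SNP every 3.2 kb
R2D2_SPACING = 883           # 1 consistent SNP every 883 bp

genome_wide = expected_snps(REGION_BP, GENOME_WIDE_SPACING)
r2d2_region = expected_snps(REGION_BP, R2D2_SPACING)
fold = GENOME_WIDE_SPACING / R2D2_SPACING

print(round(genome_wide), round(r2d2_region), round(fold, 2))
```

On these figures, consistent SNPs are about 3.6 times denser in the R2d2 interval than the genome-wide mean would predict.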
Most importantly, we identified a DO female (DO-G13-44) that was homozygous for the WSB/EiJ haplotype across the entire R2d candidate interval but was heterozygous for R2d2 alleles with different copy numbers (Fig. 4). We generated a three-generation pedigree and analyzed the R2d copy number, the Chr 2 haplotype and TR in the progeny of heterozygous dams with different copy numbers. This analysis revealed perfect correlations between the inheritance of R2d2WSBdel1 and complete absence of TRD in favor of the WSB/EiJ allele, and between the inheritance of R2d2WSB and presence of TRD. This experiment demonstrates that the reduction in copy number from 33 to 11 is sufficient to restore Mendelian segregation, and that R2d2 is the causative allele for maternal TRD.
Further evidence that TRD requires an R2d2 allele with a copy number above 11 is provided by the NU/J inbred strain. This strain has an intermediate copy number (7, estimated by TaqMan) but no TRD in the progeny of (NU/JxC57BL/6J)F1 female hybrids (TR = 0.55, p = 0.55; S12 Fig.).

Gene content and sequence composition of R2d
The presence of R2d sequences at two distinct locations (Fig. 2 G) indicates an initial duplication of this segment in the ancestor of CAST/EiJ, WSB/EiJ, SPRET/EiJ and Hsd:ICR. R2d spans a highly expressed protein coding gene (Cwc22; Fig. 2 C) that is implicated in RNA splicing [38,44], a predicted gene of unknown function that overlaps with the last exon of Cwc22 (Gm13727) and a pseudogene (Gm13726). DNA copy number variation for Cwc22 has been described previously [38,45]. Cwc22 is highly expressed in mouse oocytes and fertilized eggs [45,46]. The Cwc22 gene is a known eQTL in mouse: allele-specific RNA-seq of brain tissue from reciprocal crosses between WSB/EiJ, PWK/PhJ and CAST/EiJ showed extreme differential expression, with the WSB/EiJ allele more highly expressed than the other two [46,47].
Apart from its size and repetitive nature, an important feature of the R2d2 locus is its remarkable uniformity between three divergent genetic backgrounds that are separated by ~1 million years of evolution: WSB/EiJ, SPRET/EiJ and HR8 [47][48][49]. For example, in WSB/EiJ and SPRET/EiJ the genome-wide mean is 1 SNP every ~60 bp [37], whereas the mean SNP frequency within R2d is significantly reduced to 1 SNP every 1,342 bp (t-test, p = 3.9 × 10⁻⁵⁸). Further analysis will be required to determine the respective ages of the duplication and the copy number change(s), and whether interspecific introgression [48][49][50][51] is required to explain the unlikely degree of sequence conservation between M. m. domesticus and M. spretus.
We note that, while unlikely given the results of our QTL mapping (Fig. 3), it is possible that there have been additional duplication events that have also inserted R2d in other chromosomes. Additionally, the causal allele may incorporate additional DNA sequences, including some that may be absent in the reference genome (similar to the origin of the sequence on maize chromosome Ab10 that causes meiotic drive in that species). If that is the case, the causal allele may be much larger than 4.4 Mb. For example, HSR alleles as large as 200 Mb have been described [50][51][52].
How do meiotic drive and embryonic lethality contribute to TRD at R2d?
A second focus of our study was to discriminate among the many mechanisms [29,52] that could give rise to TRD at R2d2, and to rule out as many as possible. First, the fact that TRD is only observed through the maternal germline rules out both spermatogenesis-mediated processes and sperm competition. Second, the presence of TRD at birth rules out differential survival of offspring. Third, the fact that distortion was independent of the maternal granddam precludes cytoplasmic effects. The remaining plausible explanations are differential fertilization based on the oocyte genotype, embryonic lethality and/or meiotic drive. The first two mechanisms should reduce the average litter size proportionally to TR (black line in S7 Fig.), while the average absolute number of offspring inheriting the favored genotype (R2d2WSB) per litter remains constant. The number of resorbed embryos observed in pregnant females could distinguish the two mechanisms because it should be greater in the second than in the first scenario. In contrast, if meiotic drive is solely responsible for TRD then the following should be true: 1) average litter size is independent of TRD, 2) the average absolute number of offspring inheriting the favored genotype (R2d2WSB) per litter is higher in dams with TRD than in dams with Mendelian segregation, and 3) the level of embryonic lethality is independent of the presence and level of distortion. The data shown in the Results section are most consistent with the combined action of embryonic lethality and meiotic drive. Specifically, meiotic drive is required to explain both the fact that the observed average litter size in the DO-G13 and DO-G16 dams, in the DO-G13-44 pedigree and in the (NZO/HILtJxWSB/EiJ)F1 dams is greater than predicted based on TR (S7 Fig.), and that the average absolute number of offspring inheriting the R2d2WSB genotype per litter is greater in dams with TRD (Fig. 5).
Note that some p-values in comparisons involving (NZO/HILtJxWSB/EiJ)F1 crosses failed to reach statistical significance due to the small sample size, but the trends were always consistent with those in DO dams with TRD.
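The contrast between the two extreme scenarios can be written as a small Python sketch. It assumes, as a simplification of the argument above, that under lethality alone the favored class keeps its Mendelian count (half of the undistorted litter), so litter size shrinks to L0 / (2 × TR), whereas under drive alone litter size stays at L0 and the favored class grows to L0 × TR. The function names and the example values are hypothetical, and this is not necessarily the exact model behind S7 Fig.

```python
def litter_size_lethality_only(mendelian_litter_size, tr):
    """Expected litter size if TRD came only from loss of non-favored embryos.

    The favored class stays at L0 / 2; dividing by TR gives the litter size
    at which that class makes up a fraction TR of the survivors.
    """
    if not 0.0 < tr <= 1.0:
        raise ValueError("tr must be in (0, 1]")
    return mendelian_litter_size / (2 * tr)


def favored_per_litter(litter_size, tr):
    """Average number of offspring per litter carrying the favored allele."""
    return litter_size * tr


L0 = 8.0   # hypothetical litter size with Mendelian segregation
TR = 0.8   # hypothetical transmission ratio

lethal_size = litter_size_lethality_only(L0, TR)
print(lethal_size, favored_per_litter(lethal_size, TR))  # lethality only
print(L0, favored_per_litter(L0, TR))                     # drive only
```

An observed litter size above the lethality-only prediction, together with a favored-allele count above L0 / 2, points toward a contribution from drive, which is the pattern the text reports.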
An alternative explanation that does not involve meiotic drive would require the combined presence of increased ovulation in dams with TRD and pre- or post-implantation genotype-dependent competition between embryos favoring the allele with the high copy number at R2d2. Genotyping at R2d2 and re-analysis of 159 F2 females from the M16ixL6 intercross [29] confirms an overdominant effect of the R2d2 genotype in the number of live and dead embryos at day 16 of gestation, as predicted under the meiotic drive and embryo competition scenarios, but shows no effect of the R2d2 locus on ovulation rates (S9 Fig.). This result is not due to a lack of power, as we have 80% power (at α = 0.5) to detect a difference in the mean ovulation rate due to an effect of the R2d2 genotype, and QTLs for ovulation rate were identified in the original study [29,53-55]. In summary, the effect of the R2d2 genotype on reproductive phenotypes is most consistent with the meiotic drive hypothesis. However, the possibility remains that the genotype-associated difference in number of live embryos may be due to differential fertilization or implantation. Additional breeding experiments and genotyping of pre-implantation embryos will resolve the remaining questions concerning the mechanisms involved in TRD at R2d2.
It is interesting to speculate about the types of embryonic lethality that are consistent with our data and with previous reports of TRD on Chr 2. Lethality is associated with distortion at R2d2, and thus the simplest explanation is preferential death of embryos inheriting maternal R2d2 NotWSB alleles. However, such a scenario would require parent-of-origin-dependent death of embryos with maternal C57BL/6J, 129S1/SvImJ, NOD/ShiLtJ and NZO/HILtJ R2d2 alleles in crosses involving F1 females (Table 1) and CAST/EiJ, PWK/PhJ and A/J R2d2 alleles in the CC/DO females (S10 Fig.). The lack of evidence of TRD and parent-of-origin lethality in dozens of crosses involving these alleles [53][54][55], combined with the lack of evidence for imprinted genes in the central region of Chr 2 [2][3][4][46][56], appears to rule out this explanation. Specifically, the Cwc22 gene present in R2d is not imprinted in brain, kidney, lung and liver in crosses involving the WSB/EiJ, PWK/PhJ and CAST/EiJ strains [46]. A more likely explanation for the joint and correlated presence of meiotic drive and lethality is that the unequal segregation of chromosomes and/or chromatids that leads to TRD in euploid embryos may also lead to increased Chr 2 aneuploidy, and thus to embryonic death (all autosomal aneuploidy is embryonic-lethal in the mouse). This would also explain the slight increase in the number of resorbed embryos observed at mid-gestation (S8 Fig.; S1 Table). This hypothesis makes the testable prediction that Chr 2 should be especially affected by aneuploidy in some dams with TRD.
Importantly, co-segregation of a deletion allele of R2d2 and increased litter size in the DO-G13-44 pedigree demonstrates that lethality is mediated by an element within the R2d repeat.

Maternal TRD at R2d2 is an oligogenic trait
Overall, we assessed TR at R2d2 in hundreds of females carrying a single WSB/EiJ allele in at least nine distinct genetic backgrounds (Table 1; S1 Table). The presence of significantly different TR levels among F1 hybrid dams, combined with the fact that we observe both extreme TRD and no distortion in the progeny of females with A/J, C57BL/6J, 129S1/SvImJ, NOD/ShiLtJ, CAST/EiJ and PWK/PhJ alleles in trans at R2d2 (S10 Fig.), demonstrates that TRD is under genetic control of at least one additional locus (i.e., there is at least one unlinked distorter locus that is genetically variable in the CC and DO mice). Furthermore, the presence of at least two significantly different levels of distortion among F1 hybrid dams (Table 1; S3 Fig.) indicates either that more than one distorter locus is involved or that an allelic series exists at a single distorter locus.
Further evidence that TRD is under control of one or more unlinked distorters was provided by 15 female DO-G13-44 G1 offspring that inherited the high-copy allele. Those dams had significantly different levels of TRD (p = 9.8 × 10⁻⁵). Note that there was no correlation between the presence or level of TRD and the paternally inherited allele (one-way ANOVA, F = 2.21 on 1 and 23 df, p = 0.15; Fig. 3).
In the DO-G13-44 pedigree, females that inherited the R2d2 WSBdel1 allele had copy number 11 (S11 Fig.), indicating a partial rather than complete deletion of the expansion. Using the TaqMan assay, we identified two additional DO females (DO-G13-49 and DO-G16-107; S4 Fig.; S11 Fig.) that had results consistent with a copy number loss in the WSB/EiJ haplotype. The presence of the deletion in the respective germlines was confirmed by the TaqMan assay in their progenies (S4 Fig.). Importantly, each one of the three deletions appears to be independent because these females are not closely related, their WSB/EiJ haplotypes in Chr 2 are different and the copy number present in each female is also different (S12 Fig.). The deletions appear to be internal to R2d2 based on the analysis of the MegaMUGA genotypes and intensities [57] at all surrounding markers. The repeated observation of independent deletions indicates that R2d2 is rather unstable and may explain the fact that, despite its presence in laboratory strains and wild mice, it has not led (yet) to a complete selective sweep.
Known meiotic drive systems (S1 Fig.) consist of one or more responder loci (a locus subject to preferential segregation during meiosis) and a single distorter (the effector locus required for drive at the responder). In meiotic drive systems that are stable in natural populations, responder and distorter loci are tightly linked and are typically protected from decoupling by factors that inhibit recombination, such as structural variation [7,11,14]. Although R2d2 resides within a recombination-cold region, the distorter is not closely linked to R2d2 based on the TR observed and the diplotypes present in F1 hybrid and DO dams (Fig. 1; S8 Fig.). Therefore, at least one unlinked distorter is required to explain the observed variability in TRD.
These observations indicate that the maternal TRD phenotype has a complex genetic architecture. Specifically, a minimum number of copies of R2d are required in heterozygosity at R2d2 for TRD to be observed. Therefore, it can be classified as overdominant, restricted to the female germline and caused by structural variation. Similar characteristics have been recently reported for the Xce locus that controls X-inactivation choice; notably, characterization of Xce relied on the analysis of a genetically diverse set of F1 hybrid mice [58]. In addition, multiple alleles at unlinked loci interact to determine whether distortion occurs at R2d2, and to what extent. This is unique among meiotic drive systems (S1 Fig.) and has important implications for the natural history of the system and for the ease of genetic dissection. We hypothesize that variation in TR levels at R2d2 results from the interaction of alleles originating from multiple taxa, and thus the use of inter-specific and inter-subspecific mouse populations was key to the characterization of this system. Wild-derived strains and wild-caught mice have enabled important biological discoveries [4,59], and we echo previous encouragements of a more prominent role for these resources in biological and biomedical research [60,61].
What is the mechanism by which R2d2 WSB influences its own segregation?
Centromeres (i.e., the site of kinetochore formation) are remarkable loci that control, in cis, proper segregation of chromosomes during mitosis and meiosis. It is easy to envision how a responder at, or tightly linked to, a centromere can influence chromosome segregation. Recent evidence shows that kinetochore protein levels and microtubule binding are positively correlated with preferential segregation to the oocyte in mice that are heterozygous for Robertsonian fusions [62], indicating that differences in centromere "strength" lead to meiotic drive. Responders located far away from centromeres are thought to influence their own segregation in cis by becoming "neocentromeres" and taking advantage of the inherent functional polarity of the female meiotic spindle [63]. We hypothesize that R2d2 may act as a neocentromere after epigenetic activation mediated by C57BL/6J, NZO/HILtJ, 129S1/SvImJ, and NOD/ShiLtJ alleles at the distorter(s).
The discovery of multiple R2d2 alleles with different copy numbers demonstrates that the presence of the distal insertion of R2d is not sufficient for meiotic drive; rather, some minimum copy number (> 11) is required for TRD. This raises the possibility that meiotic drive at R2d2 is dosage-dependent, such that fine-scale control over the level of TRD is possible by adjusting the number of copies of R2d. If R2d2 is acting as a neocentromere, this may also indicate that some minimum size and/or number of repeats is required for recognition and activation by the epigenetic machinery. The Ab10 system of maize provides examples of responders that function as neocentromeres and for which the level of meiotic drive depends on the size of the responder (i.e., knob size) [11].
The effect on the Chr 2 centromere of activating an ectopic neocentromere at R2d2 is unknown, but it might explain the moderate levels of lethality caused by aneuploidy and suggests that some coordination between the two loci is required to achieve chromosome segregation. Meiosis involving chromosomes with neocentromeres may lead to an increased rate of nondisjunction and a reduced rate of recombination.

Implications of R2d2 for the CC and DO
The conclusion that a genetically complex meiotic drive system is responsible for TRD favoring the WSB/EiJ allele at R2d2 is fully consistent with the initial observations of TRD in the CC, with our prediction that positive selection of the WSB/EiJ allele occurred during outcrossing or in early inbreeding generations [35], with the presence of similar levels of TRD in extinct and extant CC lines at intermediate generations of the CC (S5 Table) and with the fact that C57BL/6J, 129S1/SvImJ, NOD/ShiLtJ and NZO/HILtJ haplotypes at R2d2 are not underrepresented among the currently completed CC strains (http://csbio.unc.edu/CCstatus/index.py). The observed levels of TRD in crosses that use DO females are consistent with the presence of different alleles at the distorter(s) (S7 Fig.; S1 Table).
Although the discovery and identification of TRD that emerged from the DO pseudo-randomized mating scheme offered the opportunity to characterize a novel meiotic drive responder, the existence of such a locus could negatively impact the utility of this population for genetic studies. Fortunately, the locus was discovered before complete fixation of the R2d2 WSB allele. Although the candidate interval spans 900 kb, TRD affects a much larger region in the DO because the strength of selection in favor of the WSB/EiJ allele is outpacing the rate at which recombination can degrade linkage disequilibrium in the region. Ultimately, this region would become an actual or statistical 'blind-spot' in the DO, such that the non-WSB/EiJ allele frequencies would become too small to detect allelic effects on phenotypic variation. Efforts are underway to purge the WSB/EiJ allele from the DO breeding population at this locus or to select for mice carrying a WSB/EiJ haplotype with a low copy number for R2d2, rather than allow the region to become fixed. Using marker-assisted selection, progeny of heterozygous WSB/EiJ carrier crosses are excluded from subsequent generations. Allele frequencies and random segregation on all other chromosomes are being preserved (EJC unpublished).

Concluding remarks
The SPRET/EiJ and WSB/EiJ strains and the Hsd:ICR outbred stocks are among the most extensively characterized and utilized mouse populations. Resources involving those populations include whole-genome sequencing and genotyping [24,37], development of linkage maps of the mouse [40,64,65], creation of genetic reference populations [35,36,66], experimental crosses to map a diverse collection of biomedical and evolutionary traits [33,48,53,61] and selection lines derived from Hsd:ICR (such as M16i and HR) that have been widely used for genetic analyses [30][31][32][67][68][69]. The potential for distorted allele frequencies in crosses involving those populations may affect the interpretation of results from a wide range of genetic, behavioral and physiological studies.
The R2d2 system has attributes that make its genetic and mechanistic characterization a tractable problem. Identification of several distorters would allow assembling the pathway(s) responsible for centromere function and spindle polarity. This may open the way to explore at the molecular and mechanistic levels an evolutionary force (meiotic drive) thought to be responsible for karyotype evolution in mammals and in many other organisms [15]. With the advent of genome engineering tools such as CRISPR/Cas9 [70], we also anticipate practical applications of a strong, modulable meiotic drive system with only modest levels of lethality. For example, meiotic drive could be used to increase the efficacy of gene drives for introducing new genes into experimental or natural populations [71].

New mouse crosses
Crosses 1-2, 7-10 and 16-17 (Table 1). WSB/EiJ and C57BL/6J were used in reciprocal combinations. Male F1 hybrids were backcrossed to C57BL/6J to produce the progeny of crosses 1 and 2. Female F1 hybrids were backcrossed to C57BL/6J to produce the progeny of crosses 16 and 17. The progeny of crosses 7-10 was produced in a similar way to crosses 16 and 17, except that female F1 of reciprocal matings of WSB/EiJ and CAST/EiJ were used for crosses 7 and 8, and female F1 of reciprocal matings of WSB/EiJ and PWD/PhJ were used for crosses 9 and 10. All breeding was done at the Jackson Laboratory (Bar Harbor, ME).

Linkage mapping of TRD in DO-G13-44xCC cross
A single G13 DO female (DO-G13-44) was mated to a male that was the result of an intercross between four CC lines (CC013/GeniUnc, CC053/Unc, CC065/Unc and CC008/GeniUnc; Fig. 4). G3 female progeny were weaned, single housed and mated to FVB/NJ males. Cages were surveyed three to five times per week. Litter sizes were recorded, G4 pups were sacrificed at birth, and tissue was collected for DNA isolation.
TR was measured in G3 dams as described above. Each dam was classified as having TRD (p < 0.05 for 1-df χ² test of null hypothesis TR = 0.5) or not having TRD (p ≥ 0.05). Both G2 parents and G3 dams were genotyped on MegaMUGA, and phased haplotypes at R2d2 were inferred by manual inspection of haplotype reconstructions. In order to isolate the contribution of maternal and paternal alleles to TRD, MegaMUGA markers called as H in the G2 dam and homozygous in the G2 sire were retained for mapping, and presence of TRD was mapped as a binary phenotype using a logistic regression analog of the Haley-Knott method. The procedure was repeated using only markers called as H in the father of the G3 dams and homozygous in the mother. Significance thresholds for LOD scores were obtained by unrestricted permutation.
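The permutation step can be illustrated with a small sketch. It is not the authors' implementation: it assumes fully informative two-class markers, for which a logistic regression with a single two-level predictor reduces to comparing the per-genotype proportion of dams with TRD against the pooled proportion. The function names (`lod_binary`, `permutation_threshold`) and the defaults for `n_perm` and `alpha` are hypothetical.

```python
import math
import random


def binom_loglik(k, n):
    """Maximized binomial log-likelihood (natural log) for k successes in n trials."""
    if n == 0 or k == 0 or k == n:
        return 0.0
    p = k / n
    return k * math.log(p) + (n - k) * math.log(1 - p)


def lod_binary(pheno, geno):
    """LOD score for a binary phenotype (1 = TRD, 0 = no TRD) at one marker.

    Assumption: the marker has discrete genotype classes, so the
    likelihood ratio compares per-class proportions with the pooled one.
    """
    full = 0.0
    for g in set(geno):
        group = [y for y, x in zip(pheno, geno) if x == g]
        full += binom_loglik(sum(group), len(group))
    null = binom_loglik(sum(pheno), len(pheno))
    return (full - null) / math.log(10)


def permutation_threshold(pheno, markers, n_perm=1000, alpha=0.05, seed=0):
    """Genome-wide LOD threshold from unrestricted permutation of phenotypes.

    markers: list of genotype vectors, one per marker, in dam order.
    """
    rng = random.Random(seed)
    shuffled = list(pheno)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any phenotype-genotype association
        maxima.append(max(lod_binary(shuffled, m) for m in markers))
    maxima.sort()
    index = min(len(maxima) - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return maxima[index]
```

Shuffling the phenotype vector as a whole keeps the correlation structure among markers intact, which is why the maximum LOD across markers is recorded for each permutation rather than a per-marker null.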
All other samples. PCR-based genotyping was performed on crude whole genomic DNA extracted by heating tissue in 100 μl of 25 mM NaOH/0.2 mM EDTA at 95°C for 60 minutes, followed by the addition of 100 μl of 40 mM Tris-HCl. The samples were then spun at 2000 rpm for 10 minutes and the supernatant was collected for use as PCR template. All primers (S6 Table) used in this study were designed using PrimerQuest software (https://www.idtdna.com/Primerquest). PCR reactions contained 1.5-2 mM MgCl2, 0.2-0.25 mM dNTPs, 0.2-1.8 μM of each primer and 0.5-1 units of GoTaq polymerase (Promega) in a final volume of 10-50 μL. Cycling conditions were 95°C for 2 min; 35 cycles at 95°C, 55°C and 72°C for 30 sec each; and a final extension at 72°C for 7 min. PCR products were loaded into a 2% agarose gel and run at 200 V for 40-120 minutes (depending on the marker). Genotypes were scored and recorded.
DNA for MegaMUGA genotyping was isolated as described previously [40,53]. Briefly, ~2 mm of mouse tail (5 mg) was harvested, flash-frozen on dry ice and digested with proteinase K overnight at 65°C. The following day, DNA was extracted using the QIAGEN Puregene Gentra kit (kit no. 158389; QIAGEN GmbH, Hilden, Germany). Genotyping was performed with the MegaMUGA genotyping microarray (Neogen/GeneSeek, Lincoln, NE), a 78,000-probe array based on the Illumina Infinium platform.
Genotyping by TaqMan. After R2d2 was established as the causal variant for TRD, a subset of DO-G16 progeny and all (M16i x L6)F2 intercross progeny were genotyped using TaqMan real-time PCR assays for Cwc22. Samples heterozygous for a high-copy allele at R2d2 can be readily distinguished from samples homozygous for a low-copy allele based on the normalized cycle threshold value estimated from the assay (see section "Copy-number validation" below).

Statistics
Deviation from Mendelian transmission. TR is reported as the ratio of the WSB/EiJ genotype to the total number of genotypes: WSB / (WSB + nonWSB). P values for aggregate data were calculated using a χ² goodness-of-fit test of the observed number of WSB/EiJ genotypes compared to the number of WSB/EiJ genotypes expected under the null hypothesis of equal transmission: $\chi^2 = \sum_{g} (O_g - E)^2 / E$, where $O_g$ is the observed count of genotype $g \in \{\mathrm{WSB}, \mathrm{nonWSB}\}$ and $E = (\mathrm{WSB} + \mathrm{nonWSB})/2$. For individual dams, the small sample sizes (typically fewer than 50 total offspring) would lead to type II error; therefore, p-values were calculated using an exact binomial test. Confidence intervals for TRs were calculated using the binom R package (http://cran.r-project.org/web/packages/binom/).
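As an illustration of the two tests, the following sketch computes TR, the 1-df χ² goodness-of-fit p-value and an exact binomial p-value from a pair of genotype counts, using only the Python standard library. The function names and the `two_sided` switch are hypothetical; the analyses reported here were run in R, and the choice between a one- and two-sided exact test is left to the caller.

```python
import math


def transmission_ratio(n_wsb, n_other):
    """TR = WSB / (WSB + nonWSB)."""
    return n_wsb / (n_wsb + n_other)


def chi2_gof(n_wsb, n_other):
    """1-df chi-square goodness-of-fit test against equal transmission.

    Returns (statistic, p-value).
    """
    expected = (n_wsb + n_other) / 2
    stat = ((n_wsb - expected) ** 2 + (n_other - expected) ** 2) / expected
    # For 1 df, the chi-square survival function equals erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def exact_binomial_pvalue(n_wsb, n_other, two_sided=True):
    """Exact binomial test of TR = 0.5, intended for small per-dam counts."""
    n = n_wsb + n_other
    probs = [math.comb(n, k) * 0.5 ** n for k in range(n + 1)]
    if not two_sided:
        # One-sided: probability of at least n_wsb WSB offspring.
        return sum(probs[n_wsb:])
    observed = probs[n_wsb]
    # Two-sided: sum all outcomes no more likely than the observed one.
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-9)))
```

For example, a dam with 10 WSB/EiJ and 0 non-WSB/EiJ offspring has TR = 1.0, a one-sided exact p-value of 0.5^10 and a two-sided p-value of twice that.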
Average litter size. Average litter size was calculated as the mean number of offspring counted soon after birth per litter per dam (± standard deviation), including the number of viable embryos counted in utero in mid-gestation DO dams (unless otherwise noted).
The expected average litter size (ALS) of a dam under a model in which lethality is the sole explanation for TRD is: $\mathrm{ALS}_{Exp} = \mathrm{ALS}_{noTRD} / (2 \times \mathrm{TR})$, where $\mathrm{ALS}_{noTRD}$ is the mean ALS in dams with no TRD [41]. Significance of the deviation of ALS Obs from ALS Exp was determined using a Wilcoxon signed-rank test.
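Under a lethality-only model, the number of R2d2 WSB offspring per litter stays at half the litter size of dams without TRD, so the expected litter size depends on TR alone. A minimal sketch of that relationship follows, assuming that only the non-WSB class is depleted and that TR lies between 0.5 and 1; the function name and argument names are hypothetical.

```python
def expected_litter_size(als_no_trd, tr):
    """Expected average litter size if embryonic lethality alone produced TR.

    Assumption: non-WSB embryos are lost after fertilization while the
    number of WSB offspring per litter stays at als_no_trd / 2, so that
    TR = (als_no_trd / 2) / expected litter size.
    """
    if not 0.5 <= tr <= 1.0:
        raise ValueError("model covers TR between 0.5 and 1.0")
    return als_no_trd / (2.0 * tr)
```

With a baseline of 8 pups per litter, a dam with TR = 0.8 would be expected to average 5 pups under this model and a dam with complete distortion 4 pups. Observed litter sizes above these values are what the lethality-only model cannot account for.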
Inheritance of R2d2 alleles. Similarly, the average absolute number of offspring inheriting each R2d2 allele was calculated as the mean number of offspring per litter per dam having each of the possible genotypes. Significance was determined using a one-tailed Student t-test.

Estimation of embryonic lethality
DO and F1 dams were euthanized by CO2 asphyxiation 12-18 days after delivery of the previous litter and the uterus was dissected. The numbers of live embryos and resorbed (dead) embryos were recorded. Tissue from each live embryo was harvested for DNA extraction and genotyping.

Analysis of genotyping arrays
All MDA arrays were genotyped using MouseDivGeno [57], and all MegaMUGA arrays were genotyped using Illumina BeadStudio. We plotted number of H and N calls (as a fraction of the total number of genotypes) for each group of similar samples and excluded outliers from further analysis. For CC lines, DO animals, CCxCC F1 females and DOxCC F1 females, we inferred haplotypes using probabilistic methods [40,75]. As an additional QC step, we grouped DO samples by generation and plotted the number of recombinations (counted as unique transitions in haplotype reconstructions) and removed outliers.

Linkage mapping of R2d2
CAST/EiJ allele in the CC G2:F1. Thirty-four MDA SNP probe sets were identified within R2d in the GRCm38 reference sequence (S3 Table). We ensured that these probes were unique using BLAT [76] to map them to the reference genome. In order to map the expansion allele present in the CAST/EiJ strain, phenotypes and genotypes were coded as follows. First, we applied a CCS transform [77] to the mean intensity of all probes in each probe set using MouseDivGeno [57] and summed the values for each sample to obtain the final phenotype value. Next, the genome was divided into a set of disjoint intervals whose boundaries were defined by the 21,933 unique recombination events inferred in the population [40], so that no individual would be recombinant within any of the resulting intervals. Then, using haplotype reconstructions, individuals were coded as either heterozygous (CAST/not-CAST) or homozygous (not-CAST/not-CAST) within each interval (there are no CAST homozygous individuals in this population). Of 474 individuals, 144 with a WSB/EiJ allele in the middle of chromosome 2 were excluded to yield a final sample size of 330. A single-locus QTL scan was then performed via Haley-Knott regression [78], treating the population as a backcross.
WSB/EiJ allele in an intercross population. Three MegaMUGA SNP probes were identified within R2d in the GRCm38 reference (S4 Table). Again, uniqueness was verified using BLAT. In order to map the expansion allele in WSB/EiJ, the sum intensity of these probes was used as a phenotype and genotypes were coded as follows. First the genome was divided into a grid of 1,000 disjoint intervals of approximately equal size, and one MegaMUGA SNP marker segregating between WSB/EiJ and PWK/PhJ was selected per interval. Individuals were coded as heterozygous (WSB/not-WSB) or homozygous (not-WSB/not-WSB) at each marker. A single-locus QTL scan was then performed using Haley-Knott regression as implemented in R/qtl [79], treating the population as a backcross.
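The single-locus scan can be sketched as follows. At a fully informative marker with two genotype classes, Haley-Knott regression reduces to an ordinary regression of the phenotype (here, sum intensity) on genotype, and the LOD score is (n/2) log10(RSS0/RSS1). This is a simplified stand-in for R/qtl written under that assumption; `marker_lod` and `scan` are hypothetical names.

```python
import math


def marker_lod(phenotypes, genotypes):
    """LOD score at one marker for a backcross-style two-class genotype.

    genotypes: 1 for heterozygous carriers (e.g. WSB/not-WSB), 0 otherwise.
    """
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)
    rss_full = 0.0
    for g in (0, 1):
        group = [y for y, x in zip(phenotypes, genotypes) if x == g]
        if group:
            group_mean = sum(group) / len(group)
            rss_full += sum((y - group_mean) ** 2 for y in group)
    if rss_full == 0.0:
        return math.inf  # genotype explains the phenotype exactly
    return (n / 2) * math.log10(rss_null / rss_full)


def scan(phenotypes, markers):
    """Return (index of best marker, list of LOD scores) across markers."""
    lods = [marker_lod(phenotypes, m) for m in markers]
    best = max(range(len(lods)), key=lods.__getitem__)
    return best, lods
```

Treating the population as a backcross is what allows the two-class coding: each individual either carries one copy of the expansion haplotype at a marker or carries none.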

Fine-mapping of R2d2
In order to refine the location of R2d2, we identified individual mice with recombinant chromosomes within the candidate interval defined by linkage mapping. These critical recombinants define the proximal and distal boundaries of the refined candidate interval.
CAST/EiJ allele. We partitioned the 330 G2:F1 individuals without a WSB/EiJ allele in the R2d locus into two groups according to MDA sum-intensity values. From those with sum-intensity consistent with a non-CAST/EiJ expansion allele, we selected the most distal recombinants from CAST/EiJ to another haplotype. From those with sum-intensity consistent with the CAST/EiJ expansion allele, we selected the most distal recombinant from another haplotype to CAST/EiJ. Together these recombinants define the proximal boundary of the candidate interval in CAST/EiJ. Similarly, in order to define the distal boundary of the candidate interval, we selected the most proximal recombinants from CAST/EiJ to another haplotype that still had sum-intensity consistent with the CAST/EiJ expansion allele.
WSB/EiJ allele. The boundaries of the WSB/EiJ candidate interval were mapped in the same fashion using 229 individuals spanning generations 10 through 14 of the DO, all of which have been genotyped on MegaMUGA and are recombinant for WSB/EiJ in the initial candidate interval. We first excluded individuals homozygous for WSB/EiJ over any segment within the interval. Then we selected the most distal recombinants from another haplotype to WSB/EiJ, which also had MegaMUGA sum-intensity values consistent with a non-WSB/EiJ expansion allele. These recombinants define the distal boundary of the candidate interval. We mapped the proximal boundary similarly.
SPRET/EiJ allele. (C57BL/6JxSPRET/EiJ)xC57BL/6J (n = 12) and (A/JxSPRET/EiJ)xA/J progeny (n = 17) [26,43] genotyped on the MegaMUGA array were used to refine the candidate interval for the expansion allele in SPRET/EiJ. Haplotypes in the relevant region of Chr 2 were inferred by manual inspection of genotype calls. Samples were partitioned according to sum-intensity at the three MegaMUGA SNP probes tracking the expansion allele. Among individuals with sum-intensity consistent with the expansion allele, the most proximal recombinant from SPRET/EiJ to another haplotype defines the distal boundary of the candidate interval. Likewise the most distal recombinant from a non-SPRET/EiJ haplotype to SPRET/EiJ defines the proximal boundary of the candidate interval.

Whole-genome sequencing
Ten individuals from the HR8 selection line were selected for whole-genome sequencing. Five micrograms of high-molecular-weight DNA were used to construct TruSeq Illumina libraries, using 0.5 μg starting material, with 300- to 400- and 400- to 500-bp fragment sizes. Each library was sequenced on one lane of an Illumina HiSeq2000 flowcell as paired-end reads with 100-bp read lengths. Reads were aligned to the University of California at Santa Cruz mouse genome build mm9 using bowtie 2.2.3 [80] with default options. We removed PCR duplicates and filtered low-quality SNPs using samtools 0.1.19 [81] and Picard 1.88 (http://picard.sourceforge.net/).

Sequence variants and read depth
We retrieved BAM files of aligned reads (Oct 2012 release) from the Sanger Mouse Genomes Project FTP site (ftp-mouse.sanger.ac.uk). We used the mpileup function of samtools [81] to call sequence variants on the HR8 and Sanger BAM files jointly and to output the read depth at each base. We counted a SNP as private to WSB/EiJ, SPRET/EiJ and the 10 HR8 individuals if those samples all shared a genotype that was different from the seven other CC founder strains. We defined the boundaries of the copy number expansion by identifying consecutive 100bp windows in which the average read depth was at least twice the genome-wide average read depth. We estimated the number of copies of the expansion as the modal per-base read depth.
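The windowing rule for the expansion boundaries can be sketched in a few lines. The code assumes a per-base depth vector for one region (for instance parsed from mpileup output) and a genome-wide mean depth computed elsewhere; the function names are hypothetical, and dividing the modal depth by the genome-wide mean to obtain a relative copy number is an added assumption rather than a step stated in the text.

```python
from collections import Counter


def window_means(depth, size=100):
    """Mean depth in consecutive non-overlapping windows of `size` bases."""
    means = []
    for i in range(0, len(depth), size):
        window = depth[i:i + size]
        means.append(sum(window) / len(window))
    return means


def expanded_regions(depth, genome_mean, size=100, fold=2.0):
    """Half-open (start, end) offsets of runs of consecutive windows whose
    mean depth is at least `fold` times the genome-wide mean."""
    regions = []
    start = None
    for idx, mean_depth in enumerate(window_means(depth, size)):
        if mean_depth >= fold * genome_mean:
            if start is None:
                start = idx * size
        elif start is not None:
            regions.append((start, idx * size))
            start = None
    if start is not None:
        regions.append((start, len(depth)))
    return regions


def modal_depth(depth):
    """Most frequent per-base depth value in a region."""
    return Counter(depth).most_common(1)[0][0]


def relative_copy_number(depth, genome_mean):
    """Assumption: copies scale as modal depth over genome-wide mean depth."""
    return modal_depth(depth) / genome_mean
```

Because boundaries are resolved only to the window size, the coordinates returned are accurate to within 100 bp at best.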

Copy number validation
We used commercially available TaqMan assays for Cwc22 to estimate the copy number of R2d2. We used two copy number assays (Life Technologies catalog numbers Mm00644079_cn, Mm00053048_cn) to target the number of Cwc22 copies (proximal and distal). We also used two reference assays (Tfrc, cat. no. 4458366, for target Mm00053048_cn; Tert, cat. no. 4458368, for target Mm00644079_cn), for genes known to exist in a single haploid copy in the mouse, to calibrate the amplification curve. Assays were performed according to the manufacturer's protocol on an ABI StepOne Plus Real-Time PCR System (Life Technologies, Carlsbad, CA). Cycle thresholds (Ct) for each assay were determined using the ABI CopyCaller v2.0 software with default settings. For each target-reference pair, relative cycle threshold (ΔCt) was calculated as $\Delta C_t = C_t^{\mathrm{reference}} - C_t^{\mathrm{target}}$. The ΔCt value is proportional to copy number of the target gene on the log scale but is subject to batch effects. In order to account for such effects, normalized ΔCt values for each sample were calculated as follows. A standard set of control samples (from C57BL/6J, WSB/EiJ, CAST/EiJ and (WSB/EiJxC57BL/6J)F1 mice), spanning the expected copy-number range for Cwc22, were included in duplicate or triplicate in every assay batch. A linear mixed model was fit to raw ΔCt values for these control samples, with target-reference pair and batch as random effects, using the lme4 package (http://lme4.r-forge.r-project.org/) for R (http://www.R-project.org/). Predicted values (best linear unbiased predictors, BLUPs) from this model capture technical variation orthogonal to variation due to genotype. BLUPs calculated from control samples were subtracted from raw ΔCt values for all samples, and the residual was used as the normalized ΔCt for copy-number estimation.
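The idea behind the batch correction can be shown with a deliberately simplified sketch. Instead of the lme4 mixed model and BLUPs used here, it estimates each batch offset as the mean deviation of control samples from their genotype-specific mean across batches, which is a fixed-effect approximation and not the authors' model. It also assumes that ΔCt is defined so that larger values mean more target copies, and that every control genotype appears in every batch. All function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def delta_ct(ct_reference, ct_target):
    """Relative cycle threshold; sign chosen so more copies give larger values."""
    return ct_reference - ct_target


def batch_offsets(controls):
    """Estimate a per-batch offset from calibration samples.

    controls: iterable of (batch, genotype, delta_ct) tuples.
    """
    controls = list(controls)
    by_genotype = defaultdict(list)
    for _, genotype, value in controls:
        by_genotype[genotype].append(value)
    genotype_mean = {g: mean(v) for g, v in by_genotype.items()}

    deviations = defaultdict(list)
    for batch, genotype, value in controls:
        # Deviation of this control from what its genotype predicts.
        deviations[batch].append(value - genotype_mean[genotype])
    return {batch: mean(d) for batch, d in deviations.items()}


def normalize(samples, offsets):
    """Subtract the batch offset from each raw delta-Ct.

    samples: iterable of (sample_id, batch, delta_ct) tuples.
    """
    return {sid: value - offsets[batch] for sid, batch, value in samples}
```

Running the same control genotypes in every batch is what makes the offsets estimable: any shift shared by all controls in a batch is attributed to the batch rather than to copy number.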
In this manuscript we chose in most cases to present ΔCt, rather than extrapolated absolute copy number, because ΔCt is the natural scale of the data (i.e., the log scale). Constant variance (with respect to mean) on the log scale grows exponentially on the linear scale so that estimates of absolute copy number become increasingly uncertain as copy-number grows.
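A short numerical sketch shows why uncertainty widens on the linear scale. It assumes ideal amplification (copy number proportional to 2 raised to ΔCt) and a calibration point with a known copy number; both are assumptions made for illustration, and the function name is hypothetical.

```python
def copy_number_interval(delta_ct, sd, reference_copies=1.0, reference_delta_ct=0.0):
    """Back-transform delta-Ct plus or minus one sd to the copy-number scale.

    Assumption: perfect doubling per cycle, so copies scale as
    2 ** (delta_ct - reference_delta_ct) relative to the reference.
    Returns (lower, point estimate, upper).
    """
    def to_copies(x):
        return reference_copies * 2.0 ** (x - reference_delta_ct)

    return to_copies(delta_ct - sd), to_copies(delta_ct), to_copies(delta_ct + sd)


def interval_width(delta_ct, sd):
    """Width of the back-transformed interval for a single-copy reference."""
    lower, _, upper = copy_number_interval(delta_ct, sd)
    return upper - lower
```

With the same standard deviation of 0.25 cycles, the interval around a point estimate of 2 copies (ΔCt = 1) is 16 times narrower than the interval around 32 copies (ΔCt = 5), because the width scales with the point estimate itself.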

Linkage mapping of Cwc22 TaqMan assay
The use of TaqMan assays for Cwc22 as a proxy for copy number at R2d2 was validated by mapping normalized ΔCt for target Mm00644079_cn as a quantitative phenotype in 64 members of the (FVB/NJx(WSB/EiJxPWK/PhJ)F1)G2 intercross population described above. The marker selection and mapping procedure were the same as described above for mapping MegaMUGA sum-intensity values.

Availability of data
Chr 2 genotypes and whole-genome sequence that have not been published elsewhere are available at http://csbio.unc.edu/r2d2.
Table 1 (red circles). Boxplots show the ranges of TRs observed in four sets of crosses (numbered according to Table 1): heterozygous sires (1-6) and heterozygous dams with no TRD (7-10), intermediate TRD (11-15) and high TRD (16-18). The first two classes are not different from the Mendelian expectation of 0.5, nor from each other. The third and fourth classes are significantly different from 0.5, from each other, and from the first two classes.
R2d2 copy number in dams tested for TR and in their progenies. Normalized ΔCt, normalized cycle threshold by TaqMan qPCR assay (see Methods). A) Homozygous calibration samples used for TaqMan assays targeting Cwc22: C57BL/6J (dark grey), haploid copy number 1; CAST/EiJ (green), copy number 2; (WSB/EiJxC57BL/6J)F1 (lavender), copy number ~17; and WSB/EiJ (purple), copy number ~33. In panels B-H, all samples are predicted to be heterozygous for the R2d2 WSB allele based on genotype by PCR at marker Chr2:85.65Mbp. B) F1 hybrids between inbred CC lines used to define the 9.3 Mb candidate interval (see S1 Table). C) G1 hybrids between DO females and CC males used to define R2d candidate interval (see S1 Table). D) Heterozygous DO G13 dams. Outlier sample marked in red is female DO-G13-049, dam of samples in panel G. Sample marked in red and with (*) is female DO-G13-044, the dam of samples in panel H. E) Progeny of DO G13 dams according to predicted copy number (CN), based on TaqMan assay of corresponding G13 dam. Red points are progeny of female DO-G13-049. F) G3 progeny of family DO-G13-44 (see Fig. 3), the offspring of female DO-G13-044, according to predicted CN based on haplotypes linked to R2d. G) G4 progeny in family DO-G13-44, according to predicted CN based on TaqMan assay of corresponding G3 dams. Only low-molecular weight (LMW) DNA was available for samples in panels G and I; note that ΔCt values obtained from LMW DNA are not directly comparable to ΔCt values from high-molecular
Table 1: light blue = cross 12, (NOD/ShiLtJxWSB/EiJ)F1; pink = cross 13, (129S1/SvImJxWSB/EiJ)F1; dark blue = cross 15, (NZO/HILtJxWSB/EiJ)F1. Other shapes show values for individual DO females (identified with "**" in S1 Table). Females with a mutant R2d2 WSB allele are excluded. Note that females below the black line have TRs that are too high to be explained solely by lethality given their average litter sizes. (PDF)
Fig.). White Δ indicates location of deletion. Phasing is arbitrary except in DO-G13-044 (G2 dam in family DO-G13-44), whose haplotypes could be phased by manual inspection of offspring genotypes. Copy number at the R2d2 locus for each chromosome (estimated from TaqMan normalized ΔCt values in progeny bearing that chromosome) is indicated at right: first the best estimate of integer copy number, then mean of point estimates across progeny ± 1 standard error. (PDF)
S12 Fig. TR and copy number at R2d2 in the progeny of (NU/JxC57BL/6J)F1 and DO-G16 females. Filled points, heterozygous samples; open points, homozygous control samples. Progeny can be clearly divided into two classes (high normalized ΔCt, NU/J or WSB/EiJ allele; low normalized ΔCt, alternate allele), demonstrating that the TaqMan assay is appropriate for genotyping at R2d2. Progeny of additional DO-G13 and DO-G16 samples suspected to carry low-copy alleles are shown for comparison. (PDF)
S1 Table. Transmission ratio and litter size in R2d2 WSB/NotWSB heterozygous DO, CCxDO and CCxCC dams. For each dam, the numbers of offspring having each of the two possible genotypes are shown, along with their ratio (TR) and p-value for a one-sided exact binomial test of deviation from the Mendelian expectation of 0.5; the average litter size (ALS) ± standard deviation (ALS.SD); the number of live/resorbed embryos counted in utero at mid-gestation of the final litter; the 95% confidence interval (CI) of the TR; the average number of WSB/EiJ and non-WSB/EiJ alleles per litter; the maximum litter size; the TaqMan ΔCt value, standard deviation, and predicted copy number; and the TRD classification: N = no TRD (TR < 0.6 or p ≥ 0.1), L = low TRD (TR ≥ 0.6 and p > 0.05), M = intermediate TRD (p ≤ 0.05, 0.6 ≤ TR < 0.92), H = high TRD (TR ≥ 0.92); U = unclassified (sample size < 10), X = low copy number due to deletion. Females used in determining the R2d2 candidate interval (Fig. 1)
Table. Primers used for PCR genotyping. There are two rows for each amplicon, one for the forward strand primer and one for the reverse strand primer. The reference position of the first base of the primer sequence is given (NCBI/37). (XLSX)