A Glance at Recombination Hotspots in the Domestic Cat

Recombination has essential roles in increasing genetic variability within a population and in ensuring successful meiotic events. The objective of this study is to (i) infer the population-scaled recombination rate (ρ), and (ii) identify and characterize regions of increased recombination rate for the domestic cat, Felis silvestris catus. SNPs (n = 701) were genotyped in twenty-two East Asian feral cats (random bred). The SNPs covered ten different chromosomal regions (A1, A2, B3, C2, D1, D2, D4, E2, F2, X) with an average region size of 850 Kb and an average SNP density of 70 SNPs/region. The Bayesian method in the program inferRho was used to infer regional population recombination rates and hotspots localities. The regions exhibited variable population recombination rates and four decisive recombination hotspots were identified on cat chromosome A2, D1, and E2 regions. As a description of the identified hotspots, no correlation was detected between the GC content and the locality of recombination spots, and the hotspots enclosed L2 LINE elements and MIR and tRNA-Lys SINE elements.


Introduction
Recombination is a major source of genetic variation within sexually reproducing organisms, and is necessary for the proper alignment and segregation of homologous chromosomes during meiosis. New combinations of parental alleles across loci are generated via recombination and transmitted to succeeding generations. Lack of recombination may cause failure of meiotic division, or formation of gametes with chromosome number abnormalities (aneuploidy), which are often detrimental.
Localized chromosomal regions with high recombination rates relative to surrounding areas are referred to as recombination "hotspots" [1]. Known hotspots in mice and human are generally 1-2 kb regions of high recombination rates surrounded by regions of low recombination [2]. In humans, recombination hotspots are distributed about every 200 Kb [3] and over 25 000 hotspots have been identified [4]. The first human recombination hotspot was identified using restriction site polymorphisms in β-globin gene cluster [5]. A higher resolution human hotspot was localized using sperm typing via PCR in a region that harbors GC-rich mini-satellite (MS32) as a molecular signature [6].
Several genomic features exhibit correlations with recombination hotspots. The GC content has been found to be positively correlated with recombination hotspots in humans [4], dog [7], pig [8], and chicken [9] but not in mice [10]. Long terminal repeats (LTR), long interspersed elements (LINE), and short interspersed elements (SINE) were also observed to be positively correlated with the locality of recombination hot spots in human [4,11], mice [10] and pigs [8]. DNA motifs (cis) have been identified to be associated with recombination hot spots both in humans [4,12,13] and in other organisms [14]. In addition to the cis elements presented by the motifs, a trans element, PRDM9, was found to be a major determinant of recombination hotspots in human and mice [15,16] but not in dog and related wild relatives [7,17,18]. PRDM9 is thought to bind to a DNA motif via a zinc finger domain, altering the chromatin structure through methylation, and recruiting recombination molecular machinery [19,20].
Coarse-scale recombination in cats has previously been investigated through linkage map analyses using sparsely distributed microsatellite markers [21,22]. The objective of this study is to investigate fine-scale recombination rates, and recombination hotspots, in the domestic cat using population-level data for dense SNPs in ten selected genomic regions on ten different chromosomes.

Samples and genotypes
SNP genotype data of twenty-two feral cats from China were obtained as previously described in [23]. The data were generated using a custom Illumina GoldenGate array that represent ten different cat chromosomal regions (A1, A2, B3, C2, D1, D2, D4, E2, F2, X) and were composed of 1536 markers nearly equally distributed across the ten regions. Markers' positions were updated to their location in the 6.2 cat genome assembly (http://genome.ucsc.edu/). To ensure successful inference, three criteria were used to filter the SNPs: (i) SNPs mapped to a single chromosomal location on the most recent genome assembly of cat with 100% sequence match, (ii) SNPs exhibiting a genotype call success rate of ! 80%, and (iii) SNPs possessing a minor allele frequency of ! 0.1. The final dataset, included in this study, is composed of 701 markers distributed over the ten regions (S1 Table). The genomic locations of the regions and related summaries are shown in Table 1 (see also Fig 1 and S2 Fig).
Three SNPs residing in a recombination hotspot on chromosome E2 region (see results) were chosen for genotype validation via sequencing. Primers were designed to flank the SNPs (S2 Table) and PCR was performed using DNA Engine Gradient Cycler (MJ Research, GMI, Ramsey, MN) using the following conditions. For each reaction, 2μl of DNA was used in a 1.5mM magnesium concentration with 1μM primers in a total reaction volume of 20μl. The annealing temperature of all primers was 62°C. The PCR protocol was as follows: initial denaturation at 94°C for 5 min followed by 40 cycles of 94°Cx45 sec, 62°Cx20 sec, 72°Cx30 sec, and a final extension at 72°C for 20 min. The PCR products were purified with ExoSap (USB, Cleveland, OH) per the manufacture's recommendations and directly sequenced using the Big-Dye terminator Sequencing Kit v3.1 (Applied Biosystems, Foster City, CA) as previously implemented [24]. Sequences were verified and aligned using the software sequencer version 4.10 (Gene Codes Corp., Ann Arbor, MI).

Recombination inference
The program inferRho uses the coalescent with recombination model [25][26][27] in a full Bayesian framework to infer recombination rates and hot spots along the chromosome regions [28,29]. The evolutionary relationship of the SNPs is represented by the ancestral recombination graph (ARG), which is an unobserved random variable that is integrated over using Markov Chain Monte Carlo (MCMC). In the variable recombination rate model, the population-scaled recombination rate is ρ i = 4N e c i , where N e is the effective population size, c i is the recombination rate per generation in cM/Mb between marker i and i +1 (i = 1,. . ., k − 1, and k is the total number of SNPs). The total recombination rate (c i ) consists of the background crossing-over rate and recombination hotspots (arising according to a Markov process) [28,29]. As genetic data (SNPs) from a population sample of unrelated individuals only provide information about evolutionary distance, the parameters in coalescent estimators are typically scaled by the effective population size (i.e. θ = 4N e μ, where μ is the substitution rate per site per generation; and ρ = 4N e c). To convert ρ into standard unit (cM/Mb), we need to divide it by the ploidy factor and effective population size (4N e for the nuclear genome of cats), which typically comes from other sources of information.
To make the data analysis computationally tractable, each chromosomal region was divided into blocks, with 20 SNPs per block (e.g., chromosome A1 region with 53 SNPs was divided into 3 blocks, 20 SNPs in the first and second block and 13 SNPs in the third block). Blocks from each chromosomal region are assumed to share the same population size parameter (θ) but have an independent ARG for each block. Preliminary runs were executed to determine the appropriate length and thinning interval of the MCMC chain for each chromosomal dataset. The last half of the MCMC samples (1000 samples) was used to estimate the recombination rates (ρ i ) and to plot them with respect to the SNP marker positions. The mean populationscaled recombination rate per site for each chromosomal region was calculated using the following equation: in which d i is the distance between marker i and i+1, and k is the total number of markers in the region. The numerator is total population-scaled recombination rate of the region, while the denominator is the length of this region in Mb.

Designation of recombination hot spots
We calculated the Bayes factors [30] to locate the positions of hotspots and represent relative odds of a hotspot being present. Each chromosomal region was divided into bins (200 bp per bin) for analysis of posterior samples, which contain the start and end positions of the hotspots, to estimate the probability of having hotspot (p j ) for each bin. The corresponding probability of hotspot for each bin (q j ) from the prior was obtained by running the program without data (e.g., constant likelihood) but maintaining the same sample size and marker positions. The Bayes factor (BF) is defined as the ratio of the posterior and prior odds: The odds ratio measures the proportional change in odds favoring a hotspot in the region (versus the prior odds) that results from including the data (SNPs). The total number of bins depends on the length of the region; thus the range of j is variable among the ten chromosome regions. Recombination hotspots are defined to have at least two consecutive bins with BF ! 100 whereas regions of BF < 100 are denoted as "neutral" bins awaiting further investigation (cf. section 3.2 in [30]).

GC content and genomic elements analyses
The GC content was calculated for the sequences of each bin using the function CG.content of package APE in R [31].
Variation and repeat elements within each of the ten chromosomal regions were downloaded from UCSC genome browser using RepeatMasker for the v6.2 cat genome assembly. Elements were analyzed separately for bins with BF < 100 and BF ! 100 (hotspot bins). The elements within neutral bins (BF < 100) were to provide the general overview of the elements within each chromosomal region.

Samples and genotypes
The dataset is composed of 701 SNPs on ten different cat chromosomal regions with an average of 70 SNPs/region. The chromosome E2 region harbors the highest number of SNPs (n = 90) while chromosome X region has the lowest (n = 37). SNPs are distributed across the regions with an average distance between SNPs of 12 Kb and a range of 145 bp-651 Kb (Table 1). Four SNPs were selected for genotype verification using direct sequencing. The sequencing results were concordant to the genotypes obtained previously using the genotyping platform.

Recombination rates and hotspots
The ten regions on chromosomes: A1, A2, B3, C2, D1, D2, D4, E2, F2, and X, were analyzed using inferRho. The lengths of the MCMC chains were determined by the trace plot and effective sample size (ESS) of the parameters as: 600 000 iterations for A1, C2; 800 000 iterations for A2, D1, D4, F2; 1 000 000 iterations for B3, D2, E2; and 500 000 iterations for X. Two thousand posterior samples for each chromosomal dataset were obtained after thinning, while the first half was discarded as burn-in. Using a parallel computing approach, each run could be accomplished within one to two weeks on a cluster with 2 Opteron 270 (2.0 GHz) processors per node.
Population recombination rate (ρ) is plotted for each region (Fig 1a-1c and S1a Fig). The mean recombination rate (ρ) across all regions is 200 per Mb. The E2 region exhibits the highest mean rate (309 /Mb) whereas X region has the lowest (77 /Mb) ( Table 1). The difference between the background recombination rate and whole recombination rate is indicative of recombination spots. This difference is most noticeable in regions of chromosomes A2, D1, and E2 (Fig 1a-1c).
The posterior probability of hotspot was calculated for bins of size 200 bp in each region (Fig 1d-1f and S1b Fig). Chromosomes A2, D1, and E2 show distinctly high posterior probabilities (> 0.6) in four localized areas (Fig 1d-1f). The posterior probabilities for the other seven chromosome regions (A1, B3, C2, D2, D4, F2, and X) are less than 0.2, indicating little support for elevation of recombination rate in these areas.
The Bayes factors (Fig 1g-1i and S1c Fig) are consistent in pattern with the posterior probabilities (Fig 1d-1f and S1b Fig) and recombination rates (Fig 1a-1c and S1a Fig). Approximately 99% (n = 42,531) of bins were classified as "neutral" (BF < 100) across all regions examined. The hot spots were found in only three chromosomal regions (A2, D1, and E2) and represented~0.13% (n = 57) of all bins studied. Summaries of the numbers and distribution of bins within each chromosomal region are provided in Table 1. The hotspots had size of 3 Kb in A2, 1.8 Kb in D1, and 1.8 Kb for the first and 4.6 Kb for the second in E2. The distance between the two hotspots on E2 was~37.4 Kb ( Table 2).

GC content analysis
The log 10 of the Bayes factor was plotted as a function of the GC content in S2 Fig. Pearson's correlation test revealed a positive correlation between GC and log 10 (Bayes factor) (cor = 0.077, p < 0.0001) but was not suggestive of strong correlation. Moreover, no significant differences in the mean GC content of each class of bins were observed (t-test, p = 0.05) (S2 Fig). The mean GC contents of the hotspots are shown in Table 2.

Repeat elements analysis
The four hotspot regions contained 22 repeat and variation elements (Table 2, Fig 2a). SINE elements constitute the highest proportion (40%) of the elements present in the hotspots followed by LINE elements (27%). Within LINE elements, L2 elements were present in three of the four hotspot regions. MIR family elements were present in all hotspot regions and tRNA-Lys family elements are present in three of the four elements. Low complexity, long terminal repeats, simple repeats, and DNA elements were inconsistently present across the hotspot regions.
Variation and repeat elements within neutral regions where investigated to get a picture of the general distribution of repeat elements (Fig 2b). In neutral regions, the SINE elements represent the highest proportion of elements, 34%. The tRNA-Lys SINE elements constitute 22% and MIR SINE elements represent 12%. The second highest repeat elements was the LINE elements, 29%, where L1 elements represent~17% and L2 elements represent 9.5% of all elements in the neutral regions.

Discussion
Advances in population genetic theory and technology supports the estimation of recombination rates directly from genotype data on population samples, overcoming the limitations of sperm typing or using large extended families [32]. The strategy of the population genetics based approach is to use information on the number of recombination events that have occurred in the history of the population, which can be detected by modeling the patterns of genetic variation expected to be present in randomly selected individuals.
The model accounts for coalescent and recombination events in an Ancestral Recombination Graph (ARG). Markers will have coalescent trees that are likely to vary across the genome. In theory, all markers in a chromosome are correlated by an ARG. However, the size of the ARG may grow much faster than linear with the increasing number of SNPs, making it computational intractable to simultaneously analyze all markers in an analysis. In practice, the data are usually partitioned into blocks. Blocks from the same chromosomal region are assumed to share the same population size parameter (θ) but have an independent ARG for each block. For fast computation, some methods use an approximate likelihood instead of the full likelihood calculation, for example, the composite-likelihood method implemented in LDhat [33] and PAC-likelihood method implemented in PHASE [34]. Approximate-likelihood methods may be feasible to apply to large genomic regions, but may lack power to detect a moderate or low rate of recombination. Full-likelihood methods, such as implemented in inferRho, use all the information in the data and should therefore provide more accurate estimates [28,29].
The analysis of recombination hotspots in cats, presented here, constitutes the first application of the program inferRho to non-human data and the first analysis of fine-scale recombination rates in cats. The population recombination rate was found to be variable between the  regions analyzed and, as expected, the mean recombination rate of X chromosome regions was lower than that of any autosomal regions. This variation in recombination rates and the notable reduced rate outside of the pseudoautosomal region of the X chromosome are in agreement with observations of recombination in human [4] and dog [7]. The latter result is expected due to that fact that recombination outside of the pseudoautosomal region occurs only in females. Four decisive hot spots were identified on three chromosomes: A2, D1 and E2. The localities of the hotspots are in agreement with the localities of increased posterior probabilities and the general topography of recombination rates as expected. Acknowledging the limitation posed by the total size of the regions analyzed (total~8.5 Mb) compared to the size of the genome and the lack of power to perform correlation and element enrichment analyses, the following observations have been made: (i) the cat hotspots, identified in this study, show no distinct positive correlation with GC content. (ii) The four hotspots contain at least one L2 LINE element and MIR and tRNA-Lys SINE elements. (iii) The similarity of the repeat elements in cat hotspots compared to other mammals might suggest similar recombination mechanisms.
This study represents a glimpse of recombination hotspots in cats and only an initial step toward understanding recombination in cats. The markers are sparsely sampled in some of the genomic regions, which may reduce power for inferRho to detect a large number of hotspots. As cat resources develop, genome-wide analyses could be performed allowing more definitive conclusions to be reached. Nonetheless, our preliminary description of the recombination landscape, and our finding that hotspots are present, helps shed light on the mechanism of recombination in cats compared to other species, furthering our understanding of the patterns of variation generated by recombination in cats, and potentially leading to better implementation of efficient disease mapping strategies in cats.