Genome-wide recombination rate variation in a recombination map of cotton

Recombination is crucial for genetic evolution, which not only provides new allele combinations but also influences the biological evolution and efficacy of natural selection. However, recombination variation is not well understood outside of the complex species’ genomes, and it is particularly unclear in Gossypium. Cotton is the most important natural fibre crop and the second largest oil-seed crop. Here, we found that the genetic and physical maps distances did not have a simple linear relationship. Recombination rates were unevenly distributed throughout the cotton genome, which showed marked changes along the chromosome lengths and recombination was completely suppressed in the centromeric regions. Recombination rates significantly varied between A-subgenome (At) (range = 1.60 to 3.26 centimorgan/megabase [cM/Mb]) and D-subgenome (Dt) (range = 2.17 to 4.97 cM/Mb), which explained why the genetic maps of At and Dt are similar but the physical map of Dt is only half that of At. The translocation regions between A02 and A03 and between A04 and A05, and the inversion regions on A10, D10, A07 and D07 indicated relatively high recombination rates in the distal regions of the chromosomes. Recombination rates were positively correlated with the densities of genes, markers and the distance from the centromere, and negatively correlated with transposable elements (TEs). The gene ontology (GO) categories showed that genes in high recombination regions may tend to response to environmental stimuli, and genes in low recombination regions are related to mitosis and meiosis, which suggested that they may provide the primary driving force in adaptive evolution and assure the stability of basic cell cycle in a rapidly changing environment. Global knowledge of recombination rates will facilitate genetics and breeding in cotton.


Introduction
Cotton (Gossypium spp.) is globally the primary nature fibre crop and the largest source of renewable plant-based fibre. In addition to fibre, the cotton seeds have distinctive uses and economic importance and provide a significant source of vegetable oil and high protein meals [1], which makes cotton an important food source for humans and livestock [2]. Recently, the a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 genomes of G. arboreum, G. raimondii, G. hirsutum and G. barbadense have successively been sequenced [3][4][5][6][7][8][9], which has advanced the understanding of cotton genomics and genetics. Genetic maps also help us to understand the genetic makeup of the genome and to obtain the localization of genes of interest by analyzing genetic linkage with the mapped markers [10]. More than 30 marker-based genetic maps have been developed in cotton, and most of them are interspecific crosses between G. hirsutum and G. barbadense, including various kinds of markers [11].
Recombination plays a key role in biological evolution, and is central to the evolutionary success within eukaryotes. When crossover recombination occurs, the homologous chromosomes exchange genetic information, resulting in allele shuffling and generating novel genetic variation by breaking the associations between linked genes, but it may homogenize alleles through gene conversion [12][13][14]. Genetic variation arises and is separated by natural selection in genetic backgrounds, which shapes the adaptive evolution of organisms and indirectly shapes genome evolution.
Recently, recombination has been analysed by estimating genetic distance divided by the physical distance using linkage disequilibrium (LD) mapping and linkage mapping with molecular markers [15]. Generally, the classical approach for investigating recombination rate across the genome is to build a high-density genetic map and match it to the corresponding physical map, which can directly estimate the recombination rate between the genetic distance and the corresponding physical distance and is essential to understand the intergenerational variability of the genome [16].
Understanding variation in the recombination rate is not only fundamental to many aspects of genetics but also will help to gain a better comprehending of genome evolution. Knowing the genomic distribution of recombination rate can help to predict the potential quantity and breeding method of the population response to environmental change [15]. Specifically, characterizing the recombination rate will contribute to predict the degree of LD and target marker densities for genomic selection, which shows good promise of genomic selection in the future for both animal and plant breeding [17]. Practically, there are applications in finding genes of interest that are detected by linkage mapping or association genetic studies so that we can introduce favorable alleles via breeding [15,18,19].
Determining recombination rate variation and distribution across the cotton genome has been constrained by the accuracy of existing genetic maps and the reference genome sequence. The genomic resources required to investigate the recombination rate for cotton have only recently become available due to genome sequencing [7,20,21]; thus, there is little knowledge of the genomic distribution of the recombination rate in the cotton genome. Fortunately, this provides us the opportunity to directly estimate the genome-wide recombination rate in cotton using a high-density genetic linkage map. Although a recent study has provided some information about recombination in cotton [20], there is still little information regarding to the genome-wide variation of recombination rate and its correlations with genomic features in cotton.
In our laboratory, a high-density genetic map was constructed that included 5,152 loci; the total length was 4696.03 cM, and the average marker interval was 0.91 cM [21]. Having this linkge map available and knowing the genome sequence of tetraploid cotton allow us to investigate the landscape of genome-wide variation in the recombination rate. In this study, we aimed to (i) explore the differences of recombination rate between two subgenomes, and among different chromosomes; (ii) to reveal the correlations between recombination rate and the genomic features, including density of genes, transposable elements and markers, and the distance from centromere; and (iii) to identify gene ontology enrichments in the hotspot versus coldspot regions.

Marker sequence data
A total of 5,299 primer pairs were used and described in details by Li et al. [21]. The primer set included 4,569 SSRs (http://www.cottongen.org/) and 730 other types of primers [22,23]. Using the Cottongen database [24], the potential intron polymorphism (PIP) database [22] and Cotton Marker database (CMD) (http://cottonmarker.org) [25], the sequences corresponding to these markers were found and downloaded. The CMD database has been superseded by CottonGen database because it is better for data retrieval, visualization, data sharing and data mining in cotton studies [24]. Some markers sequences that were not found via their ID numbers were excluded from the analysis in this study. A total of 4,807 available markerderived sequences from the 5,152 mapped markers were used for downstream analysis.

The relationship between genetic and physical maps
The high-density genetic map was 4696.03 cM in total length and 0.91 cM in mean distance [21]. Herein, to obtain the physical location of the markers, the 4,807 available nucleotide sequences were aligned back to TM-1 genomic sequence [7] using the automated batch BLASTN search with E 1e −10 . The best hit was chosen for each marker to infer the map position combining adjacent markers' positions. A custom Perl script was used to identify the actual marker' positions relative to the TM-1 genomic sequence [7]. In addition, some markers that were not confirmed in terms of physical position were not included in the downstream analysis. Finally, 4,157 filtered markers were physical mapped and the physical map of each chromosome was created with the program ''MapChart" [26]. The genome coverage of the physical map was calculated based on the reference genome size [7]. With the genetic and physical positions of markers, the collinearity of markers was compared and showed using the software Circos 0.67 [27].

Genomic features distribution
Based on the previous studies [7,20], the putative centromeric regions have been identified, except the A08 chromosome. The genomic distributions of the recombination rate, genes, transposable elements (TEs) and markers were investigated. All the nucleotide binding site (NBS)-leucine-rich repeat (LRR) genes were obtained from the TM-1 cotton genome [7]. The average recombination rate and the number of markers, genes and TEs in each 1Mb chromosomal region was calculated using a customized Perl script, which were showed with histogram and heat map using the Circos 0.67 software [27]. The correlations between average recombination rate and gene density, TE density, marker density and the distance from centromere were determined in each 1-Mb chromosomal region by the SPSS 17.0 software (SPSS, Chicago, USA).

Estimation of recombination rates
Each marker's start and end locations were averaged to determine the confirmed physical position in the genome [28]. After determining the physical positions of the markers and then combining the genetic positions, the local recombination rates were estimated along each of the 26 chromosomes using MareyMap [29]. The relationship of genetic and physical positions was demonstrated by a scatter plot with the markers' genetic positions (cM) versus physical positions (Mb). The recombination map was constructed and displayed by a smooth line chart in non-overlapping 1-Mb windows with the Loess method via using MareyMap [29].
Each cotton chromosome was divided into the non-overlapping windows (window size was 100 kb) and the recombination rate of each window was calculated. Based on the frequency distribution of the recombinant rate, two extreme ranges (1%) were chosen as the thresholds, which were 27 cM/Mb and 0.05 cM/Mb. If the recombination rate was ! 27 cM/Mb, the high recombination rate windows were considered and the genes in this window were collected. Similarly, if the rate was 0.05 cM/Mb, the low recombination rate windows outside the centromeric regions were considered and the genes in this window were collected. The putative functions of the genes in the high and low recombination regions were analyzed using Fisher's exact test in Blast2GO version 2.8 [30] with a p-value cut-off of 0.01.

Overview of genetic map and construction of the physical map
The high-density genetic map that mentioned above with 5,152 loci was 4696.03 cM and the mean distance was 0.91 cM. The At subgenome was 2359.36 cM with 2,473 loci and the mean distance was 0.95 cM; whereas the Dt subgenome was 2336.67 cM with 2,679 loci and the mean distance was 0.87 cM.
Furthermore, in this study, after excluding markers with no corresponding sequences, 4,157 markers were matched on the 26 physical chromosomes, and the physical map was constructed with an average of 160 markers per chromosome. For the At and Dt subgenomes, there were 1,939 and 2,218 total markers with an average of 149 and 170 markers per chromosome, respectively (Table 1 and S1 Fig). In At, chromosome A05 contained 224 markers, which was the largest number of markers on any chromosome, whereas A04 only contained 97 markers. In Dt, the highest number of markers (235) was located on chromosome D05 and the lowest number of markers (118) was located on D04. The physically mapped markers on the chromosomes exhibited a higher marker density in Dt (2.86/Mb) than in At (1.69/Mb); the maximum marker density was on A05 (2.41/Mb) and D05 (3.79/Mb), and minimum was on A06 (1.14/Mb) and D02 (2.25/Mb; Table 1). In addition, 384 markers were identified on the cotton genome scaffolds. Among them, 348 markers were on the scaffolds that were not anchored to certain chromosomes, which included 191 markers in the At subgenome and 157 markers in the Dt subgenome. The remaining 36 markers were on non-chromosome anchored scaffolds ( Table 1). The genome coverage of the physical map showed a drastic change along each chromosome, which ranged from 21.05% to 81.71% (Table 2). Generally, the genome coverage was higher on Dt chromosomes than on At chromosomes ( Table 2).

Collinearity of the genetic and physical maps
The collinearity of the high-density genetic and physical maps shown in Fig 1A(VI) and 1B (VI), was investigated on various chromosomes, At and Dt subgenomes, respectively. The results indicated that the physical map that was constructed by these markers was very effective. Most of the chromosomal markers had highly collinearity between the genetic and physical maps, especially on chromosomes D05, A08 and A01. Chromosomes A02, A05, A07, A09 and D13 were moderately consistent with their physical locations (Fig 2). Chromosomes D07 and D10 showed serious disagreement between the genetic and physical maps on either side of the chromosomes, which showed that the large segments on the two chromosomes were not co-linear, in terms of the physical map (Fig 2). In addition, the Spearman correlation coefficient between the genetic and physical positions of each chromosome was generally consistent with the levels of genetic collinearity. The value of Spearman correlation coefficient on chromosome D05 was 0.98, which indicated a strong correlation (Fig 2). The Dt subgenome exhibited better compatibility with the physical map than At, except chromosome D07 and D10 (Figs 1BVI and 2). The homologous chromosomes A01 and D01; along A08 and D08 showed high Spearman correlation coefficients (Fig 2). In contrast, there were low Spearman correlation coefficients in the homologous chromosomes A07 and D07 (Fig 2). At the whole genome level, the genetic and physical map distances did not have a simple linear relationship (Fig 2).

Landscape of genome-wide recombination rate
The landscape of the genome-wide variation of the recombination rate is shown in Fig 2 and exhibited an informative and precise estimate of the recombination rate per physical distance (Mb) along each of the 26 chromosomes at the sub-regional level. The distribution of the average genome-wide recombination rate was non-random, and the recombination rate in the distal chromosomal regions was higher than that of the proximal regions ( Fig 1III). Recombination rates also varied across individual chromosomes, as illustrated in Fig 2. Although most of the chromosomes' proximal regions lack recombination, some recombination spikes were stored in the centromeric and pericentromeric regions on a few chromosomes, such as A05 and A07 (Fig 2). In addition to the centromeric and pericentromeric regions, there were some other locations where little to no recombination occurred, such as the homologous chromosomes A07 and D07 as well as A09 and D09, especially in the left arm of A09 and right arm of D07 (Figs 1III and 2). The genome-wide average recombination rate significantly varied along each chromosome, which ranged from 1.60 cM/Mb to 4.97 cM/Mb (Table 2). In At, chromosomes A06, A07 and A08 had a lower average recombination rate, which was 1.86 cM/Mb, 1.60 cM/Mb and 1.71 cM/Mb, respectively. In contrast, higher recombination rates were found on chromosomes A05 (3.16 cM/Mb), A11 (3.26 cM/Mb) and A13 (3.26 cM/Mb; Table 2). In Dt, lower recombination rates were on chromosomes D10 (2.17 cM/Mb), D06 (2.77 cM/Mb) and D07 (2.97 cM/ Mb), and higher recombination rates were on chromosomes D03 (4.44 cM/Mb), D05 (4.36 cM/Mb) and D11 (4.97 cM/Mb). Obviously, the average recombination rates of D03 and D11 were more than twice as high as those of A07 and A08. Generally, the recombination rate was higher on Dt chromosomes than on At chromosomes (  Table 2). Although the average genetic length of At (181.42 cM) and Dt (179.74 cM) were similar, the physical length of Dt chromosomes was half as short as At chromosomes, and the recombination rates were nearly 0.5 times higher on Dt chromosomes.  Chromosomal rearrangement may have a considerable influence on recombination. The translocation regions between A02 and A03 and between A04 and A05 and inversion regions on A10, D10, A07 and D07 were observed with the physical markers locations, which indicated that the chromosomal translocation and inversion that occurred in the distal regions of the chromosomes had a relatively high recombination rate (Fig 2).

Correlations between recombination rate and genomic features
The distributions of recombination rate, genes, TEs and markers were not highly uniform along individual chromosome's length. The density of recombination rate, genes and markers decreased towards the chromosomes' middles; while, TE density greatly increased towards the centromeric regions from both chromosomes arms (Fig 1I-1IV). On the whole genome level, the recombination rate was significant positively correlated with the density of genes (R 2 = 0.58; p 0.01) and markers (R 2 = 0.43; p 0.01), but significant negatively correlated with the density of transposable elements (R 2 = -0.25; p 0.01; Table 2). The recombination rate showed a significant positive correlation with the distance from the centromere (R 2 = 0.30; p 0.01). The correlations of recombination rate and genomic features showed a similar varying tendency between At and Dt, and the average correlation coefficient of recombination rate and distance from the centromere on Dt (R 2 = 0.43; p 0.01) was higher than that of At (R 2 = 0.31; p 0.01; Table 2). The homologous chromosomes A03 (R 2 = 0.72; p 0.01) and D03 (R 2 = 0.77; p 0.01) showed a relatively high positive correlation between recombination rate and gene density. There was a higher negative correlation on homologous chromosomes A05 (R 2 = -0.47; p 0.01) and D05 (R 2 = -0.49; p 0.01), and A09 (R 2 = -0.54; p 0.01) and D09 (R 2 = -0.37; p 0.01) than others between recombination rate and TEs density. Taking individual chromosome into account, A02 had a strong relation between recombination rate and gene density (R 2 = 0.88; p 0.01); however, A07 showed a lowest correlation (R 2 = 0.39; p 0.01; Table 2). A09 showed a significant negative relation between recombination rate and TEs density (R 2 = -0.54; p 0.01); while, D03 and D06 showed weakly positive correlation (R 2 = 0.14). The correlation between recombination rate and marker density demonstrated drastic differences in different chromosomes, with correlation coefficient R 2 value ranging 0.01 (D09) to 0.71 (A02; Table 2). However, recombination rate showed a positive correlation with the distance from the centromere, except A07 (R 2 = -0.26) and D08 (unavailable). D05 showed the highest correlation coefficient (R 2 = 0.59; Table 2).

Functional categories of the genes in high or low recombination regions
To survey the functional categories of genes located in the high and low recombination regions, the genes were classed by the GO. The significantly enriched results of GO revealed that most of the genes in the regions with high recombination rates had putative functions responding to the environmental stimulus. For instance, response to biotic stimulus, defense response, response to wounding, response to bacterium, defense response to bacterium, response to external stimulus, response to external biotic stimulus, etc., were response to environmental stimuli (Fig 3A). In contrast, the results of GO in the low recombination regions showed that the genes were related to the spindles and mitosis, for example, spindle checkpoint, mitotic spindle checkpoint, mitotic cell cycle checkpoint, negative regulation of cell cycle process, sexual reproduction and negative regulation of mitosis, etc. (Fig 3B and S1 Fig).
In addition, based on the existing data, a positive correlation was found between recombination rates and the density of the NBS-LRR genes in cotton genome which are related to disease resistance (Fig 4).

Discussion
In this study, based on the high-density genetic map containing 5,152 loci [21] and the published TM-1 cotton genomic sequence [7], the corresponding physical map, including 4,157 markers, was constructed (S1 Fig). Chromosomes A05 and D05 had the highest marker density according to the physical map, and chromosomes A06 and D02 had the lowest marker density (Table 1). Chromosome D03 showed the highest genome coverage compared to other chromosomes ( Table 2). The recombination rate, genes and markers were positively related and showed uneven distribution across various chromosomes, with their densities increasing towards the chromosomes' ends rather than TEs (Fig 1I-1IV).
Recombination, as a vital component in crop breeding and genetics, plays a key role in genomic evolution, domestication and the improvement of crops [31,32], and can improve crops by creating genetic variation in gametes and new combinations of available genes. Understanding chromosomal recombination rate distribution is very important for characterizing and cloning genes. The distribution of recombination rate showed a close association with genes/markers distribution on cotton chromosomes (Figs 1I-1IV and 2). However, not all the non-random recombination distributions could be explained by the gene density [33][34][35], chromosomal inversion may cause the inactivation of recombination. Compared with the centromeric regions, high recombination rate was observed in the telomeric regions (Figs 1III  and 2), which demonstrated that recombination was suppressed in the centromeric regions. Similarly, in maize, the recombination rates were highest towards the telomeric ends of the chromosomes and highly suppressed near the centromeres [36,37].
Recombination greatly varied across the length of cotton chromosomes, and the correlation of recombination with the distance from the centromere of chromosomes (R 2 = 0.30; p 0.01) showed a higher correlation than soybean [28]. However, chromosome A07 showed a negative correlation between them (R 2 = -0.26; Table 2), which may be due to the repeated inversions causing a significant change in recombination rate [21,38]. The correlation between recombination rate and the distance from the chromosome centromere showed a great difference in different genome species, for example, the larger genomes like barley and wheat were known to show a higher correlation [39,40], but the smaller genomes like soybean and Arabidopsis showed the relatively lower correlation, especially in Arabidopsis [28,41]. An exponential increase between recombination rates and distance from the centromere was found in wheat [41]. In soybean, recombination had a relatively weak correlation relative to the distance from the centromere [28]. However, recombination showed very little correlation with the distance from the centromere in the model plant Arabidopsis [42].
The cotton genome is approximately 2 times bigger than the soybean genome and 16 times bigger than Arabidopsis, but it is approximately 7 times smaller than the wheat genome [43]. The distribution of recombination and gene was different between cotton and other genomic species such as soybean, barley and wheat, which may be due to the differences in genome sizes [44]. Moreover, chromosomal rearrangement could cause a significant divergence during the evolutionary process in the genomic regions of two different rice subspecies, which could lead to recombination rate changes [38]. The previous study revealed that inversions occurred between the cotton homologous chromosomes A07 and D07 as well as between A10 and D10. Translocations occurred between A02 and A03 and between A04 and A05 in cotton [21]. In this study, it was found that chromosomal rearrangement in these regions may cause the activation of recombination in distal regions of the chromosomes (Fig 2).
Furthermore, recombination hotspot and coldspot regions were observed on all the chromosomes. Previous studies showed that the recombination hotspot regions had small areas of chromosome length, but occupied large areas of genetic length, which indicated uneven recombination distribution [28,34]. The correlation between recombination rate and the distance from centromere was not very high, which may be due to the unevenness generated by hotspots and coldspots of recombination. Even there was barely any recombination in the distal regions of chromosomes that contained large chromosomal areas. The results may explain why the average genetic length of At (181.42 cM) and Dt (179.74 cM) were similar, but the Dt chromosomes were half as short as the At chromosomes, and the recombination rates were nearly 0.5 times higher on Dt chromosomes.
The genes responding to environmental stimuli usually have the higher recombination rates [45][46][47]. The significant GO enrichment result of the genes in the high recombination rates regions showed that many genes had functions responding to the environmental stimulus compared to those in the low recombination regions, for instance, response to biotic stimulus, response to wounding, response to bacterium, defense response to bacterium, response to external stimulus, etc. (Fig 3A), suggesting that these genes may respond to environmental stimuli and play an important role in adapting to the rapidly changing environments [47]. On the contrary, the genes in the low recombination regions were mostly related to mitotic spindle checkpoint, negative regulation of cell cycle process and negative regulation of mitosis, etc. (Fig 3B), which are important for mitosis and meiosis. The low recombination can assure the stability of basic cell cycle.
The NBS-LRR genes, as the largest group of plant disease resistance genes, were involved in the process of pathogen recognition and led to disease resistance [46]. Combining with previous researches [45,47], it has been found that the clustered NBS-LRR genes showed higher recombination rates than singleton NBS-LRR genes. As shown in Fig 4, there was a positive correlation between the density of NBS-LRR genes and recombination rates, suggesting that these stress-responsive genes may improve the possibility of retention, and frequent recombination may be very important to adapt to a complex environment [45,47].

Conclusions
Combing the genetic and physical maps makes it possible to shed light on the characteristics of recombination in cotton. The recombination rates were significantly different between At and Dt subgenomes and among different chromosomes. The translocation regions between A02 and A03 and between A04 and A05, and inversion regions on A10, D10, A07 and D07 showed a relatively high rate of recombination. The relationship between recombination rate and the genomic features, including density of genes, markers, and the distance from centromere, demonstrated a positive correlation as compared to transposable elements. The results of GO enrichments in the hotspot and coldspot regions showed that these genes respond to environmental stimuli and are related to mitosis and meiosis, respectively. Overall, the observations in this study provide insights into recombination rate variations, which will facilitate the understanding of cotton genetics.
Supporting information S1 Fig. The markers' physical map. (PDF) S1 Table. All the GO terms distribution of the genes in the low recombination regions. (XLS)