Genome-Wide Analysis of Positively Selected Genes in Seasonal and Non-Seasonal Breeding Species

  • Yuhuan Meng,

    Contributed equally to this work with: Yuhuan Meng, Wenlu Zhang, Jinghui Zhou

    Affiliation School of Bioscience and Bioengineering, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, South China University of Technology, Guangzhou, China

  • Wenlu Zhang,

    Contributed equally to this work with: Yuhuan Meng, Wenlu Zhang, Jinghui Zhou

    Affiliation School of Bioscience and Bioengineering, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, South China University of Technology, Guangzhou, China

  • Jinghui Zhou,

    Contributed equally to this work with: Yuhuan Meng, Wenlu Zhang, Jinghui Zhou

    Affiliation School of Bioscience and Bioengineering, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, South China University of Technology, Guangzhou, China

  • Mingyu Liu,

    Current address: School of Bioscience and Bioengineering, South China University of Technology, Guangzhou, China

    Affiliation School of Bioscience and Bioengineering, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, South China University of Technology, Guangzhou, China

  • Junhui Chen,

    Affiliation School of Bioscience and Bioengineering, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, South China University of Technology, Guangzhou, China

  • Shuai Tian,

    Affiliation School of Bioscience and Bioengineering, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, South China University of Technology, Guangzhou, China

  • Min Zhuo,

    Affiliation School of Bioscience and Bioengineering, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, South China University of Technology, Guangzhou, China

  • Yu Zhang,

    Affiliation Guangdong Key Laboratory of Laboratory Animals/Guangdong laboratory animals monitoring institution, Guangzhou, China

  • Yang Zhong,

    Affiliations School of Life Sciences, Fudan University, Shanghai, China, Institute of Biodiversity Science, Tibet University, Lhasa, China

  • Hongli Du,

    Affiliation School of Bioscience and Bioengineering, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, South China University of Technology, Guangzhou, China

  • Xiaoning Wang

    Affiliations School of Bioscience and Bioengineering, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, South China University of Technology, Guangzhou, China, Chinese PLA General Hospital, Beijing, China


Abstract

Some mammals breed throughout the year, while others breed only at certain times of year. These differences in reproductive behavior can be explained by evolution. We identified positively-selected genes in two sets of species with different degrees of relatedness including seasonal and non-seasonal breeding species, using branch-site models. After stringent filtering by sum of pairs scoring, we revealed that more genes underwent positive selection in seasonal compared with non-seasonal breeding species. Positively-selected genes were verified by cDNA mapping of the positive sites with the corresponding cDNA sequences. The design of the evolutionary analysis can effectively lower the false-positive rate and thus identify valid positive genes. Validated, positively-selected genes, including CGA, DNAH1, INVS, and CD151, were related to reproductive behaviors such as spermatogenesis and cell proliferation in non-seasonal breeding species. Genes in seasonal breeding species, including THRAP3, TH1L, and CMTM6, may be related to the evolution of sperm and the circadian rhythm system. Identification of these positively-selected genes might help to identify the molecular mechanisms underlying seasonal and non-seasonal reproductive behaviors.


Introduction

The environment can influence gene evolution and thus animal behaviors, including reproduction-related behaviors. Some mammals can breed throughout the year, while others only breed successfully at certain times of year. Such animals are defined as non-seasonal and seasonal breeding species, respectively. Day length, temperature, and food supply can all influence the reproductive behavior of seasonal breeding species and subsequent survival of offspring [1]; if they breed too early, the growing offspring may be exposed to low temperatures and scarce resources, whereas late breeding limits the time available for reproductive behaviors and preparation for the following winter. Accurate timing is therefore an essential component of life-history strategies for organisms living in seasonal environments [2]. The different reproductive behaviors of seasonal and non-seasonal breeding species may result from natural selection pressures [3]. Both strategies help the respective species to survive by adapting their breeding behaviors to the environment over their long evolutionary histories. Genome-wide analysis of genes that are positively selected in mammal lineages using the respective breeding strategies may help us to understand the mechanisms responsible for the divergent reproductive behaviors as a result of adaptive evolution.

Positive Darwinian selection of protein-coding genes is a major driving force of adaptive evolution and species diversification. The modified version of the branch-site test (Model A) [4, 5] was designed to detect localized episodic bouts of positive selection that affect only a few amino acid residues in particular lineages. This test has been shown to be a reasonably powerful tool, and has been widely used to investigate the adaptive evolution of genes in many species [6–8].

However, alignment errors may influence the results of branch-site gene analysis in mammalian and vertebrate species. It is therefore necessary to use reliable alignment methods to reduce the incidence of false-positive results [9]. Although the aligner software PRANK [10, 11] cannot eliminate false-positive results, it is nonetheless more powerful than other aligners [9, 12] such as MUSCLE [13] and ClustalW [14]. In addition to misalignments in multiple sequences, other factors such as sequence errors, misassembly, and annotation mistakes also increase the incidence of falsely-identified positive selection [15, 16]. More stringent filters are needed to ensure that branch-site analysis has a low and acceptable false positive rate.

In this genome-wide study, we investigated the evolution of seasonal breeding strategies by identifying positively-selected genes in non-seasonal and seasonal breeding species using modified branch-site models. We established Distant-Species and Close-Species sets, each of which included seasonal and non-seasonal groups. We then identified positively-selected genes in these groups. PRANK (codon) software was used to align all the gene orthologs in the two gene sets. However, because PRANK generates a relatively high false-positive rate with the branch-site model, stringent filtering using sum of pairs (SP) [17, 18] scoring was used to remove potentially unreliable alignments generated by multiple sequence alignments. Sequence errors, misassembly, and annotation mistakes were also detected by cDNA mapping. Functional analysis of genes identified as positively-selected after this stringent filtering process might help us to understand the molecular mechanisms that determine non-seasonal and seasonal breeding.

Materials and Methods

Materials preparation

Five non-seasonal breeding species and five seasonal breeding species were chosen as the Distant-Species set. The five non-seasonal species included: human (Homo sapiens, GRCh37), chimpanzee (Pan troglodytes, CHIMP2.1) [19], cynomolgus monkey (or crab-eating macaque, Macaca fascicularis) [20], mouse (Mus musculus, NCBIM37) [21] and rat (Rattus norvegicus, RGSC3.4) [22]. The five seasonal breeding species were Indian rhesus monkey (Macaca mulatta, MMUL_1) [23], Chinese rhesus monkey (M. mulatta lasiota, CR) [24], dog (Canis familiaris, BROADD2) [25], horse (Equus caballus, EquCab2) [26] and rabbit (Oryctolagus cuniculus, oryCun2) [27]. The long lineages between species in the Distant-Species set mean that behaviors may have changed back and forth between seasonal and non-seasonal breeding strategies several times, while the divergent sequences might influence the branch-site model analysis and generate false positives [28]. To address this problem, we also established a Close-Species set that only included closely-related, non-seasonal (human, gorilla (Gorilla gorilla, gorGor3.1) [29], chimpanzee, and cynomolgus monkey), and seasonal-breeding species (orangutan (Pongo abelii, PPYG2) [30], Indian rhesus monkey, Chinese rhesus monkey, and marmoset (Callithrix jacchus, C_jacchus3.2.1) [31]).

The protein-coding sequences for human, gorilla, chimpanzee, orangutan, Indian rhesus monkey, marmoset, mouse, rat, dog, horse, and rabbit were downloaded from the Ensembl database (version 64, Sep. 2011) [32]. The sequences for cynomolgus monkey and Chinese rhesus macaque were provided by BGI [33]. The corresponding cDNA sequences used in the accuracy assessment were downloaded from NCBI. Detailed information on the cDNA sequences used in this study is listed in S1 Table.

Calculating positively-selected sites

To identify 1:1 gene orthologs, human protein sequences were used to conduct BLAST [34] searches against other species sequences (blastp -F T -e 1e-5 -m 8). It is difficult to select a set of transcripts to minimize alignment gaps and potential errors and thus false-positive branch-site test results [35]. Following the simple approach used in previous studies [6–8, 33, 36–41], the longest transcript for a given gene was chosen. Reciprocal searches were then performed for each species' protein sequences against the human protein sequences. In each search, pairwise sequences with identities <60% were excluded, and the highest-scoring hit for each query was retained to determine the pairwise orthologs between humans and other species.
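The reciprocal search described above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in the study: it assumes the standard 12-column tabular BLAST output (`-m 8`), interprets "highest hit" as the hit with the highest bit score, and uses hypothetical function names (`best_hits`, `reciprocal_best_hits`).

```python
import csv
import io


def best_hits(tabular_text, min_identity=60.0):
    """Return {query: subject}, keeping the highest bit-score hit per query.

    Expects 12-column tabular BLAST output: query, subject, % identity,
    alignment length, mismatches, gap openings, q.start, q.end,
    s.start, s.end, e-value, bit score.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        identity, bitscore = float(row[2]), float(row[11])
        if identity < min_identity:
            continue  # hits with identity <60% are excluded
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward_text, reverse_text, min_identity=60.0):
    """Keep human-to-species pairs whose best hits point at each other."""
    forward = best_hits(forward_text, min_identity)
    reverse = best_hits(reverse_text, min_identity)
    return {q: s for q, s in forward.items() if reverse.get(s) == q}
```

Calling `reciprocal_best_hits` once per non-human species, with the human-versus-species and species-versus-human result files, would give the pairwise ortholog tables; intersecting them across species would then give candidate 1:1 ortholog sets.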

Modified branch-site models [5] for adaptive evolution analysis used each species in one breeding series as the foreground species, and all the other species in that breeding series as background species. For example, to test for positive selection in humans in the Distant-Species set, the human branch was designated as the foreground branch, and the other five species in the seasonal breeding group were designated as background branches. Positive selection signals for all species were tested similarly.
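As an illustration of how each species can be tested in turn, the sketch below marks one leaf of a Newick tree as the foreground branch using the `#1` branch label that codeml reads for branch-site models. It is a hedged example: the function names are hypothetical, and the sketch assumes a topology-only tree (no branch lengths) in which each species name appears exactly once.

```python
import re


def mark_foreground(newick, species, label="#1"):
    """Return a copy of the tree with `species` labelled as foreground."""
    if ":" in newick:
        # Assumption of this sketch: labels are added to a topology-only tree.
        raise ValueError("expected a tree without branch lengths")
    # Match the species name only when it is a whole leaf name.
    pattern = re.compile(r"(?<![\w.-])" + re.escape(species) + r"(?![\w.-])")
    marked, count = pattern.subn(lambda m: f"{species} {label}", newick)
    if count != 1:
        raise ValueError(f"expected one leaf named {species!r}, found {count}")
    return marked


def foreground_trees(newick, species_list):
    """Build one labelled tree per species to be tested as foreground."""
    return {sp: mark_foreground(newick, sp) for sp in species_list}
```

For example, `mark_foreground("((human,chimp),mouse);", "human")` yields a tree in which only the human branch carries the foreground label, and all unlabelled branches are treated as background.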

Protein-coding sequences associated with the corresponding 1:1 gene orthologs were aligned using PRANK (codon). The corresponding gene-based phylogenetic trees were constructed using the maximum likelihood method in the PHYLIP 3.69 [42] software package, according to the tested aligned protein-coding sequences. The aligned protein-coding sequences and the corresponding phylogenetic trees were then used to analyze the adaptive evolution using the branch-site model in PAML’s codeML program [4]. Branch-site modified model A (model = 2, NSsites = 2) and the corresponding null model (model = 2, NSsites = 2, fix_omega = 1 and omega = 1) [5] were used to identify sequences under positive selection in both test sets of animals. Significance was calculated using the χ2 statistic, with one degree of freedom. Genes with p ≤0.01 were considered to be positively selected [5]. The p values were adjusted according to the FDR method (multiple testing correction with the method of Benjamini and Hochberg) [43] to allow for multiple testing, with a strict criterion of FDR <0.05. Positively-selected sites were obtained based on the Bayes Empirical Bayes (BEB) analysis [5], with a posterior probability >95%.
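The significance test and the multiple-testing correction described above can be sketched as follows. This is an illustrative Python example under stated assumptions, not the authors' script: the log-likelihood values are taken as already parsed from the codeml output for the null and alternative models, and the function names are hypothetical. For one degree of freedom, the upper tail of the χ2 distribution equals erfc(sqrt(x/2)), so only the standard library is needed.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt):
    """P value of the LRT statistic 2*(lnL_alt - lnL_null), chi-square, 1 df."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return math.erfc(math.sqrt(stat / 2.0))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_genes(lnl_by_gene, p_max=0.01, fdr_max=0.05):
    """lnl_by_gene maps gene -> (lnL_null, lnL_alt); returns genes passing both cut-offs."""
    genes = list(lnl_by_gene)
    pvals = [lrt_pvalue(*lnl_by_gene[g]) for g in genes]
    fdrs = benjamini_hochberg(pvals)
    return [g for g, p, q in zip(genes, pvals, fdrs) if p <= p_max and q < fdr_max]
```

The thresholds default to the values stated in the text (p ≤0.01 and FDR <0.05); how the two criteria were combined in practice is an assumption of this sketch.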

Screening for valid positive sites by SP penalty scoring

To ensure the accuracy of the positive sites, extended sequences were extracted including 15 amino acids (45 base pairs) upstream and downstream from the positive sites. SP [17, 18] measurements were then performed for penalty scoring of the sequences in both streams. (1) Because some positive sites lay close to the beginning or end of the gene, so that the upstream or downstream sequence could not reach the full length, the base penalty score S was set separately for each stream (S = 15/n, where n is the number of amino acids in the upstream or downstream sequence). (2) Each position in perfect alignment scored 0, whereas each mismatched site scored −S and each gap scored −2S. (3) Penalty scores for the upstream and downstream sequences were calculated separately, and the total penalty score was the sum of the upstream and downstream scores. (4) Average penalty scores were calculated as the final scores (average penalty score = total penalty score/N, where N is the number of sequences used in each alignment).

General and individual penalty scores were used. General penalty scores were equal to the sum of the penalty scores from each of the two compared species. For individual penalty scores, sequences with positive sites were compared with each of the other sequences used in the alignment in turn, and the total penalty scores were regarded as the individual penalty score.

Threshold values were set for general and individual penalty scores to filter sequences with valid positive sites. In this study, the threshold values for the general and individual penalty scores were −50 and −15, respectively. If both the general and individual penalty scores were greater than their respective threshold values, the sequences passed the filter and the sites were regarded as valid positive sites.
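The scoring rules above can be sketched in Python as follows. This is one possible reading of the description, not the authors' implementation: it assumes that the general score sums the pairwise penalties over all pairs of sequences, that the individual score sums the penalties between the sequence carrying the positive site and every other sequence, and that both totals are divided by the number of sequences N. All function names are hypothetical.

```python
from itertools import combinations

GAP = "-"
FLANK = 15  # amino acids on each side of the positive site


def flank_penalty(a, b):
    """Penalty for one flank: 0 per match, -S per mismatch, -2S per gap."""
    n = len(a)
    if n == 0:
        return 0.0  # site at the very edge of the alignment
    s = 15.0 / n  # base penalty S = 15/n, larger for truncated flanks
    penalty = 0.0
    for x, y in zip(a, b):
        if x == GAP or y == GAP:
            penalty -= 2 * s
        elif x != y:
            penalty -= s
    return penalty


def pair_penalty(row_a, row_b, site):
    """Sum of the upstream and downstream penalties for two aligned rows."""
    up = slice(max(0, site - FLANK), site)
    down = slice(site + 1, site + 1 + FLANK)
    return (flank_penalty(row_a[up], row_b[up])
            + flank_penalty(row_a[down], row_b[down]))


def penalty_scores(rows, focal, site):
    """Return (general, individual) average penalty scores for one site."""
    n_seqs = len(rows)
    individual = sum(pair_penalty(rows[focal], rows[j], site)
                     for j in range(n_seqs) if j != focal) / n_seqs
    general = sum(pair_penalty(rows[i], rows[j], site)
                  for i, j in combinations(range(n_seqs), 2)) / n_seqs
    return general, individual


def passes_filter(rows, focal, site, general_min=-50.0, individual_min=-15.0):
    """Apply the thresholds given in the text (-50 general, -15 individual)."""
    general, individual = penalty_scores(rows, focal, site)
    return general > general_min and individual > individual_min
```

Here `rows` is a list of equal-length aligned amino acid sequences, `focal` is the index of the foreground sequence, and `site` is the alignment column of the putative positive site.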

Accuracy of positive sites according to cDNA sequences

Mistakes can occur during genome sequencing, sequence assembly, or gene annotation, and cDNA sequences can be used as references to assess the accuracy of the positive sites.

Corresponding cDNA sequences were first matched to the gene sequences using BLAST [34] (blastn -e 1e-10 -a 4 -m 8). cDNA sequences that included the positions corresponding to the positive sites were then filtered. Further analysis was conducted using MEGA5 [44]. The gene sequences and their corresponding cDNA sequences were then subjected to alignment analysis using the MUSCLE [13] function. If the nucleotide sequences of the positive sites were identical to those of the corresponding positions in the cDNA sequences, the positive sites were regarded as valid.
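The final comparison step can be illustrated with a short Python sketch. It assumes that the gene coding sequence and the cDNA have already been aligned (for example with MUSCLE) into two gapped strings of equal length, and that the positive site is given as a 0-based codon index in the ungapped gene sequence. The function name is hypothetical, and the sketch checks a single site against a single cDNA.

```python
def codon_matches_cdna(aligned_gene, aligned_cdna, codon_index):
    """True if the codon at codon_index is identical in gene and cDNA."""
    if len(aligned_gene) != len(aligned_cdna):
        raise ValueError("aligned sequences must have equal length")
    start = codon_index * 3
    position = -1  # position in the ungapped gene sequence
    gene_codon, cdna_codon = [], []
    for g, c in zip(aligned_gene, aligned_cdna):
        if g == "-":
            continue  # column is an insertion in the cDNA, not a gene base
        position += 1
        if start <= position < start + 3:
            gene_codon.append(g.upper())
            cdna_codon.append(c.upper())
    if len(gene_codon) < 3:
        raise ValueError("codon index lies beyond the gene sequence")
    # A gap in the cDNA at the site means the site is not supported.
    return "-" not in cdna_codon and gene_codon == cdna_codon
```

A site would be kept as valid when this check returns True for at least one cDNA sequence of the corresponding species.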


Results

Preliminary filtering of positively-selected genes using PRANK and branch-site model

Totals of 11,031 and 13,171 1:1 gene orthologs with >60% identities were filtered from the Distant- and Close-Species sets, respectively, by BLAST [34]. The corresponding protein sequences were used for subsequent alignments. The numbers of pairwise gene orthologs between humans and other species are listed in S2 Table. After alignment using PRANK (codon), 10,918 gene orthologs in the Distant-Species set and 12,485 in the Close-Species set were tested for positive selection signals using the codeML program in the PAML package [4], with the modified branch-site model [5]. Positively-selected genes in each species with a p value <0.01 (comparing the likelihood ratio test [LRT] statistic with the χ2 distribution) and with a false-discovery rate (FDR) <5% are shown in Table 1.

Table 1. Numbers of positively-selected genes under different filtering conditions.

In the Distant-Species set, the mean number of positively-selected genes in the seasonal species was four-fold greater than in the non-seasonal species (FDR <0.05) (Fig 1A, Table 1). The equivalent increase in the Close-Species set was about 2.63-fold (Fig 1B, Table 1). These results demonstrate that there were more positively-selected genes in seasonal compared with non-seasonal breeders in both species sets.

Fig 1. Numbers of positively-selected genes (FDR <0.05) and sites (after SP-score filtering).

(A). Positively-selected genes corrected by FDR. Sites (BEB >0.95) were filtered by SP scores in the Distant-Species set. (B). Positively-selected genes (FDR <0.05) and positive sites (BEB >0.95) filtered by general SP score >−50 and individual SP score >−15 in the Close-Species set.

However, there were more positively-selected genes in the Close-Species than in the Distant-Species set (mean numbers with FDR <0.05: 208.75 and 111.9, respectively). In addition to the different numbers of orthologs (12,485 vs. 10,918), it is also possible that more gaps were generated by alignment in the Distant-Species gene ortholog set than in the Close-Species set (mean gap length 244 in the Close-Species set and 322 in the Distant-Species set) (S3 Table), because the sequence divergence was smaller in the Close-Species set. The number of gaps may influence the results of branch-site analysis, because the branch-site model would remove columns with gaps in the alignment sequences and would thus exclude more potential positive sites in the Distant-Species set compared with the Close-Species set.

Identification of false-positive sites through sequence misalignment

Putative positively-selected sites in the genome (FDR<0.05) were obtained by Bayes Empirical Bayes (BEB) analysis (posterior probability >95%) [5]. The numbers of putative positively-selected sites in each species are listed in Table 2. The details of all the positive sites with BEB >0.95 are listed in S4 Table.

Alignment problems may influence the performance of the branch-site test, with poor alignment increasing the incidence of false-positive sites. We therefore filtered out sites with obvious signs of unreliable alignment. We also calculated the SP [17, 18] score for each of the positive sites’ extended sequences (± 15 amino acids/45 base pairs). Most unreliable alignments are represented by numerous gaps and sequence divergences (S1 Fig and S5 Table). After filtering, a total of 2009/3810 (52.73%) positive sites remained. Sites with extended alignments with low divergence are listed in S6 Table. The results after filtering revealed more sites with positive selection in the seasonal compared with the non-seasonal breeding species (Table 2). The false-positive rate due to misalignment was 33.33%–61.28% (Table 2), which was similar to that of 50%–55% in a previous report [12]. After alignment filtering, differences in gene numbers between species in the Distant- and Close-Species sets were consistent with those after FDR-adjusted filtering. However, the false-positive rate (FPR) statistics only considered misalignment and did not take account of other factors such as sequence errors, misassembly, or annotation problems.

According to extended-sequence alignments of the positive sites, SP scores <-50 were generally caused by excessive gaps or deficient matches, of which gaps contributed more to the low SP penalty scores (S1 Fig and S6 Table). Gaps and deficient matches may arise as a result of diversity between species or different transcript lengths, because we used the longest human transcripts to BLAST other species’ protein-coding sequences [35]. Columns with gaps in the alignments would be deleted in branch-site models, even though positive sites may be located within deficient sequence alignments surrounded by gaps or mismatched sequences. A threshold SP score of −50 can filter out most false-positive sites caused by divergent sequence alignments. SP scoring thus improves the reliability of the results by reducing the false-positive rate caused by unreliable alignments. Details of the positive genes filtered by SP scoring are shown in S7 Table.

cDNA mapping as a novel method of filtering positive sites

The quality of the genome may limit the accuracy of evolutionary analysis. It can result in false-positive results associated with sequencing errors, alternative splicing, amino acid repeats, and frameshift mutations, causing mistakes in gene annotation [8, 15]. However, cDNA sequences are much shorter than genome sequences and are thus more reliable. The reliability of positive sites will therefore be increased if sequences with positive sites are mapped to the corresponding cDNA sequences and aligned with most of the bases. We therefore used cDNA mapping as a novel means of testing sequence errors.

cDNA sequences corresponding to the positive sites were analyzed. In this study, we aligned a total of 193 positive sites in perfect alignment with at least one cDNA sequence of the corresponding species using the MUSCLE function [13] in MEGA5 [44]. The coverage between positive sites and corresponding cDNA sequences was low (<10%, 193/2009), and the false positive rate was 61.66% (119/193). Most inconsistent sites were in cynomolgus monkey, horse, and orangutan, which had genome sequences of low quality or with annotation mistakes. In contrast, the human, mouse and rat genome sequences showed high accuracy. The details of the positive sites mapped with the corresponding cDNA sequences are shown in S1 Table. A total of 74 corresponding cDNA sites were finally identified that were consistent with the positive sites (S1 Table). No corresponding cDNA sequences mapped to the positive sites in gorillas, Chinese rhesus monkeys, and marmosets. After verification by cDNA filtering, 39 genes remained, including 15 genes that were positively-selected in non-seasonal species (Table 3), and 24 in seasonal species (Table 4). Although the limited availability of cDNA sequences meant that only a few positive sites remained after mapping, these sites were likely to be more accurate.

Table 3. Positively-selected genes in non-seasonal species filtered by SP scoring and corrected by cDNA mapping.

Table 4. Positively-selected genes in seasonal species filtered by SP scoring and corrected by cDNA mapping.


Discussion

Influence of alignment and annotation

The results of evolutionary analysis are influenced by the quality of the genome sequence; false-positive sites may be detected and important information may be missed as a result of low-quality sequences [15, 16]. Unfortunately, current genome-sequencing techniques are still unable to provide sequences reliable enough for evolutionary analysis. Stringent filtering functions and parameters are therefore needed to obtain reliable positive sites, and careful analytical design can achieve reliable results, even from low-quality genome sequences.

Evolutionary analysis usually starts with sequence alignment using software such as ClustalW, MUSCLE or PRANK. In this study, we used PRANK (codon), because this software takes evolutionary information into consideration before placing the gaps [11], resulting in fewer mismatches but larger gaps compared with the other programs (S3 Table). Valid positive sites are likely to be located in alignments with low divergence and few gaps or mismatches, and sequence misalignments can thus generate false-positive sites in branch-site models. The branch-site model usually deletes columns with gaps in the alignments when calculating positive sites, so some sites located in deficient alignments may be regarded as positive, whereas some true-positive sites may be missed. SP-score filtering, which focuses on filtering out such false-positive sites, can be used to reduce the false-positive rate and ensure the quality of the filtered positive sites. On the other hand, cDNA mapping can exclude false-positive sites that originate from mistakes in genome sequence assembly and gene annotation. The combination of these processes can thus filter out many false-positive sites and identify low-quality genome sequences, such as those for cynomolgus monkey, horse, and orangutan in this study.

cDNA sequences in previous genome-wide studies have generally been used as references for gene annotation [45–47]. In contrast, we used cDNA mapping as a novel method to identify positive sites with high quality. Because cDNA sequences are usually relatively short, current sequencing techniques can provide reliable sequences. Moreover, some sites can be mapped to more than one corresponding cDNA sequence. cDNA mapping can thus ensure the quality of the remaining positive sites. However, there are some limitations. More than 90% of sites cannot be matched with corresponding cDNA sequences, and the validity of these sites therefore cannot be checked using this method. Because cDNA sequences are usually sequenced for a specific purpose, corresponding cDNA sequences may not be available for some putative positive sites, and genes with important evolutionary implications may be missed.

Positively-selected genes in seasonal and non-seasonal breeders

Evolutionary analysis of genome sequences can be used to identify specific, positively-selected genes in various species. The genetic mechanisms and potential environmental adaptations associated with seasonal and non-seasonal breeding can then be inferred by functional analysis of positively-selected genes in the respective species.

The functions of positively-selected genes in non-seasonal breeding species reflect reproductive tendencies such as sperm generation and cell proliferation. Two key genes perform these functions in humans: CGA (glycoprotein hormones, alpha polypeptide) is a gonadotropin subunit [48, 49], while CD151 functions in promoting metastasis, and increases the expression of phospho-extracellular signal-regulated kinase (ERK) [50, 51]. Given that ERK is a component of the mitogen-activated protein kinase pathway, positive selection pressure on this gene may influence cell proliferation and differentiation [52, 53]. Mutation of Dnah1 in mice has been reported to cause male infertility [54, 55], suggesting that it may play an important role in influencing mating behavior. Another crucial gene in rats, Invs, is involved in controlling cytoskeletal organization and cell division, which are essential for reproduction [56, 57]. Moreover, this gene can interact with NPHP1 and NPHP3 that influence the Wnt signaling pathway, which may in turn influence kidney function and renal cell formation linked to spermatocyte and spermatid generation in the testis [58–60]. These positively-selected genes may reflect modulation of the reproductive system under environmental pressure in non-seasonal breeding species, enabling them to breed throughout the year. The identification of positive sites focused on sperm generation and cell proliferation suggests that mutations in these genes may influence sperm quantity or reproductive capacity.

Genes that were positively selected in seasonal breeding species differed from those in non-seasonal species in having less focused functions. However, the orangutan provided the most valid positive genes among these species, and their functional analysis may help to explain some predominant characteristics of seasonal breeding species. The key gene, THRAP3 (thyroid hormone receptor associated protein 3, also known as Thrap150), is a selective coactivator for CLOCK-BMAL1 and promotes CLOCK-BMAL1 binding to target genes [61]. Moreover, THRAP3 can also interact with HELZ2, which regulates adipocyte differentiation [62]. Clock and Bmal1 have previously been reported to be closely related to seasonal breeding behaviors [63], and the THRAP3 mutation may thus influence the circadian rhythm of the reproductive system. This is supported by a previous study showing that thyroid hormone catabolism within the mediobasal hypothalamus regulated seasonal gonadotropin-releasing hormone secretion [64]. However, because orangutans live in Indonesia, which has high temperatures throughout the year [30, 65], they may not need to adjust their physical condition, such as lipid storage, to cope with cold weather. THRAP3 may thus influence adipocyte differentiation, while other functionally-related genes such as MTMR12 [66] and ZFR [67] would be positively selected because of such environmental conditions. In addition to THRAP3, the positively-selected genes TH1L and CMTM6 may also help to explain the seasonal breeding behavior: TH1L may have a similar function to TH1, which attenuates androgen signaling [68], while CMTM6 functions in spermatogenesis [69–71]. Evidence from previous studies suggests that orangutans produce 14 times less sperm than chimpanzees, which are closely-related but non-seasonal breeders [72]. Seasonal breeding in orangutans may thus be a consequence of circadian rhythm and limited sperm production, which restrict their breeding to the period from December to May, the most productive months in terms of food (fruit) supply, to ensure adequate food and energy for effective reproduction [73].

Diversity in breeding behaviors can generally be attributed to mutations affecting endocrine mechanisms. Such mutations may be related to specific environmental conditions, such as temperature and food supply. In this study, positively-selected genes related to sperm generation were identified in both types of breeding species. Indeed, previous reports have indicated rapid evolution of sperm proteins in mammals [74, 75]. Evolutionary mutations in these genes may not lead to the unique consequences associated with different breeding strategies. However, previous studies have indicated that the reproduction behavior in seasonal breeding species is largely under the regulation of the circadian rhythm system [64]. This is consistent with our results, which showed that THRAP3, which is functionally-related to the CLOCK-BMAL1 system, was under positive selection pressure. The mechanisms determining breeding behaviors can be complicated, but evolution leads to adaptation to the environment, enabling well-adapted lineages to persist for many generations.


Conclusions

In this study, we conducted a precise, genome-wide scan to detect genes that were positively selected between seasonal and non-seasonal breeding species. The evolutionary analysis was designed to reduce the incidence of false-positive sites by SP filtering and cDNA mapping. Although the lack of cDNA sequences means that some positive genes may have been missed, the identification of valid, positively-selected genes with functions relating to spermatogenesis, cell proliferation, and circadian rhythm might indicate possible molecular mechanisms underlying the seasonal and non-seasonal reproductive behaviors. Further developments in genome-sequencing technologies will allow the sequencing and assembly of higher-quality genomes, and more accurate gene annotation, while the availability of more cDNA sequences will increase the value of cDNA mapping for improving the accuracy of evolutionary analysis.

Supporting Information

S1 Fig. Sites with extended sequences alignments.

(A). Perfect alignment. (B). Acceptable alignment. (C). Unacceptable alignment because of large number of gaps. (D). Unacceptable alignment because of putative positive sites located in poorly-aligned sequences. (E). False negative. SP scoring filtered out mistaken acceptable alignments.


S1 Table. Positive sites mapped with the corresponding cDNA sequences.


S2 Table. 1:1 gene orthologs.

Gene orthologs were generated by BLAST, and the best hit of human versus the other species was then reversed. All identities were >60%.


S3 Table. Lengths of gene sequences before and after alignments with different aligners.


S5 Table. SP scores of positive sites after sequence alignment.


S6 Table. Positive sites after SP-score filtering.


S7 Table. Positive genes filtered by SP scoring.



We thank Prof. Bruce Lahn for his advice and comments. We also thank BGI-Shenzhen for providing the Chinese rhesus monkey and cynomolgus macaque genomes.

Author Contributions

Conceived and designed the experiments: HD XW YM. Performed the experiments: YM WZ JZ JC ST. Analyzed the data: MZ Y. Zhong YM. Contributed reagents/materials/analysis tools: WZ ML Y. Zhang. Wrote the paper: YM WZ JZ HD Y. Zhong.


  1. Prendergast BJ. Internalization of seasonal time. Hormones and behavior. 2005;48(5):503–11. Epub 2005/07/20. pmid:16026787.
  2. Hut RA. Photoperiodism: shall EYA compare thee to a summer's day? Current biology: CB. 2011;21(1):R22–5. Epub 2011/01/11. pmid:21215931.
  3. Ims RA. The ecology and evolution of reproductive synchrony. Trends in ecology & evolution. 1990;5(5):135–40. Epub 1990/05/01. pmid:21232341.
  4. Yang Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Computer applications in the biosciences: CABIOS. 1997;13(5):555–6. Epub 1997/11/21. pmid:9367129.
  5. Zhang J, Nielsen R, Yang Z. Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Molecular biology and evolution. 2005;22(12):2472–9. Epub 2005/08/19. pmid:16107592.
  6. Bakewell MA, Shi P, Zhang J. More genes underwent positive selection in chimpanzee evolution than in human evolution. Proceedings of the National Academy of Sciences of the United States of America. 2007;104(18):7489–94. Epub 2007/04/24. pmid:17449636; PubMed Central PMCID: PMC1863478.
  7. Kosiol C, Vinar T, da Fonseca RR, Hubisz MJ, Bustamante CD, Nielsen R, et al. Patterns of positive selection in six Mammalian genomes. PLoS genetics. 2008;4(8):e1000144. Epub 2008/08/02. pmid:18670650; PubMed Central PMCID: PMC2483296.
  8. Sun YB, Zhou WP, Liu HQ, Irwin DM, Shen YY, Zhang YP. Genome-wide scans for candidate genes involved in the aquatic adaptation of dolphins. Genome biology and evolution. 2013;5(1):130–9. Epub 2012/12/19. pmid:23246795; PubMed Central PMCID: PMC3595024.
  9. Fletcher W, Yang Z. The effect of insertions, deletions, and alignment errors on the branch-site test of positive selection. Molecular biology and evolution. 2010;27(10):2257–67. Epub 2010/05/08. pmid:20447933.
  10. Loytynoja A, Goldman N. An algorithm for progressive multiple alignment of sequences with insertions. Proceedings of the National Academy of Sciences of the United States of America. 2005;102(30):10557–62. Epub 2005/07/08. pmid:16000407; PubMed Central PMCID: PMC1180752.
  11. Loytynoja A, Goldman N. Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis. Science (New York, NY). 2008;320(5883):1632–5. Epub 2008/06/21. pmid:18566285.
  12. Markova-Raina P, Petrov D. High sensitivity to aligner and high rate of false positives in the estimates of positive selection in the 12 Drosophila genomes. Genome research. 2011;21(6):863–74. Epub 2011/03/12. pmid:21393387; PubMed Central PMCID: PMC3106319.
  13. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic acids research. 2004;32(5):1792–7. Epub 2004/03/23. pmid:15034147; PubMed Central PMCID: PMC390337.
  14. Thompson JD, Gibson TJ, Higgins DG. Multiple sequence alignment using ClustalW and ClustalX. Current protocols in bioinformatics / editorial board, Andreas D Baxevanis [et al]. 2002;Chapter 2:Unit 2 3. Epub 2008/09/17. pmid:18792934.
  15. Schneider A, Souvorov A, Sabath N, Landan G, Gonnet GH, Graur D. Estimates of positive Darwinian selection are inflated by errors in sequencing, annotation, and alignment. Genome biology and evolution. 2009;1:114–8. Epub 2009/01/01. pmid:20333182; PubMed Central PMCID: PMC2817407.
  16. Mallick S, Gnerre S, Muller P, Reich D. The difficulty of avoiding false positives in genome scans for natural selection. Genome research. 2009;19(5):922–33. Epub 2009/05/05. pmid:19411606; PubMed Central PMCID: PMC2675981.
  17. Altschul SF. Gap costs for multiple sequence alignment. Journal of theoretical biology. 1989;138(3):297–309. Epub 1989/06/08. pmid:2593679.
  18. Gupta SK, Kececioglu JD, Schaffer AA. Improving the practical space and time efficiency of the shortest-paths approach to sum-of-pairs multiple sequence alignment. Journal of computational biology: a journal of computational molecular cell biology. 1995;2(3):459–72. Epub 1995/01/01. pmid:8521275.
  19. Mitani JC, Watts DP, Muller MN. Recent developments in the study of wild chimpanzee behavior. Evolutionary Anthropology: Issues, News, and Reviews. 2002;11(1):9–25.
  20. Sun Z, Zeng L, Hong B, Zhang G. Primary Research on Reproduction of Cynomolgus Monkeys in an Indoor Breeding Mode in Beijing Area. Chinese Journal of Comparative Medicine. 2008;11:33–5.
  21. Bronson FH. The reproductive ecology of the house mouse. The Quarterly review of biology. 1979;54(3):265–99. Epub 1979/09/01. pmid:390600.
  22. Perry JS. The Reproduction of the Wild Brown Rat (Rattus norvegicus Erxleben). Proceedings of the Zoological Society of London. 1945;115(1–2):19–46.
  23. Harcourt AH, Harvey PH, Larson SG, Short RV. Testis weight, body weight and breeding system in primates. Nature. 1981;293(5827):55–7. Epub 1981/09/03. pmid:7266658.
  24. Hou J, Qu W, Chen L, Zhang H. Study of the Reproduction Eco-Behavior of Macaca mulatta in Taihang Mountains. Chinese Journal of Ecology. 1998;17:22–5.
  25. Pal SK, Ghosh B, Roy S. Dispersal behaviour of free-ranging dogs (Canis familiaris) in relation to age, sex, season and dispersal distance. Applied Animal Behaviour Science. 1998;61(2):123–32.
  26. Johnson L, Thompson DL Jr. Effect of seasonal changes in Leydig cell number on the volume of smooth endoplasmic reticulum in Leydig cells and intratesticular testosterone content in stallions. Journal of reproduction and fertility. 1987;81(1):227–32. Epub 1987/09/01. pmid:3668953.
  27. Brambell FWR. The Reproduction of the Wild Rabbit Oryctolagus cuniculus (L.). Proceedings of the Zoological Society of London. 1944;114(1–2):1–45.
  28. Yang Z, dos Reis M. Statistical properties of the branch-site test of positive selection. Molecular biology and evolution. 2011;28(3):1217–28. Epub 2010/11/23. pmid:21087944.
  29. Watts DP. Mountain gorilla reproduction and sexual behavior. American Journal of Primatology. 1991;24(3–4):211–25.
  30. Singleton I, van Schaik CP. The social organisation of a population of Sumatran orang-utans. Folia primatologica; international journal of primatology. 2002;73(1):1–20. Epub 2002/06/18. pmid:12065937.
  31. Sousa MBC, Peregrino HPA, Cirne MFC, Mota MTS. Reproductive patterns and birth seasonality in a South-American breeding colony of common marmosets, Callithrix jacchus. Primates. 1999;40(2):327–36.
  32. Flicek P, Amode MR, Barrell D, Beal K, Brent S, Carvalho-Silva D, et al. Ensembl 2012. Nucleic acids research. 2012;40(Database issue):D84–90. Epub 2011/11/17. pmid:22086963; PubMed Central PMCID: PMC3245178.
  33. Yan G, Zhang G, Fang X, Zhang Y, Li C, Ling F, et al. Genome sequencing and comparison of two nonhuman primate animal models, the cynomolgus and Chinese rhesus macaques. Nature biotechnology. 2011;29(11):1019–23. Epub 2011/10/18. pmid:22002653.
  34. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. Journal of molecular biology. 1990;215(3):403–10. Epub 1990/10/05. pmid:2231712.
  35. Villanueva-Canas JL, Laurie S, Alba MM. Improving genome-wide scans of positive selection by using protein isoforms of similar length. Genome biology and evolution. 2013;5(2):457–67. Epub 2013/02/05. pmid:23377868; PubMed Central PMCID: PMC3590775.
  36. Arbiza L, Dopazo J, Dopazo H. Positive selection, relaxation, and acceleration in the evolution of the human and chimp genome. PLoS computational biology. 2006;2(4):e38. Epub 2006/05/10. pmid:16683019; PubMed Central PMCID: PMC1447656.
  37. Rhesus Macaque Genome Sequencing and Analysis Consortium, Gibbs RA, Rogers J, Katze MG, Bumgarner R, et al. Evolutionary and biomedical insights from the rhesus macaque genome. Science (New York, NY). 2007;316(5822):222–34. Epub 2007/04/14. pmid:17431167.
  38. Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R, Birney E. EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome research. 2009;19(2):327–35. Epub 2008/11/26. pmid:19029536; PubMed Central PMCID: PMC2652215.
  39. Toll-Riera M, Laurie S, Alba MM. Lineage-specific variation in intensity of natural selection in mammals. Molecular biology and evolution. 2011;28(1):383–98. Epub 2010/08/07. pmid:20688808.
  40. Carneiro M, Albert FW, Melo-Ferreira J, Galtier N, Gayral P, Blanco-Aguiar JA, et al. Evidence for widespread positive and purifying selection across the European rabbit (Oryctolagus cuniculus) genome. Molecular biology and evolution. 2012;29(7):1837–49. Epub 2012/02/10. pmid:22319161; PubMed Central PMCID: PMC3375474.
  41. Laurie S, Toll-Riera M, Rado-Trilla N, Alba MM. Sequence shortening in the rodent ancestor. Genome research. 2012;22(3):478–85. Epub 2011/12/01. pmid:22128134; PubMed Central PMCID: PMC3290783.
  42. Felsenstein J. PHYLIP: Phylogeny Inference Package. University of Washington, Seattle, WA; 1993.
  43. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B (Methodological). 1995:289–300.
  44. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Molecular biology and evolution. 2011;28(10):2731–9. Epub 2011/05/07. pmid:21546353; PubMed Central PMCID: PMC3203626.
  45. Furuno M, Kasukawa T, Saito R, Adachi J, Suzuki H, Baldarelli R, et al. CDS annotation in full-length cDNA sequence. Genome research. 2003;13(6B):1478–87. Epub 2003/06/24. pmid:12819146; PubMed Central PMCID: PMC403693.
  46. Kim DS, Huh JW, Kim YH, Park SJ, Lee SR, Chang KT. Full-length cDNA sequences from Rhesus monkey placenta tissue: analysis and utility for comparative mapping. BMC genomics. 2010;11:427. Epub 2010/07/14. pmid:20624290; PubMed Central PMCID: PMC2996955.
  47. Uenishi H, Morozumi T, Toki D, Eguchi-Ogawa T, Rund LA, Schook LB. Large-scale sequencing based on full-length-enriched cDNA libraries in pigs: contribution to annotation of the pig genome draft sequence. BMC genomics. 2012;13:581. Epub 2012/11/16. pmid:23150988; PubMed Central PMCID: PMC3499286.
  48. Rebers F, Tensen C, Schulz R, Goos HT, Bogerd J. Modulation of glycoprotein hormone α- and gonadotropin IIβ-subunit mRNA levels in the pituitary gland of mature male African catfish, Clarias gariepinus. Fish Physiology and Biochemistry. 1997;17(1–6):99–108.
  49. Orth JM. Cell biology of testicular development in the fetus and neonate. Cell and molecular biology of the testis. 1993:3–42.
  50. Yang W, Li P, Lin J, Zuo H, Zuo P, Zou Y, et al. CD151 promotes proliferation and migration of PC3 cells via the formation of CD151-integrin alpha3/alpha6 complex. Journal of Huazhong University of Science and Technology Medical sciences = Hua zhong ke ji da xue xue bao Yi xue Ying De wen ban = Huazhong keji daxue xuebao Yixue Yingdewen ban. 2012;32(3):383–8. Epub 2012/06/12. pmid:22684562.
  51. Yue S, Mu W, Zoller M. Tspan8 and CD151 promote metastasis by distinct mechanisms. European journal of cancer (Oxford, England: 1990). 2013;49(13):2934–48. Epub 2013/05/21. pmid:23683890.
  52. Jang YN, Baik EJ. JAK-STAT pathway and myogenic differentiation. Jak-Stat. 2013;2(2):e23282. Epub 2013/09/24. pmid:24058805; PubMed Central PMCID: PMC3710318.
  53. Weber-Nordt RM, Mertelsmann R, Finke J. The JAK-STAT pathway: signal transduction involved in proliferation, differentiation and transformation. Leukemia & lymphoma. 1998;28(5–6):459–67. Epub 1998/06/05. pmid:9613975.
  54. Ben Khelifa M, Coutton C, Zouari R, Karaouzene T, Rendu J, Bidart M, et al. Mutations in DNAH1, which encodes an inner arm heavy chain dynein, lead to male infertility from multiple morphological abnormalities of the sperm flagella. American journal of human genetics. 2014;94(1):95–104. Epub 2013/12/24. pmid:24360805; PubMed Central PMCID: PMC3882734.
  55. Neesen J, Kirschner R, Ochs M, Schmiedl A, Habermann B, Mueller C, et al. Disruption of an inner arm dynein heavy chain gene results in asthenozoospermia and reduced ciliary beat frequency. Human molecular genetics. 2001;10(11):1117–28. Epub 2001/05/24. pmid:11371505.
  56. Veland IR, Montjean R, Eley L, Pedersen LB, Schwab A, Goodship J, et al. Inversin/Nephrocystin-2 is required for fibroblast polarity and directional cell migration. PloS one. 2013;8(4):e60193. Epub 2013/04/18. pmid:23593172; PubMed Central PMCID: PMC3620528.
  57. Werner ME, Ward HH, Phillips CL, Miller C, Gattone VH, Bacallao RL. Inversin modulates the cortical actin network during mitosis. American journal of physiology Cell physiology. 2013;305(1):C36–47. Epub 2013/03/22. pmid:23515530; PubMed Central PMCID: PMC3725518.
  58. Lienkamp S, Ganner A, Walz G. Inversin, Wnt signaling and primary cilia. Differentiation; research in biological diversity. 2012;83(2):S49–55. Epub 2011/12/31. pmid:22206729.
  59. Benzing T, Simons M, Walz G. Wnt signaling in polycystic kidney disease. Journal of the American Society of Nephrology: JASN. 2007;18(5):1389–98. Epub 2007/04/13. pmid:17429050.
  60. Nurnberger J, Kavapurackal R, Zhang SJ, Opazo Saez A, Heusch G, Philipp T, et al. Differential tissue distribution of the Invs gene product inversin. Cell and tissue research. 2006;323(1):147–55. Epub 2005/07/12. pmid:16007506.
  61. Lande-Diner L, Boyault C, Kim JY, Weitz CJ. A positive feedback loop links circadian clock factor CLOCK-BMAL1 to the basic transcriptional machinery. Proceedings of the National Academy of Sciences of the United States of America. 2013;110(40):16021–6. Epub 2013/09/18. pmid:24043798; PubMed Central PMCID: PMC3791755.
  62. Katano-Toki A, Satoh T, Tomaru T, Yoshino S, Ishizuka T, Ishii S, et al. THRAP3 interacts with HELZ2 and plays a novel role in adipocyte differentiation. Molecular endocrinology (Baltimore, Md). 2013;27(5):769–80. Epub 2013/03/26. pmid:23525231.
  63. Lincoln GA, Andersson H, Loudon A. Clock genes in calendar cells as the basis of annual timekeeping in mammals—a unifying hypothesis. The Journal of endocrinology. 2003;179(1):1–13. Epub 2003/10/08. pmid:14529560.
  64. Yoshimura T. Neuroendocrine mechanism of seasonal reproduction in birds and mammals. Animal science journal = Nihon chikusan Gakkaiho. 2010;81(4):403–10. Epub 2010/07/29. pmid:20662808.
  65. Rijksen HD, Wageningen L. A fieldstudy on Sumatran orang utans (Pongo pygmaeus abelii, Lesson 1827): Ecology, behaviour and conservation: H. Veenman Netherlands; 1978.
  66. Gupta VA, Hnia K, Smith LL, Gundry SR, McIntire JE, Shimazu J, et al. Loss of catalytically inactive lipid phosphatase myotubularin-related protein 12 impairs myotubularin stability and promotes centronuclear myopathy in zebrafish. PLoS genetics. 2013;9(6):e1003583. Epub 2013/07/03. pmid:23818870; PubMed Central PMCID: PMC3688503.
  67. Prorocic MM, Wenlong D, Olorunniji FJ, Akopian A, Schloetel JG, Hannigan A, et al. Zinc-finger recombinase activities in vitro. Nucleic acids research. 2011;39(21):9316–28. Epub 2011/08/19. pmid:21849325; PubMed Central PMCID: PMC3241657.
  68. Yang Y, Zou W, Kong X, Wang H, Zong H, Jiang J, et al. Trihydrophobin 1 attenuates androgen signal transduction through promoting androgen receptor degradation. Journal of cellular biochemistry. 2010;109(5):1013–24. Epub 2010/01/14. pmid:20069563.
  69. Stittrich AB, Haftmann C, Sgouroudis E, Kuhl AA, Hegazy AN, Panse I, et al. The microRNA miR-182 is induced by IL-2 and promotes clonal expansion of activated helper T lymphocytes. Nature immunology. 2010;11(11):1057–62. Epub 2010/10/12. pmid:20935646.
  70. Zhong W-d, Zeng G-q, Cai Y-b, Tan Y, Hen S, Dai Q, et al. Pathological changes in seminiferous tubules in infertility rats induced by chemokine-like factor I. Chin J Exp Surg. 2003;20:1027–8.
  71. Liu D, Yin C, Zhang Y, Tian L, Li T, Li D, et al. Human CMTM2/CKLFSF2 enhances the ligand-induced transactivation of the androgen receptor. Chinese Science Bulletin. 2009;54(6):1050–7.
  72. Fujii-Hanamoto H, Matsubayashi K, Nakano M, Kusunoki H, Enomoto T. A comparative study on testicular microstructure and relative sperm production in gorillas, chimpanzees, and orangutans. Am J Primatol. 2011;73(6):570–7. Epub 2011/02/03. pmid:21287585.
  73. Wich SA, Utami-Atmoko SS, Setia TM, Rijksen HD, Schurmann C, van Hooff JA, et al. Life history of wild Sumatran orangutans (Pongo abelii). Journal of human evolution. 2004;47(6):385–98. Epub 2004/11/30. pmid:15566945.
  74. Torgerson DG, Kulathinal RJ, Singh RS. Mammalian sperm proteins are rapidly evolving: evidence of positive selection in functionally diverse genes. Molecular biology and evolution. 2002;19(11):1973–80. Epub 2002/11/02. pmid:12411606.
  75. Swanson WJ, Vacquier VD. The rapid evolution of reproductive proteins. Nature reviews Genetics. 2002;3(2):137–44. Epub 2002/02/12. pmid:11836507.