LA, JD, and HD conceived and designed the experiments. LA performed the experiments. LA and HD analyzed the data. LA, JD, and HD wrote the paper.
The authors have declared that no competing interests exist.
For years evolutionary biologists have been interested in searching for the
genetic bases underlying humanness. Recent efforts at a large or a complete
genomic scale have been conducted to search for positively selected genes in
human and in chimp. However, recently developed methods now allow a more
sensitive and controlled approach to the detection of positive selection.
Here, using 13,198 genes, we have deduced the sets of genes involved
in rate acceleration, positive selection, and relaxation of selective
constraints in human, in chimp, and in their ancestral lineage since the
divergence from murids. Significant deviations from the strict molecular clock
were observed in 469 human and in 651 chimp genes. The more stringent
branch-site test of positive selection detected 108 human and 577 chimp
positively selected genes. An important proportion of the positively selected
genes did not show a significant acceleration in rates, and similarly, many of
the accelerated genes did not show significant signals of positive selection.
Functional differentiation of genes under rate acceleration, positive selection,
and relaxation was not statistically significant between human and chimp with
the exception of terms related to
Since the publication of the human and the chimp genomes, one of the major challenges in evolutionary biology has begun to be deciphered: namely, the search for positively selected genes that have shaped humanness. Arbiza and colleagues undertake a genomic-scale search for the genes that have been positively selected in human, in chimp, and in their common ancestral lineage. They conclude that events of positive selection were six times more frequent in chimp than in human, although they do not group under specific functional classes that have been preferentially selected in either species. However, in the comparisons of the evolutionary trends between the ancestral and the descendant lineages, they found that most of the relative differences in common classes show an abundance of positive selection on the human branch. By differentiating positive selection from a relaxation of selective constraints, both producing analogous footprints in the genome, they demonstrate that many of the genes previously thought to have been positively selected correspond to likely cases of relaxation. Finally, they quantify the bias produced by the use of average rate–based approaches to concentrate cases of adaptive evolution in these species.
For years evolutionary biologists have been interested in knowing to what extent
natural selection and genetic drift have shaped the genetic variation of populations
and species [
PS is the process favoring the retention in a population of those mutations that are
beneficial to the reproductive success of individuals. Contrary to this process, the
molecular clock hypothesis [
Recent efforts at a large or genomic scale have been conducted to elucidate the
intricacies of human evolution by means of comparing rate differences and PS against
other fully sequenced species. In a recent work, Dorus et al. [
Indeed, while these three publications have been hallmarks in the genomic-scale
search for events showing PS and have provided much insight into the subject, the
combination of methods used has produced certain disagreements and has left some
important considerations unaccounted for. As noted in the CSAC publication, the set
of 585 genes observed may only be enriched for cases of PS given that, for example,
the Ka/Ki statistic used could be >1 by chance in almost half of these genes
if purifying selection is allowed to act nonuniformly [
Finally, it is important to note that likelihood ratio tests like those used here and
in some previous studies are sensitive to model assumptions [
Therefore, many important questions regarding the identity and functional roles of genes showing acceleration, RSC, and PS still remain: which genes can be assigned to these sets with a considerable degree of sensitivity and confidence? Are these genes significantly different between species in functional terms? Do these genes encompass a special group of functional classes, or are they an unbiased representation of the genome? To what extent does the set of positively selected genes (PSG) differ from the set of accelerated genes? How many of the PSG can be distinguished from cases of RSC? Furthermore, can we gain any additional insight by comparing the pattern of adaptation of the derived species against that in their ancestral lineage?
All of these questions can only be answered by testing for deviations from the neutral theory in human, in chimp, and in their common ancestor, independently, using sensitive tests for PS while correcting for multiple testing. In this study, we have searched for the most complete set of known human genes with the chimp, mouse, rat, and dog orthologs available in order to answer all of these questions.
The two branch-site maximum likelihood (ML) tests of PS employed in this paper
benefit from a high degree of sensitivity when compared with previous branch tests,
and can be used together, as has been recently shown [
This is the first comparative genomic study where the lineage-specific events involved in processes of PS and RSC occurring in the human genome before and after the speciation event that differentiated us from our closest living species have been deduced.
The analysis begins with the complete set of 30,709 genes in the Ensembl
Human Database version 30.35c. These were filtered to remove all genes that
had not been confirmed through mapping to Swiss-Prot, RefSeq, or SPTrEMBL,
and a total of 20,469 genes, which in this manner had acquired the Ensembl
known gene status, remained. Inspection of ortholog annotations for this set
of genes in the Ensembl-Compara database (version 30) yielded 14,185 human
genes with ortholog predictions in chimp, mouse, rat, and dog, corresponding
to 69% of the known Ensembl human genome. After filtering the
sequences by length and exceedingly high evolutionary rates, 13,198 genes
were analysed by means of the relative rates test (RRT) (see
A more detailed analysis showed significant deviations in both Ka and Ks
tests for 65 (0.5%) genes, out of which 18 evolved relatively
faster in human than in chimp (HF), and 47 evolved relatively faster in
chimp than in human (ChF). It is important to note that HF and ChF terms
represent relative, rather than absolute, rate definitions. The number of
genes for which there were significant differences, in either only Ka or
only Ks, was higher for chimp (505 and 99) than for human (368 and 83),
respectively. The RRT performed showed that a higher number of genes have
significantly accelerated in nonsynonymous (938) rather than in synonymous
changes (247). The ratio of the number of genes showing an acceleration of
nonsynonymous to synonymous rates was similar and more than threefold
(approximately 3.8) in both species. This bias constitutes indirect
evidence of the already characterized overdispersed clock in mammals, which
suggests that protein evolution cannot be explained by a simple model
of neutral evolution [
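The totals and the nonsynonymous-to-synonymous ratio quoted above can be checked with a short tally. This is only a sketch: the variable names are illustrative, the per-species Ka totals (386 and 552) are the ones reported later in the text, and it assumes those totals include the genes significant in both tests.

```python
# Counts taken from the text; the dictionary names are illustrative only.
both = {"human": 18, "chimp": 47}        # significant in both Ka and Ks tests (HF, ChF)
ks_only = {"human": 83, "chimp": 99}     # significant only in the Ks test
ka_total = {"human": 386, "chimp": 552}  # all genes with a significant Ka acceleration

# Genes accelerated in synonymous rate: Ks-only genes plus genes significant in both.
ks_total = {sp: ks_only[sp] + both[sp] for sp in both}

# Ratio of Ka-accelerated to Ks-accelerated genes within each species.
ratio = {sp: ka_total[sp] / ks_total[sp] for sp in both}

print(sum(ka_total.values()), sum(ks_total.values()))
print({sp: round(r, 2) for sp, r in ratio.items()})
```

Under these assumptions the per-species ratios both round to about 3.8, in line with the figure given in the text.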
RRT Results
The bulk of all genes fall within the category showing the highest rates of
evolution changing by nonsynonymous mutations (
Evolutionary Rates of Human and of Chimp
ML estimations of evolutionary rates in the human branch and in the chimp
branch were calculated using PAML [
Using human Gene Ontology (GO) terms [
Functional Analysis of Genes with Deviations from the Molecular Clock
To find out if there were any over- or under-represented GO terms between
human and chimp, a Fisher exact test with
In summary, we have not detected GO terms differentially distributed between the significantly accelerated genes of human and of chimp. Moreover, the set of functions accelerated in human does not represent a special subset of genes with functional particularities within the human genome.
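For readers unfamiliar with the test, a Fisher exact comparison of one GO term between two gene sets can be sketched as follows. This is a minimal standard-library illustration, not the software used in the study; the function name and the per-term counts in the example are hypothetical, and only the set sizes (469 and 651) come from the text.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical example: a term annotated to 12 of 469 accelerated human genes
# and to 15 of 651 accelerated chimp genes.
p = fisher_exact_two_sided(12, 469 - 12, 15, 651 - 15)
print(p)
```

Each GO term yields one such p-value, so the resulting list would still need a multiple-testing adjustment before any term is called significant.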
The set of genes used for clock testing was also analyzed for signals of PS.
After discarding those with fewer than three unique base pair differences,
9,674 human–chimp–mouse–rat–dog
orthologous sequences remained. This set was then analyzed for signals of PS
with Tests I and II, which can be used to distinguish RSC from true events
of PS when used in conjunction with each other [
Functional Analysis of PSG
Initially, when comparing representations of terms under human and chimp
directly, it is evident that with minor modifications of frequencies H-PSG
have shown almost the same set of biological functions as those in chimp
(Ch-PSG). It is interesting to note that in this comparison the highest
differences in representation of genes between both lineages are found under
terms such as
A more striking difference becomes noticeable when switching from the
perspective of a direct comparison of the functional GO categories under PS
for human and for chimp, to that based on the relative differences observed
between the ancestral lineage and each one of the corresponding derived
species. The H-AH and Ch-AH columns in
It is held that genes showing acceleration in nonsynonymous rate are likely to
concentrate cases of PS. However, the comparison of
A minor proportion of genes with Ka/Ks > 1 match events of PS in
human and in chimp (red circles). Many of the genes with Ka/Ks <
1 show evidence of PS (blue circles). Genes with Ka/Ks > 1
without evidence of PS (black asterisks) fall mostly under molecular
clock conditions for nonsynonymous changes (circles below the broken red
line). Most of the genes without evidence of PS and Ka/Ks < 1
(grey circles) are scattered below the boundary limiting molecular clock
like behavior and are observed at dKa < 0.0006 when molecular
clock conditions are not fulfilled. Genes outside of clock conditions
and dKa > 0.0006 coincide mostly with events of PS in both of the
species (red and blue circles above the broken line). dKa and rKa as
defined in
The total number of genes with Ka/Ks > 1 considered in the analysis of
In a similar manner, when considering differences in Ka rate instead of Ka/Ks rate ratios, 386 human genes (552 in chimp) have experienced a significant acceleration of nonsynonymous rate, and only approximately 32 of these genes (120 in chimp) have shown a reliable signal of PS. However, when considering genes with a significant acceleration in Ka rate and a dKa > 0.0006, most of them show evidence of PS (81% in human and 94% in chimp). Still, it is important to remember that these genes remain a minority of all the genes with a significant deviation in Ka-RRT.
In summary, we observe that only those genes with a significant Ka-RRT and dKa > 0.0006 can be considered candidates with an enriched probability of having been positively selected. These results highlight one of the pitfalls of using elevated normalized Ka rates as a means of concentrating likely cases of PS in an a priori fashion.
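The two-part filter described above, and the low share of PS among Ka-accelerated genes, can be illustrated with a small sketch. The gene records and function name are hypothetical. Only the 0.0006 cutoff and the counts 32 of 386 and 120 of 552 come from the text.

```python
DKA_THRESHOLD = 0.0006  # divergence cutoff quoted in the text

def is_ps_candidate(ka_rrt_significant, dka):
    """True when a gene has a significant Ka-RRT and dKa above the cutoff."""
    return ka_rrt_significant and dka > DKA_THRESHOLD

# Hypothetical genes: (name, Ka-RRT significant?, dKa)
genes = [("geneA", True, 0.0011), ("geneB", True, 0.0003), ("geneC", False, 0.0020)]
candidates = [name for name, sig, dka in genes if is_ps_candidate(sig, dka)]
print(candidates)

# Share of Ka-accelerated genes with a reliable PS signal, from the counts in the text.
ps_share = {"human": 32 / 386, "chimp": 120 / 552}
print({sp: f"{share:.1%}" for sp, share in ps_share.items()})
```

The filter is deliberately strict: it trades recall for precision, which is the same trade-off the text describes when it notes that the enriched genes remain a minority.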
It is known that most tests of PS are not able to distinguish real events of
positive Darwinian selection from cases of RSC [
(A) The differential distribution of genes along tree branches suggests a different pattern of occurrence of PS (Test II) and RSC (Test I) in derived and ancestral lineages. Numbers in red represent the total number of genes detected in each test after correcting for multiple testing. Numbers in black are common orthologous genes observed between lineages. Numbers in blue are genes observed in both tests.
(B) The phylogenetic distribution of four representative GO categories is shown in human, in chimp, and in the ancestral lineage as depicted in the tree defined above. Numbers correspond to the percentage representation of genes under PS and RSC for each term out of the total number of genes with GO annotation. Filled circles show significant (red) and nonsignificant (grey) differences in the comparisons (see text for a detailed explanation).
A common pattern observed for all of the functional categories represented in the
set of genes under RSC was the absence of functional differentiation between
human and chimp (grey-filled circles). However, a highly significant increase
(red-filled circles) occurred in the representation of the term
The opposite pattern was observed for the
Differences in GO term representation between the sets of the derived and
the ancestral lineages (H-AH, human versus ancestral lineage; CH-AH,
chimp versus ancestral lineage) are plotted against each other using
genes exclusively observed in Test I (RSC) and Test II (PS). Each
quadrant represents a particular evolutionary scenario increasing or
decreasing in GO representation for each of the lineages after
speciation. Terms showing a difference in representation between H-AH
and CH-AH >10% were labeled in red:
GO terms with positive differences in representation in both axes correspond to
those increasing in both species after the speciation process (Q1). Considering
the adaptive evolutionary process, a total of 26 functional categories fits this
pattern (PS graph). Most of them (21) showed higher differences in
representation in human than in chimp (H-AH%, Ch-AH%),
i.e.,
In summary, although Test II detected a higher number of PSG in chimp than in
human, and differences in GO term representation between them were not significant, the
comparison between ancestral and derived adaptive trends shows that out of a
total of 59 GO terms common to all lineages, 41 showed a higher proportion of PS
events occurring in the human lineage. Only 11 terms showed a higher proportion
of PSG in chimp. Additionally, the difference in data distributions between the
sets of RSC/weak signal of PS and that of PS, suggested by
It is worth noting that the fact that many of the genes found exclusively in Test I have functionally important products, such as homeobox- and polymerase-related proteins among others, seems to suggest that it is highly improbable that all of them have undergone a process of RSC. Probably many of them are genes with a weak yet true signal of PS not sufficient to be detected by Test II (R. Nielsen, personal communication). It is evident that further statistical methods are necessary to accurately differentiate weak signals of PS from real cases of RSC.
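The way the two tests are read together in this section can be summarized in a few lines. This is a simplified sketch of the interpretation given in the text, with hypothetical function and label names. It takes as input whether each test remained significant after multiple-testing correction.

```python
def classify_gene(test1_significant, test2_significant):
    """Label a gene from the outcomes of Tests I and II, as interpreted in the text."""
    if test2_significant:
        # Test II is the stringent test: a significant result is read as PS.
        return "PS"
    if test1_significant:
        # Significant in Test I only: RSC, or a PS signal too weak for Test II.
        return "RSC or weak PS"
    return "no signal"

# Hypothetical outcomes for three genes.
labels = [classify_gene(t1, t2) for t1, t2 in [(True, True), (True, False), (False, False)]]
print(labels)
```

The middle label is intentionally ambiguous, since the text stresses that current methods cannot separate weak PS from true RSC among genes detected only by Test I.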
A Small Sample of the Human and the Chimp Genes Deduced under Tests I and II
Many other genes with a strong signal of PS in human (H), in chimp (Ch), in human
and chimp (H-Ch), and in the ancestral lineage of hominids (AH) were related to:
a) nervous system, H:
These functions are a small sample of those observed in this study and point out the great variety of functions modified by natural selection during hominid evolution.
We present a complete genomic evolutionary analysis of molecular clock, RSC, and PS that includes the comparison with the ancestral lineage of hominids in order to differentiate adaptive trends in evolution after the speciation process that separated human and chimpanzee. Based on testing deviations from neutrality in a gene-by-gene approach, we found a total of 1,182 (9.0%) human and 1,948 (14.8%) chimp genes with statistically significant deviations observed in at least one of the mentioned processes. However, after correcting for multiple testing we only considered 665 (5.0%) human and 1,341 (10.2%) chimp genes as a better estimate of the minimal sets under non-neutral evolution in these species. We conclude that these evolutionary processes do not show signs of being frequent events shaping the pattern of divergence between human and chimp genomes.
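The percentages in this paragraph can be reproduced from the gene counts. The sketch below assumes the denominator is the 13,198 genes stated in the abstract, and the key names are illustrative.

```python
TOTAL_GENES = 13198  # assumed denominator: the gene set size given in the abstract

counts = {
    "human, uncorrected": 1182,
    "chimp, uncorrected": 1948,
    "human, corrected": 665,
    "chimp, corrected": 1341,
}

# Percentage of the analyzed genes falling in each set, to one decimal place.
percent = {label: round(100 * n / TOTAL_GENES, 1) for label, n in counts.items()}
for label, value in percent.items():
    print(f"{label}: {value}%")
```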
Differences in evolutionary rates exist between the species although there were no
net significant differences. The number of genes showing a significant acceleration
in nonsynonymous rates exceeds the number accelerated in synonymous rates, and is greater
for chimp than for human. This excess of nonsynonymous changes favoring chimp
correlates with the greater number of PS events observed in this species, and could
be due in part to the comparatively smaller population size that has shaped human
evolution [
For years, evolutionary biologists have known that deviations from the molecular clock, or rate acceleration in general, are neither necessary nor sufficient to infer adaptive processes occurring during the evolution of species. We have observed that considering genes with a Ka/Ks > 1 yields a set where only 7%–20% of genes show evidence of PS. Similarly, using an RRT approach on nonsynonymous mutations, the genes showing significant deviations contain PS events at a frequency of 10%–30%. With the addition of a nontrivial divergence value (dKa > 0.0006), the number of genes is reduced considerably, but PS events reach a concentration of 80%–95%. However, in all of these cases a high proportion of PSG are discarded in comparison with the number of PS events found by using the ML branch-site models of Test II used in this study.
A previous genomic study focusing on PS in human and in chimp has found
that many functional categories were over- and under-represented in both species
[
The sets of genes deduced without correction for multiple testing in molecular clock
and PS analyses produced similar results for most of the GO representation
comparisons observed after correction. The only exception was the term
Whole-genome analyses of evolutionary properties were made without any a priori
hypothesis about the resulting genes. Consequently, these types of analyses are
exhaustive and, at the same time, conservative regarding individual results. The
necessity of keeping the type I error rate at an acceptable level leads to an
unavoidable increase in the rejection of true positive results [
For years it has been thought that the availability of the chimpanzee genome sequence and its comparison to that of human would reveal some of the molecular bases underlying the observable differences and possibly provide clues to that which makes us human. Now it is evident that neither the methodologies existing nor the detail and quality of the available annotation on the genes have allowed for a conclusive answer. In the future, new methods and more detailed functional annotations will be necessary to properly clarify this relevant biological issue.
Ortholog annotations for the subset of 20,469 “known” Ensembl
human protein-coding genes within the full set (30,709 genes) of the Ensembl version
30.35c
DNA CDS were aligned using ClustalW [
PS was evaluated using two different branch-site model Tests (I and II)
[
In contrast to the statistical behavior of previous branch-site tests [
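A generic likelihood ratio test between a null and an alternative model can be sketched as below. This is only an illustration under stated assumptions: it compares twice the log-likelihood difference with a chi-square distribution with one degree of freedom, which may not be the reference distribution used for Tests I and II in this study, and the log-likelihood values are hypothetical.

```python
from math import erfc, sqrt

def lrt_pvalue_df1(lnl_null, lnl_alt):
    """Return (statistic, p-value) for a likelihood ratio test with 1 degree of freedom."""
    # The statistic is twice the gain in log-likelihood; clamp small negative
    # values that can arise from incomplete optimization.
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    # For one degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2.0))

# Hypothetical log-likelihoods for one gene under the null and alternative models.
stat, p = lrt_pvalue_df1(lnl_null=-2451.80, lnl_alt=-2447.10)
print(round(stat, 2), p)
```

Because one such test is run per gene and per lineage, the resulting p-values are raw and still have to be adjusted for multiple testing.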
In all cases, unless otherwise stated, p-values derived from clock and PS
analyses were false discovery rate–adjusted for multiple testing using the
method of Benjamini and Hochberg [
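The Benjamini and Hochberg adjustment can be sketched with the standard library alone. The function name, the example p-values, and the 0.05 cutoff are illustrative assumptions and are not taken from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, scaling each by m / rank
    # and keeping a running minimum so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

raw = [0.001, 0.008, 0.039, 0.041, 0.6]  # hypothetical per-gene p-values
adj = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adj]    # 0.05 is an assumed cutoff
print(adj, significant)
```

In this example the third and fourth genes have raw p-values below 0.05 but are no longer significant after adjustment, which is the kind of loss the text refers to when comparing corrected and uncorrected gene sets.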
We thank Rasmus Nielsen for providing unpublished results on the statistical behavior of Tests I and II and valuable comments on this paper.
Abbreviations
AH-PSG, PSG in the ancestral hominid lineage
CDS, coding sequences
ChF, chimp faster than human
Ch-PSG, chimp PSG
CSAC, Chimpanzee Sequencing and Analysis Consortium
GO, gene ontology
HF, human faster than chimp
H-PSG, human PSG
Ka, nonsynonymous rate of evolution in substitutions per site
Ka-RRT, relative rates test on nonsynonymous sites
KS, Kolmogorov–Smirnov
Ks, synonymous rate of evolution in substitutions per site
ML, maximum-likelihood
PS, positive selection
PSG, positively selected genes
Q, quadrant
RRT, relative rates test
RSC, relaxation of selective constraints