Intragenus (Homo) variation in a chemokine receptor gene (CCR5)

Humans have a comparatively higher rate of more polymorphisms in regulatory regions of the primate CCR5 gene, an immune system gene with both general and specific functions. This has been interpreted as allowing flexibility and diversity of gene expression in response to varying disease loads. A broad expression repertoire is useful to humans—the only globally distributed primate—due to our unique adaptive pattern that increased pathogen exposure and disease loads (e.g., sedentism, subsistence practices). The main objective of the study was to determine if the previously observed human pattern of increased variation extended to other members of our genus, Homo. The data for this study are mined from the published genomes of extinct hominins (four Neandertals and two Denisovans), an ancient human (Ust’-Ishim), and modern humans (1000 Genomes). An average of 15 polymorphisms per individual were found in human populations (with a total of 262 polymorphisms). There were 94 polymorphisms identified across extinct Homo (an average of 13 per individual) with 41 previously observed in modern humans and 53 novel polymorphisms (32 in Denisova and 21 in Neandertal). Neither the frequency nor distribution of polymorphisms across gene regions exhibit significant differences within the genus Homo. Thus, humans are not unique with regards to the increased frequency of regulatory polymorphisms and the evolution of variation patterns across CCR5 gene appears to have originated within the genus. A broader evolutionary perspective on regulatory flexibility may be that it provided an advantage during the transition to confrontational foraging (and later hunting) that altered human-environment interaction as well as during migration to Eurasia and encounters with novel pathogens.

Located on Chromosome 3 (3p21), human CCR5 is 6,065 bases long with an ORF of 1,056 bases that codes for a protein with 352 residues. Two common transcripts (B with three exons and the more stable A with four) likely resulted from non-coding upstream polymorphisms in two separate gene promotors (the functionally weaker cis-acting promoter (P U ) upstream of Exon 1 and the downstream promoter (P D ) upstream of Exon 3 [19,20]. These transcripts cause alternate splicing (differential inclusion or exclusion of exons) in messenger RNA that affects regulation of cell surface receptor expression levels [20][21][22].
The plasticity in regulation of gene expression via alternate transcripts and increased polymorphisms [20-22] makes human CCR5 particularly interesting from a broader evolutionary perspective. Homo has one of the broadest adaptive ranges of any species [43] and human CCR5's ability to rapidly respond to new pathogens [19,20] may have served an adaptive function during evolutionary migration and shifts in human-environment relationships with changes to subsistence. Our genomes carry vast evidence of past disease responses [44][45][46] that are shared across the genus and reflect a unique disease pattern for Homo. For example, there is strong evidence for increased disease risk via genetic load in extinct Homo and past human populations [47] and archaeological evidence for past disease treatment (ingestion of anti-biotic and anti-fungal non-food plants) in Neandertals [48][49][50][51][52]. CCR5 has been well studied due to its role in HIV infection (with a focus on natural selection acting on the 32bp deletion) but no work has explored variation within the genus Homo more broadly.
The plethora of research on the evolution CCR5 was conducted prior to the generation of deep coverage, high quality paleogenomes for extinct hominins, such as Neandertal species and the newer Denisova species. While paleogenomic sample sizes are not robust to make statements on selection or add to a discussion of other evolutionary forces acting on variation, they provide an evolutionary dimension to understanding the patterns of variation characterizing our genus and insights into possible adaptations to new environments, subsistence regimes, and pathogens [53]. Plus, the sample of ancient genome is increasing every year. Just a few insights gained from a single paleogenomes include ground-breaking studies on evolution of skin color in humans [54]and Neandertals [55] and the introgression of functionally adaptive polymorphisms into the human immune system genes from Altai Neandertal [56]. Understanding the differences between derived and specific variation also enables potential differentiation of challenges we overcame as a genus such as obligate bipedalism [57] or highaltitude adaptation [58] versus challenges we overcame as a species such as the biocultural evolution of sickle-cell trait and malaria infection [59]. Thus, the overall aim of this research is to place humans within the context Homo and examine if the pattern of humans having a significantly higher number of cis-regulatory region polymorphisms (compared to 36 non-human primate species of apes, Old World Monkeys, and New World Monkeys) [20] is specific or one that is shared by our genus.
Variation in CCR5 was examined in humans and extinct hominins to address the questions: are there shared patterns of variation across the genus for polymorphism frequency and is the distribution of polymorphisms across the gene suggestive of a common evolutionary trajectory? Based on previous studies on human-nonhuman primate gene structure and variation and the finding that human polymorphisms allow flexible CCR5 gene expression [19,20], the expectation is that there is a shared pattern of variation that aided adaptation for members of the geographically and ecological dispersed genus Homo. Both expectations were met.

Materials
Modern human data is from the 1000 Genomes Project [60], which contains data for 2,504 individuals from 26 populations ( Table 1). While coverage is low per individual, the data are robust enough to identify the majority of polymorphisms at a frequency of at least 1% in the populations studied, which is suitable for the current study. Extinct Homo data are: Denisova 3 [61], Denisova 2 [62], Vindija 33.19 Neandertal [63], Altai Neandertal [61], El Sidron Neandertal [64], Mezmaiskaya 1 Neandertal [63], and an ancient human that contributed no genes to modern populations, Ust'-Ishim [65]-see Table 1 for accession numbers. These species are Pleistocene Eurasian hominins with Denisova representing an eastern Eurasian Pleistocene population and Neandertal a western one (with some overlap with Denisova in Siberia). All genomes have high coverage (excepting Mezmaiskaya and Sidron); contamination with modern human DNA is estimated to be less than 1% for the extinct hominins [61-63, 66, 67].

Methods
The human reference sequences for two common transcript variants for the CCR5 gene (NM_000579 and NM_001100168) were downloaded from the National Center for Biotechnology Information (NCBI). The modern human variation data (CCR5 and cis-acting elements) were downloaded from 1000 Genomes via ftp as variant call format (VCF) files (ftp://ftp. 1000genomes.ebi.ac.uk/vol1/ftp/). All data were downloaded to and analyzed using the University of Alaska Research Computing Systems. All files were aligned to the human genome GRCh37/ hg19. VCF files for six extinct hominin species and one extinct human (Ust'-Ishim) were downloaded from the Max Planck Institute Leipzig. Ancient DNA often contains C-to-T deaminations at the end of reads [68]. The lack of variation identified from paleogenomic sequence reads is unlikely to be a result of typical problems associated with ancient DNA sequence reads since chemical processes like deamination would increase SNPs (whether false or not). More significantly, the paleogenomes were generated using protocols that largely eliminates this error [69]. Despite high levels of variation at this locus and evidence for balancing selection in humans at this locus, strong levels of introgression from inter-breeding with Neandertals in Eurasia have not been reported at this locus, as they have for other immune system loci in similar scenarios [27, [70][71][72][73][74][75]; introgression data from the Reich lab (https://reich.hms.harvard.edu/datasets) [76] confirm this is the case (S1 Table). Moreover, the African genetic variation is similar to European genetic variation which suggests that diversity was already present in modern humans prior to any admixture with archaic species in Europe.
Distribution of polymorphisms was guided by the structure provided by Mummidi et al.
[19] and included promoter regions (P U and P D ), ORF, and CCR5. The target area for P U was the most inclusive range (-1976 to +33) which avoided overlap with P D and because little difference was noted between putative P U regions studied by Mummidi et al [19]. The target area for P D was the most productive range (+119 to +828). Significant difference in distribution of polymorphisms across gene structure for all samples was tested using Monte Carlo methods for the exact test.

Results and discussion
Previous research has examined gene structure [19], gene variation within primate species [20], and selection acting on the gene primarily in response to viral load [6,10,21,22,33,36,37,40,41,77,78]. The goal of this research was to establish if the pattern of human variation and distribution of polymorphisms in CCR5 [20] is specific (i.e., unique in the human species) or genus-wide (i.e., a pattern shared by Homo).  (Tables 2 and S2 contains all 1000 Genomes variants). SNP frequency per individual (total SNPs in a population/total number of individuals) within the 26 populations ranged from 14 to 24, with East Asians exhibiting the highest variation and Africa and the Americas the least (Table 1). There were 94 polymorphisms identified across all extinct Homo samples (Altai, Denisova 3 and 2, Mezmaiskaya, Ust'-Ishim, Vindija, and El Sid), an average of 13 per individual included in analysis (Tables 1  and 2 and S1). No polymorphisms were found in the El Sidron specimen and, as a result, it is not included in the tables. Some polymorphisms in extinct Homo (n = 41) have been previously observed in modern humans ( Table 2). There were 53 novel polymorphisms identified, 32 in Denisova (1 in Denisova 3, 31 in Denisova 2) and 21 in Mezmaiskaya. Table 2 summarizes extinct Homo polymorphisms.
Is the distribution of polymorphisms across the gene suggestive of a common evolutionary trajectory? The frequency of polymorphisms across gene structure are used rather than counts because the human sample is much larger and captures an exponentially greater number of polymorphisms as a result (see S3 Table for raw count summary). Both humans and extinct members of our genus exhibit more polymorphisms in gene regulatory regions (Table 3) suggesting a shared pattern of variation across Homo. When polymorphisms occur in both P U and P D , there is a greater frequency in the functionally stronger regulatory area, P D , but in four ancient samples (Altai, Denisova 3, Vindija, and Ust-Ishim), they only occur in P U (see S3  Table); only Denisova 2 and Mezmaiskaya had no polymorphisms in the ORF. The comparatively lower frequency across all samples reflects the conservation trend noted in primates [19,20]. A structural analysis of the distribution of polymorphisms via an Exact Test indicated no significant statistical differences among all samples (results not shown). Given the expected frequency of polymorphism (based on the perception of CCR5 covered by an area of interest -see Table 3 footnotes), there is a significant pattern in the samples. First, modern humans and Denisova 2 have a greater than expected number of polymorphisms in the ORF (even if these are exceeded by polymorphisms in regulatory regions). All samples (except Denisova 2) have a greater than expected number of polymorphisms in the promoter regions.
Prior research found that humans have a potentially unique plasticity in gene expression due to the effect of alternate splicing [57]. The distribution of polymorphisms across gene regions in Homo suggests plasticity in gene regulation and expression in response to viral loads, as noted in previous studies [19,20]. The pattern of immune gene introgression, particularly regulatory haplotypes in the antiviral OAS gene cluster [70], has suggested that selective forces in our close relatives operated on expression, not protein variation-same as seen in non-human primate CCR5 variation [19, 20]-and those adaptations were also useful to humans. Thus, an increase in polymorphisms that allowed plasticity in regulation and expression in CCR5 makes sense even if it is not due to introgression. Without functional testing, the exact nature of the polymorphisms is not known other than by inference and comparative analysis, as done here. And, without more paleo-genomes to compare, we cannot know if the variation in these genomes represents true species variation but the data presented here indicate that the pattern is not human specific, rather one shared by recent members of Homo.
The expectation that extinct hominins and modern humans would share this pattern of increased variation in the regulatory areas of the gene is met in the current study. Our genus has several unique behavioral and genetic adaptations compared to nonhuman primates and these adaptations might hold some avenues for further research. For instance, a genus-wide shift in subsistence activities occurred during the Plio-Pleistocene (roughly 2 million years ago) from opportunistic non-confrontational scavenging to confrontational scavenging and, later, top predatory behaviors; this alteration to hominin-environment interaction brought hominins into greater and regular contact with animal carcasses [79][80][81][82][83][84]. Neandertals in Europe have also been shown to be active hunters and foragers [85] who experienced increased  [87]; high latitude pathogens year-round are more likely parasitic due to the reliance on marine mammals [88] and the short season for viral vector reproductive cycles to transmit infection from insects to hominins [35]. Evidence for gene introgression from extinct hominin species to modern humans is clustered (among other domains) in immune system genes [27,72,76,89]; in particular, the OAS anti-viral gene cluster on Chromosome 12 shows signatures of positive selection [72,76], which suggests that adaptation to Eurasian pathogens may have been partly facilitated by prior adaptative mutations to local viral loads. At a minimum, the environmental challenge faced by non-human members of Homo facilitated human adaptation to a new environment-a shared challenge with a similar solution. While previous studies have examined variation in CCR5, particularly CCR5Δ32 which has a more recent origin [36][37][38][39][40][41][42], as a product of more recent human-disease interaction, the widespread pattern of increased variation in the gene across the genus Homo identified in this study suggests a potential evolutionary adaptation. A key event distinguishing members of the genus Homo from the last common ancestor with Australopithecus was the shift to confrontational scavenging and, later, hunting; this alteration to human-environment interaction added a new point of disease contact as evidenced by modern data showing hunting bushmeat (which ancient hominins did too [83]) alters disease exposure via introduction of retroviruses and other pathogens [90][91][92][93][94]. Given CCR5's role in both innate and adaptive immune system functioning, its plasticity may have provided an advantage to members of Homo across these varied disease ecologies and its potentially greater than normal interaction with the environment in foraging and hunting activities. As more ancient genomes become sequenced, we can have more robust data with which to work and invest resources into functional testing and experimentation of what function these polymorphisms might have had.
Supporting information S1
Project administration: Kara C. Hoover.