Rhesus factor polymorphism has been an evolutionary enigma since its discovery in 1939. Carriers of the rarer allele should be eliminated by selection against Rhesus positive children born to Rhesus negative mothers. Here I used an ecologic regression study to test the hypothesis that Rhesus factor polymorphism is stabilized by heterozygote advantage. The study was performed in 65 countries for which the frequencies of RhD phenotypes and specific disease burden data were available. I performed multiple multivariate covariance analysis with five potential confounding variables: GDP, latitude (distance from the equator), humidity, medical care expenditure per capita and frequencies of smokers. The results showed that the burden associated with many diseases correlated with the frequencies of particular Rhesus genotypes in a country and that the direction of the relation was nearly always the opposite for the frequency of Rhesus negative homozygotes and that of Rhesus positive heterozygotes. On the population level, a Rhesus-negativity-associated burden could be compensated for by the heterozygote advantage, but for Rhesus negative subjects this burden represents a serious problem.
Citation: Flegr J (2016) Heterozygote Advantage Probably Maintains Rhesus Factor Blood Group Polymorphism: Ecological Regression Study. PLoS ONE 11(1): e0147955. https://doi.org/10.1371/journal.pone.0147955
Editor: Calogero Caruso, University of Palermo, ITALY
Received: October 13, 2015; Accepted: January 11, 2016; Published: January 26, 2016
Copyright: © 2016 Jaroslav Flegr. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: Funded by Czech Science Foundation P303/11/1398.
Competing interests: The authors have declared that no competing interests exist.
Polymorphism in the Rhesus factor, namely the existence of a large deletion in the RHD gene in a substantial fraction of the human population, has been an evolutionary enigma since the discovery of this factor in the 1930s [2–5]. Before the introduction of prophylactic treatment in 1968, the carriers of the rarer variant of the gene, namely Rhesus negative women in a population of Rhesus positive subjects or Rhesus positive men in a population of Rhesus negative subjects, had lower fitness. This is because RhD-positive children born to pre-immunized RhD-negative mothers were at a higher risk of fetal and newborn death or health impairment from haemolytic disease. Therefore, mutants or migrants with the rarer variant of the RHD gene could not invade the population and any already existing RhD polymorphism should be unstable.
It has been suggested that this polymorphism can be stabilized when the disadvantage of carriers of the locally rarer allele is counterbalanced by higher viability of their heterozygote children or by another form of frequency-dependent selection. In the past seven years, several studies have demonstrated that Rhesus positive and Rhesus negative subjects differ in resistance to the adverse effects of parasitic infections, aging, fatigue and smoking [7–13]. A recently published cross-sectional study performed on a cohort of 3,130 subjects showed numerous associations between Rh negativity and the incidence of many disorders. In this study, 154 of 225 diseases/disorders were reported by at least 10 subjects. Within this subset, 31 significant associations with RhD negativity (21 positive and 10 negative) were observed. A study performed on 250 blood donors has further shown that resistance to the effects of toxoplasmosis is higher in Rhesus positive heterozygotes than in Rhesus positive homozygotes and substantially higher than in Rhesus negative homozygotes. This is the first direct evidence for the role of selection in favour of heterozygotes in the stabilization of the RHD gene polymorphism in human populations. Such a mechanism is reminiscent of the widely known polymorphism in genes associated with sickle cell anaemia in geographic regions with endemic malaria.
RhD protein is a component of a membrane complex whose function is not quite clear. It is most probably involved in NH3 transport and possibly also in CO2 transport [16,17]. The complex is associated with the spectrin-based cytoskeleton and therefore plays an important role in maintaining the typical shape (biconcave discoid) of human erythrocytes. The “European” RhD- variant of the RHD gene carries a deletion covering the entire protein-coding part of the gene. Therefore, no product of this allele is synthesized in the cells of RhD negative homozygotes and the RhD is most probably substituted in the corresponding molecular complex by the related protein RhCE. Consequently, erythrocytes of RhD- and RhD+ homozygotes differ in the molecular complexes on their cell membranes and probably also in their biological activities. An important difference was also observed between erythrocytes of RhD positive homozygotes and heterozygotes. About 33,560 and 17,720 D antigen sites were detected on the surface of an erythrocyte in RhD homozygotes and heterozygotes, respectively. This suggests that the susceptibility of RhD positive homozygotes and RhD positive heterozygotes (and even more so RhD negative homozygotes) to various aberrant conditions, including various diseases, could differ dramatically. Due to the general trade-off principle, heterozygotes could be more resistant to one disease and more prone to another disease while the opposite could be true for homozygotes. Such trade-offs could explain the heterozygote advantage hypothesis and all other observed phenomena.
The frequencies of Rhesus negative subjects (and therefore also Rhesus positive heterozygotes) as well as the incidences of particular diseases and disorders vary between countries. If the protective effect of Rhesus positivity or Rhesus heterozygosity is strong enough, then the frequencies of Rhesus negative homozygotes (and Rhesus positive heterozygotes) should correlate with the incidences of specific diseases when important confounding variables are controlled. This could be either because the incidences of particular diseases influence the geographic distribution of RhD alleles, or because the differences in the prevalence of particular phenotypes influence the incidences of particular diseases. Here, I have studied the correlation of disease burden estimates compiled by the WHO with the frequencies of Rhesus negative homozygotes and Rhesus positive heterozygotes in a set of 65 countries for which data on the frequencies of Rhesus-negative individuals are available.
The specific aims of the present study (the hypotheses to be tested) are:
- Hypothesis 1: The frequency of Rh negative homozygotes in particular countries correlates (mostly positively) with the incidence of some health disorders in these countries.
- Hypothesis 2: The frequency of Rh positive heterozygotes in particular countries correlates (mostly negatively) with the incidence of some health disorders in these countries.
- Hypothesis 3: The relations of the incidence of health disorders with the frequency of Rh negative homozygotes and with that of Rh positive heterozygotes are mostly in opposite directions.
Materials and Methods
Sources of Data
The data on disease burdens were obtained from the table “Mortality and Burden of Diseases Estimates for WHO Member States in 2004,” which was published by the WHO and is available at: www.who.int/evidence/bod. Because of the expected effects of the frequencies of the Rhesus factor genotypes on the age structures of the populations, the age-non-standardized disease burden data were used. The frequencies of Rhesus negative homozygotes in particular countries were taken from the internet compilation by RhesusNegative.net available at http://www.Rhesusnegative.net/themission/bloodtypefrequencies/ (as of December 22, 2013) and from the monograph of Mourant. The frequencies of Rhesus positive heterozygotes were calculated from the frequencies of Rhesus negative homozygotes using the Hardy-Weinberg equation. The GDP per capita, the geographical latitude and the annual mean of relative humidity for particular countries were derived using the data available at http://data.worldbank.org/indicator/NY.GDP.PCAP.CD (accessed December 10, 2013) and http://www.climatemps.com/ (accessed April 2, 2013). The medical care expenditures per capita were obtained from http://data.worldbank.org/indicator/SH.XPD.PCAP (accessed March 2, 2014). The frequencies of smokers in the populations were calculated as the arithmetic mean of the frequencies for men and women; data available at http://www.who.int/tobacco/mpower/mpower_report_prevalence_data_2008.pdf (accessed February 21, 2014). All data used in this study are available as S1 Data.
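The Hardy-Weinberg step described above can be illustrated with a minimal sketch. It assumes random mating and treats the frequency of Rhesus negative homozygotes as the square of the RhD minus allele frequency; the function name and the input value are hypothetical and are not taken from S1 Data.

```python
import math


def heterozygote_frequency(rh_negative_freq: float) -> float:
    """Expected frequency of RhD positive heterozygotes under Hardy-Weinberg.

    rh_negative_freq is the population frequency of RhD negative
    homozygotes, i.e. q**2 where q is the RhD minus allele frequency.
    """
    if not 0.0 <= rh_negative_freq <= 1.0:
        raise ValueError("frequency must lie in [0, 1]")
    q = math.sqrt(rh_negative_freq)  # frequency of the RhD minus allele
    p = 1.0 - q                      # frequency of the RhD plus allele
    return 2.0 * p * q               # heterozygotes carry one copy of each


# Hypothetical input: 16% RhD negative homozygotes (illustrative only).
print(round(heterozygote_frequency(0.16), 4))  # prints 0.48
```

With 16% Rhesus negative homozygotes the allele frequency is 0.4, so 48% of the population would be expected to be heterozygous.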
Factor analyses and a general linear model (GLM) analysis of the obtained factors were performed with Statistica v. 8.0 and all other tests with IBM SPSS v. 21. A factor analysis of the specific disease burden data (principal components method, raw Varimax rotation) was used for data reduction. When the factors were extracted from the mortality rate variables, the diseases with insufficient data were excluded and the missing data were substituted with means. The Disability Adjusted Life Year (DALY) data for nearly all diseases except trypanosomiasis, Chagas disease, and onchocerciasis were available for all countries in my data set. All factors with eigenvalues > 1.0 were extracted. Type III sums of squares and models including the intercept were used in the GLM (multivariate and multiple-multivariate) analyses; for details, see .
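The data-reduction step can be sketched as follows. This is only an approximation of the procedure described above: it uses NumPy rather than Statistica, extracts principal components of the correlation matrix, substitutes missing values with column means, and keeps components with eigenvalues > 1.0, but it omits the Varimax rotation. The function name and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def extract_components(data, min_eigenvalue=1.0):
    """Principal components of the correlation matrix above a threshold.

    data: countries x diseases array; NaN marks a missing value.
    Returns (eigenvalues, loadings, scores) for the retained components.
    Assumes every column has non-zero variance.
    """
    x = np.array(data, dtype=float)
    # Substitute missing values with the mean of the same disease column.
    col_means = np.nanmean(x, axis=0)
    rows, cols = np.where(np.isnan(x))
    x[rows, cols] = col_means[cols]
    # Standardize so that the analysis runs on the correlation matrix.
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]        # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > min_eigenvalue          # eigenvalue > 1.0 rule
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    scores = z @ eigvecs[:, keep]            # unrotated component scores
    return eigvals[keep], loadings, scores


# Synthetic example: 65 hypothetical countries, 6 burden variables
# driven by 2 latent factors, with one missing value.
rng = np.random.default_rng(0)
latent = rng.normal(size=(65, 2))
burdens = np.repeat(latent, 3, axis=1) + 0.3 * rng.normal(size=(65, 6))
burdens[4, 2] = np.nan
eigenvalues, loadings, scores = extract_components(burdens)
print(eigenvalues.shape, loadings.shape, scores.shape)
```

The retained component scores play the role of the factors that enter the GLM as dependent variables; a rotation step would change the loadings but not the number of retained components.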
Association between Disease Burden and Frequencies of RhD Genotypes
Mortality and morbidity data for more than one hundred diseases are available in the WHO database. A search for associations between Rhesus genotype and particular diseases would necessarily result in many spurious associations. Therefore, factor analysis was initially used for data reduction. Performing this analysis on 125 diseases and burden of disease categories yielded 23 factors with eigenvalues > 1.0, together explaining 85% of the variability in Disability Adjusted Life Year (DALY) among 192 WHO member countries. The DALY has been defined as “a health gap measure that extends the concept of potential years of life lost due to premature death and also to include equivalent years of ‘healthy’ life lost by virtue of being in a state of poor health or disability”. Next, a multiple multivariate analysis was performed for the 65 countries with RhD genotype frequency data. The 23 factors were the dependent variables; the independent variables were the frequencies of Rhesus negative homozygotes and Rhesus positive heterozygotes and five potential confounding variables: GDP, latitude (distance from the equator), humidity, medical care expenditure per capita and frequency of smokers in the population. The effects of the frequency of RhD heterozygotes (η2 = 0.67, P = 0.013), smokers (η2 = 0.75, P = 0.001), latitude (η2 = 0.71, P = 0.005), and humidity (η2 = 0.64, P = 0.027) were significant, but those of the frequency of RhD negative homozygotes (η2 = 0.60, P = 0.070), GDP (η2 = 0.42, P = 0.635) and medical care expenditure (η2 = 0.23, P = 0.991) were not. This model explained considerable parts of the variability in factors 1–3, 6–8, 11, and 22 (adjusted R2 = 0.372, 0.771, 0.497, 0.461, 0.197, 0.237, 0.224, and 0.296, respectively). Post hoc simple multivariate analyses showed that the frequency of Rhesus negative homozygotes correlated with five of 23 factors that together explained 17.7% of the variability between different countries in DALY (Table 1).
Similarly, the frequency of RhD positive heterozygotes significantly correlated with six factors that together explained 13.9% of the variability between different countries in DALY. Three correlations of the 23 factors with the frequency of RhD positive heterozygotes and one with that of RhD negative homozygotes remained significant even after the correction for multiple tests. Moreover, only about 1.2 false significant results would be expected by chance for 23 statistical tests, whereas on average 5.5 significant results were observed (five and six for the two genotype frequencies), suggesting that most of the positive results were not simply due to multiple comparisons. In accord with the heterozygote advantage hypothesis, the regression coefficients (B-values) for Rhesus negative homozygotes and Rhesus positive heterozygotes had opposite signs whenever any of these correlations was significant.
Correlation between Particular Disease Burdens and Frequencies of RhD Genotypes
In the exploratory part of the study, univariate correlation analyses showed many strong associations between disease burdens, measured with DALY or with incidences of deaths per population of 100,000, and the frequencies of Rhesus negative subjects or Rhesus positive heterozygotes (Fig 1). However, specific disease burdens also strongly correlated with some of the confounding variables, most often with latitude, smoking and humidity. These covariates showed either a highly significant correlation with the frequencies of Rhesus negative subjects and Rhesus positive heterozygotes (GDP: R = 0.64, P < 0.000001, latitude: R = 0.65, P < 0.000001, medical care expenditure per capita: R = 0.68, P < 0.000001, frequencies of smokers in the population: R = 0.34, P < 0.001; Spearman correlation), or a trend (humidity: R = -0.19, P = 0.132). Therefore, the GLM analysis was used to search for the association between Rhesus genotypes and disease burden. The independent variables were the frequencies of Rhesus negative homozygotes and Rhesus positive heterozygotes together with GDP, latitude, humidity, medical care expenses per capita and frequencies of smokers in the populations of the 65 countries with RhD genotype frequency data. Tables supplied by the WHO contained information regarding 121 diseases and disease categories for this subset of countries. The results presented in Table 2 show that the frequencies of Rhesus negative subjects correlated with DALY for 21 of 121 diseases and disease categories (twelve positively and nine negatively). The frequency of Rhesus positive heterozygotes correlated with DALY for 25 of 121 diseases and disease categories (eleven positively and fourteen negatively). Similarly, the frequencies of Rhesus negative subjects correlated (all positively) with the mortality rates (incidences of deaths per population of 100,000) for 10 of 97 diseases and disease categories with mortality rate data available.
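The idea of estimating genotype effects while controlling for confounders can be sketched with an ordinary least squares fit for a single disease burden. This is a simplified stand-in for the GLM run in Statistica and SPSS, not a reproduction of it; the function name, the synthetic data and the slope values are hypothetical, and the outcome is generated without noise so that the fit recovers the slopes.

```python
import numpy as np


def fit_burden_model(burden, predictors):
    """Ordinary least squares with an intercept.

    burden: length-n vector (e.g. DALY for one disease).
    predictors: n x k matrix (genotype frequencies plus confounders).
    Returns (intercept, slopes, adjusted_r2).
    """
    y = np.asarray(burden, dtype=float)
    x = np.asarray(predictors, dtype=float)
    n, k = x.shape
    design = np.column_stack([np.ones(n), x])      # add the intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return coef[0], coef[1:], adj_r2


# Synthetic, noise-free example for 65 hypothetical countries.
rng = np.random.default_rng(1)
rh_neg = rng.uniform(0.01, 0.20, size=65)          # RhD negative homozygotes
q = np.sqrt(rh_neg)
rh_het = 2.0 * q * (1.0 - q)                       # Hardy-Weinberg heterozygotes
confounders = rng.normal(size=(65, 5))             # five standardized covariates
predictors = np.column_stack([rh_neg, rh_het, confounders])
true_slopes = np.array([2.0, -1.0, 0.5, 0.0, 0.3, -0.2, 0.1])
burden = 10.0 + predictors @ true_slopes
intercept, slopes, adj_r2 = fit_burden_model(burden, predictors)
print(np.round(slopes[:2], 3))                     # slopes for the two genotypes
```

Because the heterozygote frequency is a nonlinear function of the homozygote frequency, the two genotype columns are strongly correlated but not collinear, so separate slopes can still be estimated.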
The frequencies of Rhesus positive heterozygotes correlated with the mortality rates for 8 of 97 diseases and disease categories (two positively and six negatively). For 436 statistical tests, about 22 false significant results would be expected by chance, whereas 65 significant results were observed. However, because of the complicated network of correlations between the incidences (or morbidities) of particular diseases (which did not exist between the uncorrelated factors obtained by the factor analysis), it was not possible to perform an unbiased formal correction for multiple tests (Garcia 2004).
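The expected numbers of false significant results quoted in this section follow from simple arithmetic, sketched below. The sketch assumes a nominal significance level of 0.05 and that every null hypothesis is true; the function name is hypothetical.

```python
def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Number of nominally significant results expected by chance alone."""
    return n_tests * alpha


# 23 factors tested against one genotype frequency.
print(round(expected_false_positives(23), 2))       # prints 1.15

# Two genotype frequencies x (121 DALY + 97 mortality variables) = 436 tests.
n_disease_tests = 2 * 121 + 2 * 97
print(n_disease_tests)                              # prints 436
print(round(expected_false_positives(n_disease_tests), 1))  # prints 21.8
```

These values round to the 1.2 and 22 reported in the text.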
The x and y axes show the frequency of Rhesus negative homozygotes in the population of a country and mortality (numbers of deaths per population of 100,000), respectively. The figures represent the results of the Pearson correlation analysis, namely the level of significance (P) and the coefficient of determination, i.e. the fraction of the between-country variability in the specific mortality rate that can be explained by differences in the frequencies of Rhesus negative subjects.
The results of the ecological regression analysis were in perfect agreement with all three a priori hypotheses. They showed that the frequencies of both Rhesus negative homozygotes and Rhesus positive heterozygotes correlated (mostly in opposite directions) with specific disease burdens. The general pattern was that the countries with a high frequency of Rhesus negative homozygotes had a lower congenital-anomalies-associated burden and neuropsychiatric condition-associated burden (except Alzheimer’s and Parkinson’s disease burden) as well as a higher cardiovascular and especially malignant neoplasm-associated burden. Notably, the only form of cancer showing a clear opposite trend was cervix uteri cancer, i.e. a cancer of viral origin.
Many (but not all) of the disorders observed to be affected by RhD phenotype, such as cardiovascular diseases, lung cancer, liver cancer and asthma, could be considered “modern” diseases. It is therefore questionable whether they could have shaped the geographical distribution of the RhD allele in the past. However, the RhD minus allele (the deletion) probably spread from one geographic location in western Europe relatively recently, definitely after the colonization of Europe by modern Homo sapiens sapiens. It is highly probable that at that time people suffered from liver cancer induced by mycotoxins, from lung cancer, from asthma induced by smoke, and, after the loss of skin melanin, also from skin cancer at a rate similar to that of today’s humans.
The results of the current study agree with observations of worse health status of Rhesus negative subjects reported by earlier case-control studies [12,13] or observed in the large cohort study performed on a population of 3,130 subjects. However, the results of the ecological regression and the case-control or cohort studies are difficult to compare. For example, in the case-control and cohort studies, but not in the ecological regression study, the effect of RhD negativity on health status is seemingly increased by an opposite effect of RhD heterozygosity on the RhD positive controls, which consist of both RhD positive heterozygotes and homozygotes. Moreover, a positive correlation between the focal disease burden and the frequency of a particular RhD genotype observed in an ecological regression study could be due either to an increased sensitivity of Rhesus negative individuals toward the focal disease or to a relatively higher resistance or tolerance of these subjects toward other diseases. For example, the protective effect of RhD negativity against many neuropsychiatric disorders observed in the present ecological study could be caused by the fact that RhD negative subjects usually die at an earlier age due to their higher susceptibility to cardiovascular diseases.
It is not clear whether the presence of the RhD minus allele alone, or the presence of other alleles in strong genetic linkage with this allele, is responsible for the observed protective effect of heterozygosity. For example, in the Czech population about 95% of subjects with the D+C+c-E-e+ phenotype are RhD+ homozygotes and only 4.5% are RhD+ heterozygotes (Daniels 2002). Alternative models explaining the observed associations between RhD phenotype and specific disease burdens on the basis of genetic disequilibrium and gene flow (human history) should be tested in future studies.
Limitations of the present study: The age of onset of disorders associated negatively with the frequency of RhD positive heterozygotes, e.g. liver and lung cancer, is rather high in modern humans. It is not clear whether a decrease in the incidence of such disorders could increase the fitness of carriers of this phenotype strongly enough. It must be noted, however, that the results of a previous, more sensitive case-control study performed on a mostly young population indicate that the incidence of many early-onset disorders, such as diarrhea, thyroiditis, anemia, panic disorder, and scoliosis, was also lower in RhD positive subjects.
The interpretation of ecological regression studies is sometimes complicated, especially if aggregated data are used for the estimation of the strength and direction of the influence of particular factors within a population [24,25]. Therefore, one must be very careful with the interpretation of the observed associations between a particular disease burden and the frequency of Rhesus negative subjects within a population. For example, a positive correlation could be due to an increased sensitivity of Rhesus negative individuals to the focal disease or just to a higher resistance or tolerance of these subjects to other diseases. Therefore, future studies of the mechanisms by which the Rhesus phenotype affects the risks of particular diseases must be grounded in individual-based (case-control and cohort) studies. However, the objective of the present ecological study was to test the heterozygote advantage hypothesis of maintaining the genetic polymorphism in the RHD gene, resulting in polymorphism of the Rhesus factor phenotypes. Based on this hypothesis, I suggested that the frequencies of Rhesus negative homozygotes as well as Rhesus positive heterozygotes should correlate with certain disease burdens and that the direction of such correlations would be the opposite for the two phenotypes. The results of the present regression study have confirmed these two predictions.
The frequencies of Rhesus positive heterozygotes were calculated from the frequencies of Rhesus negative homozygotes using the Hardy-Weinberg equation. If the conclusions of the present study are correct, then these theoretical frequencies are influenced by a selection against RhD negative homozygotes and in favor of RhD heterozygotes in middle-age and especially in high-age strata. However, H-W equilibrium is being re-established in every generation and the differences between theoretical and real frequencies for reproductive age-strata are probably relatively low, especially in developed countries.
The study confirmed all three a priori hypotheses: 1) The frequency of Rh negative homozygotes in particular countries correlates (mostly positively) with the incidence of some health disorders in these countries. 2) The frequency of Rh positive heterozygotes in particular countries correlates (mostly negatively) with the incidence of some health disorders in these countries. 3) The direction of the relation of the incidence of a health disorder with the frequency of Rh negative homozygotes and with that of Rh positive heterozygotes is mostly (actually always) the opposite.
Some of the associations observed in the study were relatively strong. For example, the slope values (B) of 1,812 and -463 for cardiovascular diseases suggest that a 1% increase in the frequencies of Rhesus negative homozygotes and Rhesus positive heterozygotes would result in 1,812 more and 463 fewer cardiovascular disease associated deaths per 100,000 inhabitants, respectively. A positive nonlinear correlation between the frequencies of Rhesus negative homozygotes and Rhesus positive heterozygotes exists in the population equilibrium. Therefore, an increase of the frequency of Rhesus negative homozygotes is always accompanied by an increase of the frequency of heterozygotes within a population. The disadvantage of an increased frequency of Rhesus negative homozygotes in a population is therefore usually at least partly compensated for through an increased frequency of heterozygotes. However, from the point of view of human medicine and especially that of an RhD negative individual, the increased risk of a particular disease associated with one genotype is not compensated for through the decreased risk of a disease in individuals with another genotype.
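The nonlinear relation between the two genotype frequencies can be written out explicitly under the Hardy-Weinberg assumption used in the Methods. In the sketch below, q (a symbol introduced here for illustration) denotes the frequency of the RhD minus allele, so that q^2 is the frequency of Rhesus negative homozygotes and H(q) the frequency of heterozygotes.

```latex
H(q) = 2q(1-q), \qquad
\frac{dH}{dq} = 2 - 4q > 0
\iff q < \tfrac{1}{2}
\iff q^{2} < \tfrac{1}{4}
```

Under this assumption, the frequency of heterozygotes rises together with the frequency of Rhesus negative homozygotes as long as the latter stays below 25%.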
From the point of view of basic science, the most important merit of this study is its robust support of the heterozygote advantage hypothesis. The results suggest that the Rhesus factor polymorphism is maintained in human populations due to a higher resistance or tolerance of heterozygotes against specific diseases. It remains a matter of speculation to what extent the highly uneven distribution of RHD minus alleles in world populations is the result of a founder event and gene flow and to what extent it is also modulated by specific selection pressures caused by differences in the geographical distribution of a disease or diseases.
S1 Data. Data file containing frequencies of RhD genotypes, specific disease burdens and confounding variables for 65 countries.
I would like to thank Marek Maly from Department of Statistics, National Health Institute, Prague for his help with the statistics and Mike Dammann and Charlie Nichols for their help with writing the manuscript.
Conceived and designed the experiments: JF. Performed the experiments: JF. Analyzed the data: JF. Contributed reagents/materials/analysis tools: JF. Wrote the paper: JF. Data compilation: JF.
- 1. Wagner FF, Flegel WA (2000) RHD gene deletion occurred in the Rhesus box. Blood 95: 3662–3668. pmid:10845894
- 2. Haldane JBS (1942) Selection against heterozygosis in Man. Eugenics 11: 333–340.
- 3. Hogben L (1943) Mutation and the Rhesus reaction. Nature 152: 721–722.
- 4. Fisher RA, Race RR, Taylor GL (1944) Mutation and the rhesus reaction. Nature 153: 106–106.
- 5. Li CC (1953) Is the Rh facing a crossroad? A critique of the compensation effect. American Naturalist 87: 257–261.
- 6. Feldman MW, Nabholz M, Bodmer WF (1969) Evolution of the Rh polymorphism: A model for the interaction of incompatibility, reproductive compensation and heterozygote advantage. American Journal of Human Genetics 21: 171–193.
- 7. Novotná M, Havlíček J, Smith AP, Kolbeková P, Skallová A, et al. (2008) Toxoplasma and reaction time: Role of toxoplasmosis in the origin, preservation and geographical distribution of Rh blood group polymorphism. Parasitology 135: 1253–1261. pmid:18752708
- 8. Flegr J, Novotná M, Lindová J, Havlíček J (2008) Neurophysiological effect of the Rh factor. Protective role of the RhD molecule against Toxoplasma-induced impairment of reaction times in women. Neuroendocrinology Letters 29: 475–481. pmid:18766148
- 9. Flegr J, Klose J, Novotná M, Berenreitterová M, Havlíček J (2009) Increased incidence of traffic accidents in Toxoplasma-infected military drivers and protective effect RhD molecule revealed by a large-scale prospective cohort study. BMC Infectious Diseases 9: art. 72. pmid:19470165
- 10. Flegr J, Novotná M, Fialová A, Kolbeková P, Gašová Z (2010) The influence of RhD phenotype on toxoplasmosis- and age-associated changes in personality profile of blood donors. Folia Parasitologica 57: 143–150. pmid:20608477
- 11. Flegr J, Preiss M, Klose J (2013) Toxoplasmosis-associated difference in intelligence and personality in men depends on their Rhesus blood group but not ABO blood group. PLoS ONE 8.
- 12. Kaňková Š, Šulc J, Flegr J (2010) Increased pregnancy weight gain in women with latent toxoplasmosis and RhD-positivity protection against this effect. Parasitology 137: 1773–1779. pmid:20602855
- 13. Flegr J, Geryk J, Volny J, Klose J, Cernochova D (2012) Rhesus factor modulation of effects of smoking and age on psychomotor performance, intelligence, personality profile, and health in Czech soldiers. PLoS ONE 7: e49478. pmid:23209579
- 14. Flegr J, Hoffmann R, Dammann M (2015) Worse health status and higher incidence of health disorders in Rhesus negative subjects. PLoS ONE 10(10): e0141362. pmid:26495842
- 15. Allison AC (1954) The distribution of the sickle-cell trait in East Africa and elsewhere, and its apparent relationship to the incidence of subtertian malaria- Protection afforded by sickle-cell trait against subtertian malarial infection. Trans R Soc Trop Med Hyg 48: 312–318. pmid:13187561
- 16. Kustu S, Inwood W (2006) Biological gas channels for NH3 and CO2: evidence that Rh (rhesus) proteins are CO2 channels. Transfusion Clinique et Biologique 13: 103–110. pmid:16563833
- 17. Flegel WA (2011) Molecular genetics and clinical applications for RH. Transfusion and Apheresis Science 44: 81–91. pmid:21277262
- 18. Le Van Kim C, Colin Y, Cartron JP (2006) Rh proteins: Key structural and functional components of the red cell membrane. Blood Reviews 20: 93–110. pmid:15961204
- 19. WHO (2008) The Global Burden of Disease: 2004 update. Geneva: World Health Organization.
- 20. Flegr J, Dama M (2014) Does prevalence of latent toxoplasmosis correlate with nation-wide rate of traffic accidents? Folia Parasitologica 6: 485–494.
- 21. Mourant AE (1954) The distribution of the human blood groups. Oxford: Blackwell Scientific Publication.
- 22. Flegr J, Prandota J, Sovickova M, Israili ZH (2014) Toxoplasmosis—A global threat. Correlation of latent toxoplasmosis with specific disease burden in a set of 88 countries. PLoS ONE 9.
- 23. Garcia LV (2004) Escaping the Bonferroni iron claw in ecological studies. Oikos 105: 657–663.
- 24. Wakefield J, Salway R (2001) A statistical framework for ecological and aggregate studies. Journal of the Royal Statistical Society: Series A (Statistics in Society) 164: 119–137.
- 25. Guthrie KA, Sheppard L (2001) Overcoming biases and misconceptions in ecological studies. Journal of the Royal Statistical Society: Series A (Statistics in Society) 164: 141–154.
- 26. Anstee DJ (2010) The relationship between blood groups and disease. Blood 115: 4635–4643. pmid:20308598