Heterozygote Advantage Probably Maintains Rhesus Factor Blood Group Polymorphism: Ecological Regression Study

Rhesus factor polymorphism has been an evolutionary enigma since its discovery in 1939. Carriers of the rarer allele should be eliminated by selection against Rhesus positive children born to Rhesus negative mothers. Here I used an ecologic regression study to test the hypothesis that Rhesus factor polymorphism is stabilized by heterozygote advantage. The study was performed in 65 countries for which the frequencies of RhD phenotypes and specific disease burden data were available. I performed multiple multivariate covariance analysis with five potential confounding variables: GDP, latitude (distance from the equator), humidity, medical care expenditure per capita and frequencies of smokers. The results showed that the burden associated with many diseases correlated with the frequencies of particular Rhesus genotypes in a country and that the direction of the relation was nearly always the opposite for the frequency of Rhesus negative homozygotes and that of Rhesus positive heterozygotes. On the population level, a Rhesus-negativity-associated burden could be compensated for by the heterozygote advantage, but for Rhesus negative subjects this burden represents a serious problem.


Introduction
Polymorphism in the Rhesus factor, namely the existence of a large deletion in the RHD gene [1] in a substantial fraction of the human population, has been an evolutionary enigma since the discovery of this factor in the 1930s [2][3][4][5]. Before the introduction of prophylactic treatment in 1968, the carriers of the rarer variant of the gene, namely Rhesus negative women in a population of Rhesus positive subjects or Rhesus positive men in a population of Rhesus negative subjects, had lower fitness. This is because RhD-positive children born to pre-immunized RhD-negative mothers were at a higher risk of fetal and newborn death or health impairment from haemolytic disease. Therefore, mutants or migrants with the rarer variant of the RHD gene could not invade the population and any already existing RhD polymorphism should be unstable.
It has been suggested that this polymorphism can be stabilized when the disadvantage of carriers of the locally rarer allele is counterbalanced by higher viability of their heterozygote children or by another form of frequency-dependent selection [6]. In the past seven years, several studies have demonstrated that Rhesus positive and Rhesus negative subjects differ in resistance to the adverse effects of parasitic infections, aging, fatigue and smoking [7][8][9][10][11][12][13]. A recently published cross-sectional study performed on a cohort of 3,130 subjects showed numerous associations between Rh negativity and the incidence of many disorders. In this study, 154 of 225 diseases/disorders were reported by at least 10 subjects. Within this subset, 31 significant associations with RhD negativity (21 positive and 10 negative) were observed [14]. A study performed on 250 blood donors has further shown that resistance to the effects of toxoplasmosis is higher in Rhesus positive heterozygotes than in Rhesus positive homozygotes and substantially higher than in Rhesus negative homozygotes [7]. This is the first direct evidence for the role of selection in favour of heterozygotes in the stabilization of the RHD gene polymorphism in human populations. Such a mechanism is reminiscent of the widely known polymorphism in genes associated with sickle cell anaemia in geographic regions with endemic malaria [15].
The RhD protein is a component of a membrane complex whose function is not quite clear. It is most probably involved in NH3 transport and possibly also in CO2 transport [16,17]. The complex is associated with the spectrin-based cytoskeleton and therefore plays an important role in maintaining the typical shape (biconcave discoid) of human erythrocytes [18]. The "European" RhD- variant of the RHD gene carries a deletion covering the entire protein-coding part of the gene [1]. Therefore, no product of this allele is synthesized in the cells of RhD negative homozygotes and the RhD protein is most probably substituted in the corresponding molecular complex by the related protein RhCE. Consequently, erythrocytes of RhD- and RhD+ homozygotes differ in the molecular complexes on their cell membranes and probably also in their biological activities. An important difference was also observed between erythrocytes of RhD positive homozygotes and heterozygotes. About 33,560 and 17,720 D antigen sites were detected on the surface of an erythrocyte in RhD homozygotes and heterozygotes, respectively [18]. This suggests that the susceptibility of RhD positive homozygotes and RhD positive heterozygotes (and even more so RhD negative homozygotes) to various aberrant conditions, including various diseases, could differ dramatically. Due to the general trade-off principle, heterozygotes could be more resistant to one disease and more prone to another disease while the opposite could be true for homozygotes. Such trade-offs could explain the heterozygote advantage hypothesis and all other observed phenomena.
The frequencies of Rhesus negative subjects (and therefore also Rhesus positive heterozygotes) as well as the incidences of particular diseases and disorders vary between countries. If the protective effect of Rhesus positivity or Rhesus heterozygosity is strong enough, then the frequencies of Rhesus negative homozygotes (and Rhesus positive heterozygotes) should correlate with the incidences of specific diseases when important confounding variables are controlled. This could be either because the incidences of particular diseases influence the geographic distribution of RhD alleles, or because the differences in the prevalence of particular phenotypes influence the incidences of particular diseases. Here, I have studied the correlation of disease burden estimates compiled by the WHO with the frequencies of Rhesus negative homozygotes and Rhesus positive heterozygotes in a set of 65 countries for which data on the frequencies of Rhesus-negative individuals are available.
The specific aims of the present study (the hypotheses to be tested) are:
Hypothesis 1: The frequency of Rh negative homozygotes in particular countries correlates (mostly positively) with the incidence of some health disorders in these countries.
Hypothesis 2: The frequency of Rh positive heterozygotes in particular countries correlates (mostly negatively) with the incidence of some health disorders in these countries.
Hypothesis 3: The relations of the incidence of health disorders with the frequencies of Rh negative homozygotes and Rh positive heterozygotes are mostly in opposite directions.

Sources of Data
The data on disease burdens were obtained from the

Statistical Methods
Factor analyses and a general linear model (GLM) analysis of the obtained factors were performed with Statistica v. 8.0 and all other tests with IBM SPSS v. 21. A factor analysis of the specific disease burden data (principal components method, raw Varimax rotation) was used for data reduction. When the factors were extracted from the mortality rate variables, the diseases with insufficient data were excluded and the missing data were substituted with means. The Disability Adjusted Life Year (DALY) data for nearly all diseases except trypanosomiasis, Chagas disease, and onchocerciasis were available for all countries in my data set. All factors with eigenvalues > 1.0 were extracted. Type III sums of squares and models including the intercept were used in the GLM (multivariate and multiple-multivariate) analyses; for details, see [22].

Association between Disease Burden and Frequencies of RhD Genotypes
Mortality and morbidity data for more than one hundred diseases are available in the WHO database [19]. A search for associations between Rhesus genotype and particular diseases would necessarily result in many spurious associations. Therefore, factor analysis was initially used for data reduction. Performing this analysis on 125 diseases and burden of disease categories yielded 23 factors with eigenvalues > 1.0, together explaining 85% of the variability in Disability Adjusted Life Year (DALY) among 192 WHO member countries. The DALY has been defined [19] as "a health gap measure that extends the concept of potential years of life lost due to premature death and also to include equivalent years of 'healthy' life lost by virtue of being in a state of poor health or disability". Next, a multiple multivariate analysis was performed for the 65 countries with RhD genotype frequency data, with these 23 factors as dependent variables and, as independent variables, the frequencies of Rhesus negative homozygotes and Rhesus positive heterozygotes together with five potential confounding variables: GDP, latitude (distance from the equator), humidity, medical care expenditure per capita and frequencies of smokers in the population. The effects of the frequency of RhD heterozygotes (μ² = 0.67, P = 0.013), smokers (μ² = 0.75, P = 0.001), latitude (μ² = 0.71, P = 0.005), and humidity (μ² = 0.64, P = 0.027) were significant, whereas those of the frequency of RhD negative homozygotes (μ² = 0.60, P = 0.070), GDP (μ² = 0.42, P = 0.635) and medical care expenditure (μ² = 0.23, P = 0.991) were not. This model explained considerable parts of the variability in factors 1-3, 6-8, 11, and 22 (adjusted R² = 0.372, 0.771, 0.497, 0.461, 0.197, 0.237, 0.224, and 0.296, respectively). Post hoc simple multivariate analyses showed that the frequency of Rhesus negative homozygotes correlated with five of 23 factors that together explained 17.7% of the variability between different countries in DALY (Table 1).
Similarly, the frequency of RhD positive heterozygotes significantly correlated with six factors that together explained 13.9% of the variability between different countries in DALY. Three correlations of the 23 factors with the frequency of RhD positive heterozygotes and one with that of RhD negative homozygotes remained significant even after the correction for multiple tests [23]. Moreover, the number of false significant results expected by chance for 23 statistical tests was about 1.2 (23 × 0.05), not 5.5, suggesting that most of the positive results were not simply due to multiple comparisons. In accord with the heterozygote advantage hypothesis, the regression coefficients (B-values) for Rhesus negative homozygotes and Rhesus positive heterozygotes pointed in opposite directions whenever any of these correlations was significant.

Correlation between Particular Disease Burdens and Frequencies of RhD Genotypes
In the exploratory part of the study, univariate correlation analyses showed many strong associations between disease burdens, measured with DALY or with the number of deaths per 100,000 population, and the frequencies of Rhesus negative subjects or Rhesus positive heterozygotes (Fig 1). However, specific disease burdens also strongly correlated with some of the confounding variables, most often with latitude, smoking and humidity. These covariates showed either a highly significant correlation with the frequency of Rhesus negative subjects and Rhesus positive heterozygotes (GDP: R = 0.64, P < 0.000001, latitude: R = 0.65, P < 0.000001, medical care expenditure per capita: R = 0.68, P < 0.000001, frequencies of smokers in the population: R = 0.34, P < 0.001; Spearman correlation), or a trend (humidity: R = -0.19, P = 0.132). Therefore, the GLM analysis was used to search for the association between Rhesus genotypes and disease burden. The models included the frequencies of Rhesus negative homozygotes and Rhesus positive heterozygotes together with GDP, latitude, humidity, medical care expenditure per capita and frequencies of smokers as independent variables for the 65 countries possessing the RhD genotype frequency data. Tables supplied by the WHO contained information regarding 121 diseases and disease categories for this subset of countries. The results presented in Table 2 show that the frequencies of Rhesus negative subjects correlated with DALY for 21 of 121 diseases and disease categories (twelve positively and nine negatively). The frequency of Rhesus positive heterozygotes correlated with DALY for 25 of 121 diseases and disease categories (eleven positively and fourteen negatively). Similarly, the frequencies of Rhesus negative subjects correlated (all positively) with the mortality rates (deaths per 100,000 population) for 10 of the 97 diseases and disease categories with mortality rate data available.
The frequencies of Rhesus positive heterozygotes correlated with the mortality rates for 8 of 97 diseases and disease categories (two positively and six negatively). The expected number of false significant results for 436 statistical tests was not 65 but 22; however, because of the complicated network of correlations between the incidences (or morbidity) of particular diseases (which did not exist between the non-correlated factors obtained by the factor analysis), it was not possible to perform an unbiased formal correction for multiple tests [23].

Table 1 (notes). The first column shows (in parentheses) the percentages of variability between countries in the total disease burden (DALY) explained by a particular factor. Columns 3-9 show the partial correlation (Beta) and statistical significance (P) of the correlations between factors 1-23 and the frequencies of RhD- homozygotes, RhD+ heterozygotes, GDP, latitude, humidity, medical care expenditure per capita, and frequencies of smokers in the population. Significant results (P < 0.05) and trends (P < 0.10) are printed in bold. Asterisks indicate results significant in two-sided tests after the correction for multiple tests. Values < 0.0005 are coded as 0.000.
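The chance expectations quoted in this section (about 1.2 for 23 tests and 22 for 436 tests) follow from multiplying the number of tests by the significance threshold. The following minimal Python sketch reproduces that arithmetic under the assumption that the nominal threshold is 0.05 and that every null hypothesis is true; the function name is hypothetical and is not part of the original analysis. Because expectations add up regardless of dependence between tests, the expected count itself does not require independent tests, although a formal correction for multiple tests does depend on the correlation structure.

```python
def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected number of nominally significant results if every null hypothesis is true.

    Assumption (not stated explicitly in the text): alpha = 0.05 is the threshold
    behind the quoted expectations.
    """
    if n_tests < 0 or not 0.0 <= alpha <= 1.0:
        raise ValueError("n_tests must be >= 0 and alpha must lie in [0, 1]")
    return n_tests * alpha


if __name__ == "__main__":
    # 23 factor-level tests and 436 disease-level tests, as reported in the text.
    for n_tests in (23, 436):
        print(n_tests, "tests:", expected_false_positives(n_tests))
```

Under these assumptions the products are 23 × 0.05 = 1.15 and 436 × 0.05 = 21.8, which match the rounded values of 1.2 and 22 given in the text.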

Discussion
The results of the ecological regression analysis were in perfect agreement with all three a priori hypotheses. They showed that the frequencies of both Rhesus negative homozygotes and Rhesus positive heterozygotes correlated (mostly in opposite directions) with specific disease burdens. The general pattern was that the countries with a high frequency of Rhesus negative homozygotes had a lower congenital-anomalies-associated burden and neuropsychiatric-condition-associated burden (except for the Alzheimer's and Parkinson's disease burden) as well as a higher cardiovascular and especially malignant-neoplasm-associated burden. Notably, the only form of cancer expressing a clear opposite trend was cervix uteri cancer, i.e. a cancer of viral origin. Many (but not all) of the disorders observed to be affected by RhD phenotype, such as cardiovascular diseases, lung cancer, liver cancer and asthma, could be considered "modern" diseases. It is therefore questionable whether they could have shaped the geographical distribution of the RhD allele in the past. However, the RhD minus allele (the deletion) has probably also spread from one geographic location in western Europe relatively recently, definitely after the colonization of Europe by modern Homo sapiens sapiens. It is highly probable that at that time people suffered from liver cancer induced by mycotoxins, from lung cancer, from asthma induced by smoke, and, after the loss of skin melanin, also from skin cancer at a rate similar to that of today's humans.
The results of the current study agree with observations of worse health status of Rhesus negative subjects reported by earlier case-control studies [12,13] or observed in the large cohort study performed on a population of 3,130 subjects [14]. However, the results of the ecological regression and the case-control or cohort studies are difficult to compare. For example, in the case-control and cohort studies, but not in the ecological regression study, the effect of RhD negativity on health status is seemingly increased by an opposite effect of RhD heterozygosity on RhD positive controls, which consist of both RhD positive heterozygotes and homozygotes. Moreover, a positive correlation between the focal disease burden and the frequency of a particular RhD genotype observed in an ecological regression study could be due either to an increased sensitivity of Rhesus negative individuals toward the focal disease or to a relatively higher resistance or tolerance of these subjects toward other diseases. For example, the protective effect of RhD negativity against many neuropsychiatric disorders observed in the present ecological study could be caused by the fact that RhD negative subjects usually die at an earlier age due to their higher susceptibility to cardiovascular diseases. It is not clear whether the presence of the RhD minus allele alone, or the presence of other alleles in strong genetic linkage with this allele, is responsible for the observed protective effect of heterozygosity. For example, in the Czech population about 95% of subjects with the D+C+c-E-e+ phenotype are RhD+ homozygotes and only 4.5% are RhD+ heterozygotes (Daniels 2002). Alternative models for explaining the observed associations between RhD phenotype and specific disease burdens based on genetic disequilibrium and gene flow (human history) should be tested in future studies.
Limitations of the present study: The age of onset of disorders associated negatively with the frequency of RhD positive heterozygotes, e.g. liver and lung cancer, is rather high in modern humans. It is not clear whether a decrease in the incidence of such disorders could increase the fitness of carriers of this phenotype strongly enough. It must be noted, however, that the results of a previous, more sensitive case-control study performed on a mostly young population indicate that, in addition, the incidence of many early-onset disorders, such as diarrhea, thyroiditis, anemia, panic disorder, and scoliosis, was lower in RhD positive subjects.
The interpretation of ecological regression studies is sometimes complicated, especially if aggregated data are used for the estimation of the strength and direction of the influence of particular factors within a population [24,25]. Therefore, one must be very careful with the interpretation of the observed associations between a particular disease burden and the frequency of Rhesus negative subjects within a population. For example, a positive correlation could be due to an increased sensitivity of Rhesus negative individuals to the focal disease or just to a higher resistance or tolerance of these subjects to other diseases. Therefore, future studies of the mechanisms of the effects of the Rhesus phenotype on the risks of particular diseases must be grounded in individual-based (case-control and cohort) studies. However, the objective of the present ecological study was to test the heterozygote advantage hypothesis of maintaining the genetic polymorphism in the RHD gene, resulting in polymorphism of the Rhesus factor phenotypes. Based on this hypothesis, I suggested that the frequencies of Rhesus negative homozygotes as well as Rhesus positive heterozygotes should correlate with certain disease burdens and that the direction of such correlations would be the opposite for the two phenotypes. The results of the present regression study have confirmed these two predictions.
The frequencies of Rhesus positive heterozygotes were calculated from the frequencies of Rhesus negative homozygotes using the Hardy-Weinberg equation. If the conclusions of the present study are correct, then these theoretical frequencies are influenced by a selection against RhD negative homozygotes and in favor of RhD heterozygotes in middle-age and especially in high-age strata. However, H-W equilibrium is being re-established in every generation and the differences between theoretical and real frequencies for reproductive age-strata are probably relatively low, especially in developed countries.
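The Hardy-Weinberg calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a two-allele model (D and the deletion d), random mating, and no selection within the generation. The function names and the 16% example value are hypothetical and are not taken from the study's data set.

```python
import math


def genotype_frequencies(rh_negative_freq: float) -> dict:
    """Estimate RhD genotype frequencies from the frequency of RhD-negative subjects.

    Assumes Hardy-Weinberg proportions with two alleles: D (RhD plus) and d (deletion).
    """
    if not 0.0 <= rh_negative_freq <= 1.0:
        raise ValueError("rh_negative_freq must lie in [0, 1]")
    q = math.sqrt(rh_negative_freq)  # frequency of the d allele, since f(dd) = q^2
    p = 1.0 - q                      # frequency of the D allele
    return {
        "DD": p * p,      # RhD positive homozygotes
        "Dd": 2 * p * q,  # RhD positive heterozygotes
        "dd": q * q,      # RhD negative homozygotes
    }


def heterozygote_frequency(rh_negative_freq: float) -> float:
    """Frequency of RhD positive heterozygotes implied by Hardy-Weinberg proportions."""
    return genotype_frequencies(rh_negative_freq)["Dd"]


if __name__ == "__main__":
    # Hypothetical population in which 16% of subjects are RhD negative.
    print(genotype_frequencies(0.16))
```

For the hypothetical value of 16% RhD-negative subjects, the deletion allele frequency is 0.4 and the implied genotype frequencies are approximately 36% DD, 48% Dd and 16% dd.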

Conclusions
The study confirmed all three a priori hypotheses: 1) The frequency of Rh negative homozygotes in particular countries correlates (mostly positively) with the incidence of some health disorders in these countries. 2) The frequency of Rh positive heterozygotes in particular countries correlates (mostly negatively) with the incidence of some health disorders in these countries. 3) The directions of the relations of the incidence of health disorders with the frequency of Rh negative homozygotes and with that of Rh positive heterozygotes are mostly (actually always) opposite.
Some of the associations observed in the study were relatively strong. For example, the slope values (B) of 1,812 and -463 for cardiovascular diseases suggest that a 1% increase in the frequencies of Rhesus negative homozygotes and Rhesus positive heterozygotes would result in 1,812 more and 463 fewer cardiovascular-disease-associated deaths per 100,000 inhabitants, respectively. A positive nonlinear correlation between the frequencies of Rhesus negative homozygotes and Rhesus positive heterozygotes exists in the population at equilibrium. Therefore, as long as the frequency of the RhD minus allele remains below 0.5, an increase in the frequency of Rhesus negative homozygotes is always accompanied by an increase in the frequency of heterozygotes within a population. The disadvantage of an increased frequency of Rhesus negative homozygotes in a population is therefore usually at least partly compensated for by an increased frequency of heterozygotes. However, from the point of view of human medicine and especially that of an RhD negative individual, the increased risk of a particular disease associated with one genotype is not compensated for by the decreased risk of a disease in individuals with another genotype.
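The positive nonlinear relation between the two genotype frequencies can be made explicit with a short derivation. It is a sketch under the assumption of Hardy-Weinberg proportions with two alleles; the symbols q (frequency of the RhD minus allele), f_dd (frequency of Rhesus negative homozygotes) and f_Dd (frequency of Rhesus positive heterozygotes) are introduced here for illustration only.

```latex
\begin{align}
  f_{dd} &= q^{2}, \qquad f_{Dd} = 2q(1-q) = 2\left(\sqrt{f_{dd}} - f_{dd}\right), \\
  \frac{\mathrm{d}f_{Dd}}{\mathrm{d}f_{dd}}
    &= \frac{1}{\sqrt{f_{dd}}} - 2 = \frac{1}{q} - 2 .
\end{align}
```

The derivative is positive whenever q < 1/2, that is, whenever fewer than 25% of subjects are Rhesus negative. In that range the heterozygote frequency rises together with the frequency of Rhesus negative homozygotes, but at a decreasing rate.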
From the point of view of basic science, the most important merit of this study is its robust support of the heterozygote advantage hypothesis. The results suggest that the Rhesus factor polymorphism is maintained in human populations due to a higher resistance or tolerance of heterozygotes against specific diseases. It could be speculated to what extent the highly uneven distributions of RHD minus alleles in world populations might be the result of a founder event and a gene flow [26] and to what extent it is also modulated by specific selection pressures caused by differences in the geographical distribution of a disease or diseases.
Supporting Information
S1 Data. Data file containing frequencies of RhD genotypes, specific disease burdens and confounding variables for 65 countries. (XLSX)