The genetic heterogeneity of Arab populations as inferred from HLA genes

This is the first genetic anthropology study on Arabs in the MENA (Middle East and North Africa) region. The present meta-analysis included 100 populations from 36 Arab and non-Arab communities, comprising 16,006 individuals, and evaluated the genetic profile of Arabs using HLA class I (A, B) and class II (DRB1, DQB1) genes. A total of 56 Arab populations comprising 10,283 individuals were selected from several databases, and were compared with 44 Mediterranean, Asian, and sub-Saharan populations. The most frequent alleles in Arabs are A*01, A*02, B*35, B*51, DRB1*03:01, DRB1*07:01, DQB1*02:01, and DQB1*03:01, while DRB1*03:01-DQB1*02:01 and DRB1*07:01-DQB1*02:02 are the most frequent class II haplotypes. Dendrograms, correspondence analyses, genetic distances, and haplotype analysis indicate that Arabs could be stratified into four groups. The first consists of North Africans (Algerians, Tunisians, Moroccans, and Libyans) and the first Arabian Peninsula cluster (Saudis, Kuwaitis, and Yemenis), who appear to be related to Western Mediterraneans, including Iberians; this might be explained by a massive migration into these areas when the Sahara underwent a relatively rapid desiccation, starting about 10,000 years BC. The second includes Levantine Arabs (Palestinians, Jordanians, Lebanese, and Syrians), along with Iraqis and Egyptians, who are related to Eastern Mediterraneans. The third comprises Sudanese and Comorians, who tend to cluster with Sub-Saharans. The fourth comprises the second Arabian Peninsula cluster, made up of Omanis, Emiratis, and Bahrainis. It is noteworthy that the two large minorities (Berbers and Kurds) are indigenous (autochthonous), and are not genetically different from “host” and neighboring populations. In conclusion, this study confirmed high genetic heterogeneity among present-day Arabs, especially those of the Arabian Peninsula.


Introduction
The human leukocyte antigen (HLA) system plays a key role in self-nonself recognition. It is divided into class I (HLA-A, -B, and -C) and class II (HLA-DP, -DQ, and -DR) loci, and comprises 220 genes in a 3.6 Mb region on the short arm of chromosome 6. The HLA system is highly polymorphic, and in excess of 17,000 alleles have been detected. For example, there are 4,828

Study flow
The use of more than fifty keywords allowed identification of 5,456 papers and HLA datasets, of which 315 were deemed relevant to the study. Of these, 42 articles and 11 HLA datasets containing information on 56 Arab populations, and meeting the study criteria, were included. The study flow is illustrated in Fig 1. In addition, 20 articles and 18 HLA datasets that met the study criteria and contained complete information on 44 other populations were selected, but without going through the systematic review. The populations used in the comparison were chosen mainly from neighboring Arab countries. This study relied on a database consisting of 100 populations (data for 11 of which were extracted from association studies) from 36 Arab and non-Arab countries in Asia, Europe, and Africa. The distribution of populations by region is illustrated in Fig 2A. Together, these populations represent allele frequency data for 16,006 individuals (an average of 160.06 individuals per population), drawn from 63 references.

Allelic comparison between Tunisians and other populations
Allelic comparisons were done using neighbor-joining (NJ) dendrograms, correspondence analysis, and standard genetic distances. Analyses were performed with class I and class II markers, at both generic and high-resolution levels, to make the most of the available data, given that some of the populations included in these comparisons lack high-resolution data.
Neighbor-joining dendrograms. Comparison at the generic level was made using genetic distances based on DRB1* and DQB1* allelic frequencies. Four groups can be interpreted from correspondence analysis. High-resolution DRB1 correspondence analysis (Fig 4) demonstrated the clustering of the studied populations into three groups. The first contains North Africans (Tunisians, Algerians, Moroccans, and Libyans), Iberians (Basques, Spaniards, Portuguese, Murcians), French, Saudis, Yemenite Jews, and Khuzestani Arabs. The second contains Eastern Mediterraneans (Greeks, Cretans, Lebanese, Palestinians, and Macedonians), Berbers of Djerba, Italians, Iraqi Kurds, Iranians, Egyptians, Ashkenazi Jews, and Moroccan Jews. The last cluster consists of Sub-Saharan populations. It should be noted that Jordanians, Bahrainis, and Sudanese were outside these main groups. Similarly, correspondence analysis using class I (A and B) identified three main clusters (Fig 5). The first cluster contains all Sub-Saharan Africans along with Sudanese. The second cluster contains Eastern Mediterranean populations (Albanians, Greeks, Cretans, Lebanese, Palestinians, and Macedonians), Italians, Iraqi Kurds, Ashkenazi Jews, and Jordanians-A. The last cluster includes North Africans (Tunisians, Algerians, Moroccans, and Libyans), Iberians (Basques, Spaniards), French, and Saudis. Correspondence analysis based on generic DRB1 data, and using only Arab populations, shows that Arabs can be clustered into four groups (Fig 6). The first contains the North Africans (Tunisians, Algerians, Moroccans, and Libyans), Saudis, Yemenis, Kuwaitis, and Khuzestanis (Iranian Arabs). The second cluster includes the Arabs of Levant (Palestinians, Jordanians, Lebanese, Syrians), Egyptians, Iraqi Kurds, and Moroccan Jews. The third group consists of Bahrainis, Omanis, Emiratis, and Famoori (Iranian Arabs). The fourth is composed of Sudanese, Sudanese from Nuba, and Comorians.
Table note (https://doi.org/10.1371/journal.pone.0192269.t003): Only one population per country is illustrated; the frequencies are ranked from highest to lowest for each allele; to identify the population and country see Table 1.
Table note (https://doi.org/10.1371/journal.pone.0192269.t004): Only one population per country is illustrated; the frequencies are ranked from highest to lowest for each allele; to identify the population and country see Table 1.
Genetic distances. Table 5 illustrates standard genetic distances (SGD) between Arabs and other populations, using generic DRB1* allele frequencies. North Africans and Iberians are the closest to Saudis: Moroccans (Agadir, 0.0024), Basques-Ar (0.0057), and Tunisians-S. [...] [68], Portuguese (3%) [39], and Moroccan Jews (3%) [66]. A*24:02-B*08:01 (4.75%) and A*30:02-B*53:01 (3.48%) were only identified in Saudis.
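To make the distance comparison above concrete, the following minimal Python sketch computes a single-locus standard genetic distance from two allele-frequency tables. It assumes that the SGD values in Table 5 follow Nei's (1972) formulation, which the text does not state explicitly. The function name and the allele frequencies are hypothetical illustrations and are not data from this study.

```python
import math

def nei_standard_distance(freqs_x, freqs_y):
    """Nei's (1972) standard genetic distance for one locus.

    freqs_x, freqs_y: dicts mapping allele name -> frequency (each sums to ~1).
    D = -ln( J_xy / sqrt(J_x * J_y) ), where J_xy = sum(x_i * y_i),
    J_x = sum(x_i ** 2) and J_y = sum(y_i ** 2).
    """
    alleles = set(freqs_x) | set(freqs_y)
    j_xy = sum(freqs_x.get(a, 0.0) * freqs_y.get(a, 0.0) for a in alleles)
    j_x = sum(f * f for f in freqs_x.values())
    j_y = sum(f * f for f in freqs_y.values())
    if j_xy == 0.0:
        # No shared alleles: the distance is unbounded.
        return math.inf
    return -math.log(j_xy / math.sqrt(j_x * j_y))

# Hypothetical generic DRB1 frequencies for two populations (illustration only).
population_1 = {"DRB1*03": 0.20, "DRB1*07": 0.30, "DRB1*11": 0.50}
population_2 = {"DRB1*03": 0.25, "DRB1*07": 0.25, "DRB1*11": 0.50}

print(nei_standard_distance(population_1, population_2))
```

Under this formulation, two populations with identical allele frequencies have a distance of zero, and the distance grows as the frequency profiles diverge, which is why the smallest values in a distance table identify the closest populations.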

Discussion
This meta-analysis is the first genetic anthropology study in the MENA region; it included 100 populations from 36 Arab and neighbouring countries, comprising in excess of 16,000 individuals. A main outcome of the study is the lack of striking differences in the distribution of HLA alleles and haplotypes between North Africans and Arabian Peninsula populations. In contrast, key differences were noted between Levant Arabs (Lebanese, Palestinians, Syrians) and other Arab populations, highlighted by high frequencies of A*24, B*35, DRB1*11:01, DQB1*03:01, and the DRB1*11:01-DQB1*03:01 haplotype in Levantine Arabs compared to other Arab populations. Class I haplotype frequencies are lower than class II haplotype frequencies because LD between the A and B loci is weak, owing to the long physical distance between them compared to that between the DRB1 and DQB1 loci. The identification of shared haplotypes between Arabs and other Mediterranean and Asian populations is attributed to the higher admixture of Mediterraneans and Asians in Arab populations.
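The relationship between LD and haplotype frequency mentioned above can be illustrated with a short Python sketch. It uses the textbook definitions D = p(AB) - p(A)p(B) and the normalized D' (D divided by its maximum attainable magnitude); the function name and all frequencies are hypothetical and are not taken from this study.

```python
def ld_coefficients(hap_freq, freq_a, freq_b):
    """Return (D, D') for a two-locus haplotype.

    hap_freq: observed frequency of the haplotype carrying alleles A and B.
    freq_a, freq_b: frequencies of allele A and allele B at their loci.
    """
    d = hap_freq - freq_a * freq_b
    if d >= 0:
        d_max = min(freq_a * (1 - freq_b), (1 - freq_a) * freq_b)
    else:
        d_max = min(freq_a * freq_b, (1 - freq_a) * (1 - freq_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, d_prime

# Hypothetical example: the same allele frequencies at both locus pairs,
# but different observed haplotype frequencies.
tight = ld_coefficients(hap_freq=0.09, freq_a=0.10, freq_b=0.10)  # closely linked loci
loose = ld_coefficients(hap_freq=0.02, freq_a=0.10, freq_b=0.10)  # distant loci
print(tight, loose)
```

Because the haplotype frequency equals p(A)p(B) + D, alleles with similar frequencies yield a more frequent haplotype when D is large, as expected for physically close loci such as DRB1 and DQB1, than when D is small, as expected for the more distant A and B loci.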

Iberians, North Africans, and Arabian Peninsula inhabitants
The relatedness between North Africans and Iberians was previously discussed [29, 59-62, 69, 78, 79, 86, 88]. Using correspondence analysis, NJ trees, and genetic distances, our results show that North Africans are genetically close to Iberians, which is supported by historical events. First, this relatedness is attributed to the Berber migration northwards from the African Sahara in 10000-4000 BC, driven by hyper-arid conditions [69]. It may also be explained by the similar history of Iberians and North Africans, both of whom were invaded by Phoenicians, Romans, Germans, and Muslim Arabs [89]. The respective invading armies were genetically mixed; indeed, most of their soldiers were mercenaries recruited in recent conquests, as in the case of the Phoenicians [90], and the troops of the Muslims who invaded Iberia were mostly Berbers. The invasion of Iberia by Muslims in the 8th century AD may have contributed to the relatedness between North Africans and Iberians for two reasons: first, most of the Muslim invaders' recruits were North African Berbers; second, the Muslims remained settled in Iberia for eight centuries. However, more ancient and continuous gene exchange between Iberia and North Africa since prehistoric times may have been the main source of exchange [86], and massive mixed marriage and breeding across Iberian religious groups under Muslim rule is not documented.
The analyses performed showed that current North Africans are closely related to Tunisian (Zrawa and Matmata) and Moroccan (Sousse-Agadir and Eljadida) Berbers, suggesting that North Africans have a genetic Berber profile. In contrast, North Africans displayed a greater distance from the Arabs of Levant (Palestinians, Syrians, Lebanese, and Jordanians), indicating a low genetic contribution of the Phoenician and Levant Arab invasions of North Africa. These observations based on HLA markers prompted the conclusion that all Berbers of North Africa constitute a homogeneous genetic unit, except for small isolates, such as the Berbers of Djerba, who display a distinct genetic profile.
Saudi populations used in this study originated from Eastern Saudi Arabia, especially from Riyadh province. There are no reliable HLA data on Eastern Saudi Arabia that shed light on pre-Islamic history; some ancient people may have originated from old Persians, but quantification is difficult and remains undetermined [91]. Genetic heterogeneity between Eastern and Western Saudi Arabia is quite possible, and should be taken into account in further interpretation. All analyses performed here, using HLA-A, -B, -DRB1, and -DQB1 markers, support the notion that Saudis, along with Kuwaitis and Yemenis, are closely related to North Africans.
The most plausible explanation for West Arabia and Yemen clustering with Iberians/North Africans is a possibly massive migration in all directions that occurred when the Sahara underwent desiccation [92,93]. The cultural and linguistic relatedness of many Mediterranean languages, including old Iberian and Basque [92], to the Berber language is concordant with our genetic findings and with the Saharan origin hypothesis; part of the Arabian Peninsula (including Yemen) may also have been reached by Saharan people. In fact, Malika Hachid, who has been studying Saharan and North African archaeology, culture, and the rock painting/writing of the prehistoric Sahara, even suggests that the first known writing alphabet originated in the Sahara. Proto-Berber rock-writing characters, very similar to present-day Berber scripts, were used. This Proto-Berber language could have appeared 5,000 years BC [94,95].
The HLA genetic similarity of Kuwaitis to this group is more difficult to explain, but interaction between the Arabian Peninsula and Mesopotamia through the strategic Kuwait area has been documented since 6,500 years BC (Ubaid Period) [96].

Arabs of Levant
Using genetic distances, correspondence analysis, and NJ trees, we showed earlier [61,62], and confirm in this study using HLA class I (A, B) and class II (DRB1, DQB1) markers, that Palestinians, Syrians, Lebanese, and Jordanians are closely related to each other and to Eastern Mediterranean Europeans (Turks, Cretans, Greeks), Egyptians, and Iranians. However, Levant Arabs are distant from North African Arabs (Tunisians, Algerians, Moroccans, and Libyans) and Iberians (Basques, Spaniards). The strong relatedness between Levant Arab populations is explained by their common ancestry, the ancient Canaanites, who came either from Africa or from the Arabian Peninsula via Egypt in 3300 BC [97], and settled in the Levant lowlands after the collapse of the Ghassulian civilization in 3800-3350 BC [98]. The relatedness is also attributed to close geographical proximity, as the area constituted one territory before 19th century British and French colonization.
The close relatedness of Levant Arabs to Egyptians, as confirmed by genetic distances using HLA markers, may be due to three reasons. First, Egypt is a neighbor of the Levant Arab countries, and was historically part of the Levant. Second, the Egyptians invaded the Levant several times throughout history; the most significant was the 1468 BC invasion, after which they settled for 12 centuries [99]. Third, the Canaanites, the likely ancestors of Levant Arabs, may have originated from Africa through Egypt, where they settled for a long period, suggesting likely admixture between Canaanites and Egyptians.
Historically, the Levant was a wider region that included countries along the Eastern Mediterranean with its islands, and extended from Greece to Cyrenaica [100]. Broadly, the Levant was historically characterized by high migratory flow between its sub-regions in all directions. For example, the present-day Levant, comprising Palestine, Lebanon, Syria, and Jordan, has undergone successive invasions by populations originating from the Great Levant, including Egyptians (1468 BC), Horites, Amorites, Hittites (Turks), Greeks (1200 BC), Assyrians (1090 BC) [99], and more recently the Ottomans. This has favored admixture, reduced genetic distances, and homogenized Great Levant populations, thus explaining the close relatedness of Levant Arabs to Eastern Mediterranean populations. On the other hand, Levant Arabs are distant from Saudis, Kuwaitis, and Yemenis, an indication that the contribution of Arabian Peninsula populations to the Levantine gene pool is low, probably because the 7th century invasion lacked a substantial demographic component.

Sudanese and Comorians
Sudanese are close to sub-Saharan Africans (Nigerians, Congolese, and Senegalese) and to North Africans, in particular Egyptians, suggesting that the genetic profile of Sudanese reflects admixture between North Africans (especially Egyptians) and sub-Saharan Africans throughout history. The close relatedness of Sudanese to sub-Saharan Africans suggests a reduced genetic effect of Arabs on Sudanese. Similarly, the Comorians (the Comoros islands officially joined the League of Arab States in 1993) are close to sub-Saharan Africans (Congolese, Nigerians, and Gabonese) [43], Egyptians, Iranians, and Eastern Mediterraneans. This suggests high admixture in the Comoro Islands between populations belonging to three continents, which can be explained by the islands' geographical position as a corridor for international trade.

Bahrainis, Emiratis, and Omanis
Bahrainis, Emiratis, and Omanis are geographically close populations, which explains their genetic relationship as demonstrated in this study. These three populations tend to form a heterogeneous group with Pakistanis, Indians, Iranian Arabs (Famoori), Sardinians (the latter probably close to Iberians/North Africans but behaving as an outlier group in the analyses because they are a genetic island isolate), Egyptians, and some sub-Saharan Africans, such as Congolese. These populations appear close to certain Eastern Mediterranean populations, including Greeks and Macedonians, and to more distant ones, in particular North Africans, which explains their intermediate grouping and their distinction from the two main clusters. Collectively, this suggests high admixture in these populations, brought about by their commercially important position. Sardinia is a relative genetic isolate "founded" by the Iberian Norax/Nora (the first documented Sardinian capital, close to Cagliari), and Iberians/North Africans may be genetically related to Sardinians (the frequency of the basic HLA haplotype A*30-B*18-Cw*5 is very high in Sardinia, Iberia, and North Africa) [93].
The Berber populations used in this work are closely linked to each other, as well as to present-day North Africans and to Western Mediterranean populations, especially Iberians. Indeed, the Moroccan Berbers are not genetically different from current Moroccans, nor from neighboring populations, such as Algerians and Tunisians. This also applies to Tunisian Berbers, except those of the island of Djerba, who appear to be related to Eastern Mediterranean populations, including Levant Arabs. This suggests that North African Berbers are in genetic harmony with their surroundings, and that the differences between them and their neighbors are cultural rather than genetic, arising from the 7th century Arabization of the region.

Minorities of Arab World
Clustering and genetic distance analyses demonstrated that Iraqi and Iranian Kurds are not genetically different from Iranians or neighboring populations, including Levant Arabs, and are close to Turks and other Eastern Mediterranean populations. This suggests that Kurds originate from the region, and are in genetic harmony with neighboring populations, despite the clear cultural differences. It also indicates that Kurds, Syrians, Jordanians, Palestinians, Iraqis, Lebanese, and Iranians probably share the same genetic profile, with few differences. Accordingly, our findings confirm the results of an earlier study by Arnaiz-Villena on Iraqi Kurds [54].
Religious minorities. Sunni Muslims constitute the majority (80%) of Arab populations, followed by Shi'a Muslims (10%) who are present in parts of Iraq, Lebanon, Saudi Arabia, Kuwait, Yemen, and Bahrain. Non-Muslims make up about 10% of all Arabs, and Christianity (6%) is the second largest religion among Arabs, with about 20 million Christians living in Lebanon, Egypt, Iraq, Syria, and Jordan. Other minor religions (4%) such as Judaism, Druze and others are practiced on a much smaller scale [99].
HLA data on Sunni and Shiite Arabs are not available, nor are data comparing Muslims with Christians. The only available data are those concerning Arab Jews. In this study, data are available for three Jewish populations, including two from North Africa (Moroccan and Libyan Jews) and one from the Arabian Peninsula (Yemenite Jews). While the genetic distances separating these three groups of Jews are small (S1 Table), genetic heterogeneity between these Jewish populations was noted. For example, Yemenite Jews are related to Western Mediterranean populations, including North Africans and Iberians, while Libyan Jews are related to Eastern Mediterraneans, including Levantine Arabs. The relatedness of Moroccan Jews to other communities depends on the HLA loci studied; they associate with Eastern Mediterraneans using DRB1, but group with Western Mediterraneans when the other markers are used.

Conclusion
This study supports the notion that Arabs are divided into four groups. The first consists of North Africans (Algerians, Tunisians, Moroccans, and Libyans), Saudis, Kuwaitis, and Yemenis, with relatedness to Western Mediterraneans, including Iberians. The second includes Levantine Arabs (Palestinians, Jordanians, Lebanese, and Syrians), Iraqis, and Egyptians, who appear to be related to Eastern Mediterraneans and Iranians, who in turn historically belonged to the 'Great Levant'. The third consists of Sudanese and Comorians, who associate with Sub-Saharan Africans. Finally, the fourth group of Arabs comprises Omanis, Emiratis, and Bahrainis. This group associates with heterogeneous populations (Mediterranean, Asian, and sub-Saharan). Lastly, the two main indigenous minorities, Berbers and Kurds, are not genetically different from the 'host' and neighboring populations.