Population genetic data of the 21 autosomal STRs included in GlobalFiler kit of a population sample from the Kingdom of Bahrain

Bahrain’s population consists mainly of Arabs, Baharna and Persians, which makes Bahrain ethnically diverse. Exploring the ethnic origin and genetic structure of the Bahraini population is fundamental, mainly in the fields of population genetics and forensic science. The purpose of the study was to conduct genetic studies in the population of Bahrain to assist in the interpretation of DNA-based forensic evidence and in the construction of appropriate databases. The 24 loci in the GlobalFiler PCR Amplification kit, comprising 21 autosomal STR loci and three gender determination loci, were amplified to characterize different genetic and forensic population parameters in a cohort of 543 unrelated healthy Bahraini men. Samples were collected during the year 2017. Genotyping of the 21 autosomal STRs showed that all of the loci were in Hardy-Weinberg equilibrium (HWE) after applying Bonferroni’s correction. No significant linkage disequilibrium (LD) was found between pairs of STR loci in the Bahraini population except for D3S1358-CSF1PO, CSF1PO-SE33, D19S433-D12S391, FGA-D2S1338, FGA-SE33, FGA-D7S820 and D7S820-SE33. The SE33 locus was the most polymorphic in the studied population and the TH01 locus was the least polymorphic. Allele 8 in TPOX had the highest allele frequency (0.496). The SE33 locus showed the highest power of discrimination (PD) in the Bahraini population, whereas TPOX showed the lowest PD value. The 21 autosomal STRs showed a combined match probability (CMP) of 4.5633E-27 and a combined power of discrimination (CPD) of 99.99999999%. Off-ladder alleles and tri-allelic variants were observed in various samples at the D12S391, SE33 and D22S1045 loci. Additionally, pairwise genetic distances based on FST were calculated between the Bahraini population and other populations extracted from the literature. Genetic distances were represented in a non-metric MDS plot, and clustering of populations according to their geographic locations was detected. A phylogenetic tree was constructed to investigate the genetic relatedness between the Bahraini population and the neighboring populations. Our study indicated that the 21 autosomal STRs are highly polymorphic in the Bahraini population and can be used as a powerful tool in forensics and population genetic analyses, including paternity testing and familial DNA searching.


Introduction
The Kingdom of Bahrain is a small archipelago consisting of 33 islands, of which only the five largest are inhabited. These islands include Bahrain, Muharraq, Umm an Nasan and Sitra. Bahrain is positioned in the Arabian Gulf. To the southeast of Bahrain is the State of Qatar, and to its west lies the Kingdom of Saudi Arabia, with which it is connected by a 25-kilometer causeway. To the north and east of Bahrain lies the Islamic Republic of Iran [1].
Bahrain is one of the most densely populated countries in the world, with a total landmass of 760 square kilometers. In mid-2014, Bahrain's population was estimated at 1,314,562 persons. Of these, 568,399 are Bahraini citizens (46%) and 666,172 are expatriates (54%) [2].
Bahrain stood between the most substantial focal points of the ancient world: the Far East, the Indus Valley, the Fertile Crescent, the Red Sea and the coast of East Africa [3]. Trade goods from the Persian Gulf made their way into Europe through Antioch [4]. This made Bahrain an important port, a metropolitan hub where different cultures met [5].
The geographic location of Bahrain has shaped the diversity of its population. This can be explained by migration flows from several areas, first regional and eventually international [6]. Iranians and migrants of Iranian heritage constituted the largest groups of migrants who were Muslim and ethnically not Arab [7]. Indian and Iranian migration boomed in the early and mid-20th century, as the Bahrain Petroleum Company sought a workforce for the oil that had been discovered on the island [8].
The population is divided mainly into four ethnic groups: Arabs, Baharna, and Persians (Huwala and Ajam) [4,9,10]. This geographical and social organization might be expected to affect patterns of genetic diversity [11].
Genetic studies on Bahrain to date are very limited, and knowledge of any such structure is important for interpreting the significance of DNA-based forensic evidence and for constructing appropriate databases. The present study is the first to characterize the Bahraini population genetically using the GlobalFiler amplification kit. The 24 loci in the GlobalFiler PCR Amplification kit (Thermo Fisher Scientific, Inc., Waltham, MA, USA) were studied to characterize different forensic and genetic population parameters in 543 Bahraini males. The 6-dye GlobalFiler PCR Amplification kit was designed to incorporate 21 commonly used autosomal STR loci (D8S1179, D21S11, D7S820, CSF1PO, D3S1358, TH01, D13S317, D16S539, D2S1338, D19S433, VWA, TPOX, D18S51, D5S818, FGA, D12S391, D1S1656, D2S441, D10S1248, D22S1045 and SE33) and three gender determination loci (Amelogenin, Yindel and DYS391), which have been proven to provide reliable DNA typing results and enhance the power of discrimination (PD).

Sample collection
Five hundred and forty-three (543) blood samples were collected on Nucleic-Cards (Copan, Italy) from unrelated Bahraini males. The research study was announced publicly through social media channels such as Twitter and Instagram. Those who wished to participate contacted the corresponding author to arrange a meeting and came to the General Directorate of Criminal Investigation and Forensic Science, Kingdom of Bahrain, to provide their blood samples for the research. The age of the participants ranged from 20 to 55 years.
In each case, males with ancestry (to the level of paternal grandfather) from four different geographical subdivisions of the country (Capital Governorate, Muharraq Governorate, Northern Governorate and Southern Governorate) were sampled. Ethical review for the analysis was provided and approved by the Research and Research Ethics Committee (RREC) (E007-PI-10/17) of the Arabian Gulf University. All participants provided informed consent for contributing their blood samples.

DNA amplification and fragment detection
Blood-spot samples on Nucleic-Cards (Copan, Italy) were punched and amplified using a fully automated workstation, starting from 1.2-mm diameter punches produced with the easyPunch STARlet system (Hamilton, Switzerland).
The samples were directly amplified using GlobalFiler (Thermo Fisher Scientific, Inc., Waltham, MA, USA) according to the manufacturer's recommendations. 15 μl of low TE Buffer (pH 8.0) was added to the MicroAmp Optical 96-Well Reaction Plate (Thermo Fisher Scientific, Inc., Waltham, MA, USA) prior to the addition of 10 μl of GlobalFiler master mix. A total of 24 loci were amplified, including 21 autosomal STR loci and three gender determination loci.
The PCR products (1 μl) were separated by capillary electrophoresis in an ABI 3500xl Genetic Analyzer (Thermo Fisher Scientific Company, Carlsbad, USA) with reference to the LIZ600 size standard v2 (Thermo Fisher Scientific, Inc., Waltham, MA, USA), in a total of 9 μl of LIZ600 standard and Hi-Di formamide (Thermo Fisher Scientific, Inc., Waltham, MA, USA) master mix. GeneMapper ID-X Software v1.4 (Thermo Fisher Scientific, Inc., Waltham, MA, USA) was used for genotype assignment. DNA typing and assignment of nomenclature were based on the ISFG recommendations.

Statistical analysis
Allele frequencies, minor allele frequencies (MAF) and different parameters of forensic efficiency, such as power of discrimination (PD), random matching probability (PM), power of exclusion (PE), polymorphism information content (PIC), typical paternity index (TPI) and heterozygosity (He), were estimated for each locus using GenAlEx software v6.503 [12]. Fisher's exact tests for Hardy-Weinberg equilibrium (HWE) by locus and for linkage disequilibrium (LD) between pairs of loci were performed with STRAF, a convenient online tool for STR data evaluation in forensic genetics [13]. A phylogenetic tree was constructed from allele frequency data using the neighbor-joining method [14] via the web version of POPTREEW [15]. It was used to compare the genetic structure of the Bahraini population with that of other populations, using the minimum set of loci available for all populations. The tree was constructed with allele frequency data of fifteen STR loci (D8S1179, D21S11, D7S820, CSF1PO, D19S433, vWA, TPOX, D18S51, D5S818, FGA, D3S1358, TH01, D13S317, D16S539 and D2S1338) for all populations.
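As a minimal sketch of how allele frequencies, PM and PD can be derived from the genotypes observed at a single locus, the following Python example may help. It is not the GenAlEx or STRAF implementation; the function names and the four toy genotypes are hypothetical and serve only as an illustration.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Return {allele: frequency} from a list of (allele1, allele2) pairs."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total_alleles = 2 * len(genotypes)
    return {allele: n / total_alleles for allele, n in counts.items()}

def matching_probability(genotypes):
    """PM: sum of squared observed genotype frequencies at one locus."""
    # Sort each pair so that (8, 11) and (11, 8) count as the same genotype.
    counts = Counter(tuple(sorted(genotype)) for genotype in genotypes)
    n = len(genotypes)
    return sum((c / n) ** 2 for c in counts.values())

def power_of_discrimination(genotypes):
    """PD: probability that two random individuals differ, 1 - PM."""
    return 1.0 - matching_probability(genotypes)

# Hypothetical toy genotypes (not data from this study).
toy_genotypes = [(8, 8), (8, 11), (11, 8), (11, 11)]
toy_freqs = allele_frequencies(toy_genotypes)
toy_pm = matching_probability(toy_genotypes)
toy_pd = power_of_discrimination(toy_genotypes)
```

Under these definitions PM and PD always sum to 1 at a locus, which agrees with the SE33 (0.006 and 0.994) and TPOX (0.156 and 0.844) values reported in this study.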
In addition, multidimensional scaling (MDS) analysis was performed using IBM SPSS Statistics 21.0 [16] to investigate the population structure between the Bahraini population and the abovementioned populations based on FST genetic distances.

Hardy-Weinberg equilibrium (HWE) and linkage disequilibrium (LD)
In the present study, no significant deviation from HWE was observed (p > 0.05) except for three markers: D3S1358, D19S433 and D5S818 (Tables 1-5). After Bonferroni's correction was applied (p > 0.000092), all of the loci were in HWE. The full dataset of the Bahraini population is shown in S1 Table. The study also showed no significant LD between pairs of STR loci after Bonferroni's correction (p > 0.000092) in the Bahraini population, except for the following pairs: D3S1358-CSF1PO, CSF1PO-SE33, D19S433-D12S391, FGA-D2S1338, FGA-SE33, FGA-D7S820 and D7S820-SE33. The highest pairwise LD value was 1.00, for CSF1PO-D19S433, D21S11-FGA and FGA-D1S1656. The marker D22S1045 did not return any probability value. This may be related to the off-ladder alleles observed at D22S1045. D22S1015, SE33 and D21S11 loci also revealed evidence of a rare variant and off-ladder alleles (Fig 1).
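The Bonferroni step can be sketched in a few lines of Python. The threshold of 0.000092 quoted above is numerically consistent with 0.05 divided by 543, although that divisor is an assumption here and is not stated in the text. The function names and the example p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni's correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

def flag_significant(p_values, alpha, n_tests):
    """Return labels whose p-value falls below the corrected threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return [label for label, p in p_values.items() if p < threshold]

# Assumed divisor of 543; reproduces the 0.000092 threshold after rounding.
threshold = bonferroni_threshold(0.05, 543)

# Hypothetical p-values, not results from this study.
example_p = {"locus_A": 0.03, "locus_B": 0.00001}
still_significant = flag_significant(example_p, 0.05, 543)
```

A locus with p = 0.03 is significant at the uncorrected 0.05 level but not after correction, which is the pattern described for D3S1358, D19S433 and D5S818.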

Allele frequencies and forensic parameters
In the studied population, the number of alleles (Na) per locus ranged from 7 for markers D16S539, TPOX and TH01 to 48 for SE33; the mean number of alleles per locus was 14, and the total number of alleles observed was 288. The most polymorphic locus was SE33 (Tables 1-5).
Fig 2. https://doi.org/10.1371/journal.pone.0220620.g002
Generally, the degree of polymorphism of a specific locus can be measured by two distinct parameters: the heterozygosity and the polymorphism information content (PIC). The observed heterozygosity (Ho) ranged from 67% for locus TPOX to 92% for locus SE33 (Tables 1-5).
PIC values for all STR loci were highly informative (PIC ≥ 0.6), with an average of 78.3%. The mean Na and He values indicate high levels of genetic diversity in the population studied. These highly informative PIC values support the heterozygosity values, indicating a high degree of genetic polymorphism.
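For reference, expected heterozygosity and PIC can both be computed directly from the allele frequencies at a locus. The sketch below uses the standard definitions (He as one minus the sum of squared allele frequencies, and PIC as He minus the sum over allele pairs of 2 p_i^2 p_j^2); it does not reproduce the software used in this study, and the function names are hypothetical.

```python
def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2) over all alleles at a locus."""
    return 1.0 - sum(p * p for p in freqs)

def polymorphism_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i < j of 2 * p_i^2 * p_j^2."""
    freqs = list(freqs)
    cross_term = 0.0
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            cross_term += 2.0 * freqs[i] ** 2 * freqs[j] ** 2
    return expected_heterozygosity(freqs) - cross_term

# A locus with two equally frequent alleles, used only as a check case.
he_two_alleles = expected_heterozygosity([0.5, 0.5])
pic_two_alleles = polymorphism_information_content([0.5, 0.5])
```

Because the cross term is never negative, PIC cannot exceed He at the same locus, and the gap shrinks as the number of alleles grows.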
The random matching probability (PM) ranged from 0.006 for SE33 to 0.156 for TPOX. The power of exclusion (PE) ranged from 0.384 for locus TPOX to 0.838 for locus SE33. The SE33 locus showed the greatest PD in the Bahraini population, whereas TPOX showed the lowest. The higher the discrimination power of a locus, the more efficient it is in discriminating between members of the population (Tables 1-5).
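One widely used closed-form approach derives PE and TPI from the observed heterozygosity h and homozygosity H = 1 - h, with PE = h^2 (1 - 2 h H^2) and TPI = 1 / (2H). Whether the software used in this study applies exactly these expressions is an assumption; the sketch below only illustrates the arithmetic, and the function names are hypothetical.

```python
def power_of_exclusion(h):
    """PE = h^2 * (1 - 2 * h * H^2), with H = 1 - h (observed homozygosity)."""
    homozygosity = 1.0 - h
    return h ** 2 * (1.0 - 2.0 * h * homozygosity ** 2)

def typical_paternity_index(h):
    """TPI = 1 / (2 * H); undefined when no homozygotes are observed."""
    homozygosity = 1.0 - h
    if homozygosity == 0.0:
        raise ValueError("TPI is undefined when homozygosity is zero")
    return 1.0 / (2.0 * homozygosity)

# Rounded Ho values reported in this study for TPOX and SE33.
pe_tpox = power_of_exclusion(0.67)
pe_se33 = power_of_exclusion(0.92)
tpi_tpox = typical_paternity_index(0.67)
```

With the rounded Ho values of 0.67 and 0.92, this gives PE of roughly 0.383 for TPOX and 0.836 for SE33, close to the reported 0.384 and 0.838; the small differences are consistent with rounding of Ho.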
The PD values for most of the tested loci were above 0.9; the highest value was observed for SE33 (0.994), whereas the lowest value was observed for TPOX (0.844). The combined power of discrimination (CPD) and combined matching probability (CMP) for all 21 STR loci were 99.999999% and 4.5633E-27, respectively.
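The combined statistics follow from the per-locus values if the loci are treated as independent: CMP is the product of the per-locus PM values and CPD is 1 - CMP. The sketch below assumes that independence; the function names are hypothetical, and the two-locus example uses only the SE33 and TPOX values quoted above rather than the full 21-locus table.

```python
import math
from decimal import Decimal, localcontext

def combined_match_probability(pm_values):
    """CMP as the product of per-locus PM values, summed in log space."""
    return math.exp(sum(math.log(pm) for pm in pm_values))

def combined_power_of_discrimination(cmp_value, digits=40):
    """CPD = 1 - CMP, using Decimal so that tiny CMP values are not lost."""
    with localcontext() as ctx:
        ctx.prec = digits
        return Decimal(1) - Decimal(str(cmp_value))

# Two-locus illustration with the reported SE33 and TPOX PM values.
cmp_two_loci = combined_match_probability([0.006, 0.156])

# CPD for the reported 21-locus CMP of 4.5633E-27.
cpd_reported = combined_power_of_discrimination(4.5633e-27)

# In double precision the same subtraction rounds to exactly 1.0.
float_cpd = 1.0 - 4.5633e-27
```

Because a CMP of this size is far below double-precision resolution near 1, CPD is usually quoted as a truncated string of nines, and the CMP itself is the more informative figure.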

Interpopulation diversity
To measure the diversity between the Bahraini population and previously reported populations, a phylogenetic tree and an MDS plot were constructed, including the Qatari population (Fig 3). As shown, the Bahraini and Saudi populations were positioned in the bottom right cluster; the Bengali and Indian populations clustered together; the Emirati and Iranian populations also clustered together, followed by the Iraqi, Qatari and Egyptian populations in the same cluster. The Sri Lankan and Kuwaiti populations formed separate, distant clusters.

Discussion
The observed deviation from HWE (before Bonferroni's correction) could be a result of the diversity of the Bahraini population or of high polymorphism at the investigated loci. This observation is likely to reflect the high level of inbreeding and the consanguinity rates in Bahrain, with intra-familial unions accounting for 20-50% of all marriages compared to other Arab countries [21]. The PD, in correlation with the PM, supports the high degree of polymorphism between Bahraini individuals.
We compared the Bahraini population data to the nearest available populations using the accessible loci. The Bahraini population shows results similar to those of studies of the Saudi Arabian and UAE populations using the GlobalFiler STR loci [19,22]: in all three populations, the most informative and polymorphic locus is SE33 and the least informative locus is TPOX. The least polymorphic locus was D16S539 for the UAE population [19], whereas it was TH01 for both the Bahraini and Saudi Arabian populations [22]. Allele 8 in locus TPOX had the highest frequency in the Bahraini, Kuwaiti, Saudi Arabian, Iraqi, Egyptian and Iranian populations [18,22], whereas the highest frequency in the Indian and Bangladeshi populations was that of allele 12 in CSF1PO [18]. Regarding the phylogenetic tree, the data from the ten populations are consistent with other population data from the region [18, 23, 24] based upon the FST values obtained. The FST value obtained for Bahrain is 0.006, which is below the value of 0.01 recommended for casework statistics [25].
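FST typically enters casework statistics as the theta term in subpopulation-corrected genotype match probabilities. The sketch below uses the Balding-Nichols expressions, as in NRC II recommendation 4.10; whether reference [25] prescribes exactly these formulas is an assumption. The function names are hypothetical, and the only study value used is the TPOX allele 8 frequency of 0.496.

```python
def match_prob_homozygote(p, theta):
    """Theta-corrected match probability for a homozygous genotype (a, a)."""
    numerator = (2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p)
    return numerator / ((1 + theta) * (1 + 2 * theta))

def match_prob_heterozygote(p, q, theta):
    """Theta-corrected match probability for a heterozygous genotype (a, b)."""
    numerator = 2 * (theta + (1 - theta) * p) * (theta + (1 - theta) * q)
    return numerator / ((1 + theta) * (1 + 2 * theta))

# TPOX allele 8 frequency reported in this study.
p_tpox_8 = 0.496

uncorrected = match_prob_homozygote(p_tpox_8, 0.0)    # reduces to p^2
at_observed = match_prob_homozygote(p_tpox_8, 0.006)  # theta = observed FST
at_casework = match_prob_homozygote(p_tpox_8, 0.01)   # theta = casework value
```

With theta set to 0 both expressions reduce to the Hardy-Weinberg values p^2 and 2pq. A larger theta gives a larger match probability for the homozygote, so using 0.01 rather than the observed 0.006 is the more conservative choice for this genotype.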
As expected, the diversity between the data obtained in this study and the data from neighboring populations varies as the populations become more geographically separated.
Once more studies of Arab populations in the region become accessible, it may become possible to develop a greater understanding of the genetic associations between the different populations of the Arabian Peninsula.

Conclusions
In conclusion, we have reported, for the first time in the literature, the allele frequencies and forensic statistical parameters of the GlobalFiler STR loci in the Bahraini population. The polymorphism of the 21 autosomal markers observed in this study, such as the SE33 marker, indicates their usefulness for paternity testing, forensics and familial DNA searching in the population of Bahrain.
Overall, these parameters indicate the general utility of this STR panel for forensic personal identification and paternity testing in the Bahraini population, further confirming its efficacy for forensic practice, including in Bahraini sub-populations, and for genetics and diversity studies of other populations.
Supporting information S1