Estimating the mutational load for cardiovascular diseases in Pakistani population

The deleterious genetic variants contributing to a given disease may differ in number and allele frequency from population to population, depending on each population's evolutionary background. Here, we prioritize deleterious variants in the Pakistani population within a manually curated list of genes previously reported to be associated with common, Mendelian, and congenital cardiovascular diseases (CVDs). We use publicly available genome/exome sequencing data of Pakistani individuals from the 1000 Genomes Project (PJL) and the Exome Aggregation Consortium (ExAC) South Asian data set. By applying a set of tools including Combined Annotation Dependent Depletion (CADD), ANNOVAR, and Variant Effect Predictor (VEP), we highlighted 561 potentially detrimental variants in the PJL data and 7374 variants in the ExAC South Asian data. Likewise, filtering against ClinVar for CVDs revealed 3 pathogenic and 2 likely pathogenic variants in PJL, and 112 pathogenic and 42 likely pathogenic variants in ExAC South Asians. Comparison of derived allele frequencies (DAF) revealed that many of these prioritized variants have a twofold or higher DAF in Pakistani individuals than in other populations. Ranked by the number of contributing deleterious variants, the leading common CVDs were hypertension, atherosclerosis, heart failure, aneurysm, and coronary heart disease, and the leading Mendelian and congenital CVDs were cardiomyopathies, cardiac arrhythmias, and atrioventricular septal defects.


Introduction
Cardiovascular diseases (CVDs) are the leading cause of death globally, accounting for over 31% of all global deaths as estimated in 2012. The major share of this burden is borne by low- and middle-income countries such as Pakistan [1]. The World Health Organization has reported 6.34 million disability-adjusted life years (DALYs) due to CVDs in Pakistan over 2000-2012, which was 19.6% of the burden of non-communicable diseases in the country [2]. The high prevalence of, and mortality from, CVDs is attributable to both heritable and environmental factors. The heritable component is polygenic: a complex interaction of many genes confers an increased risk of developing CVD [3].

Population-scale DNA sequence data sets, such as the 1000 Genomes Project [4] and the Exome Aggregation Consortium (ExAC) [5], have enabled researchers to explore variant frequencies at individual loci across populations and to highlight those related to local adaptation and disease susceptibility. The discovery in these sequencing projects of a huge number of rare, population- or individual-specific variants (MAF < 0.5%) is important for evaluating their contribution to disease susceptibility and onset [6,7]. Compared with common variants, rare variants are more likely to occur at evolutionarily constrained sites of proteins, which have been conserved because of their functional importance. Such rare variants alter protein composition in a more disruptive manner, compromising or eliminating protein function and affecting the phenotype [8].
The rate at which such deleterious variants emerge, and their distribution in populations, are important for determining the pattern of underlying genetic load for diseases. Non-random segregation of deleterious variants can increase the accumulated genetic load so much that the fixation or near-fixation of these mutations may play a significant role in the extinction of isolated populations with a small effective population size [9,10].
The effect of genetic variants on disease susceptibility or onset can be assessed from DNA sequencing data in two ways: either by screening for catalogued disease-causing variants already associated with a given disease in case-control studies, or by prioritizing detrimental variants not previously associated with disease through prediction of their damaging effect [11]. Variant effect prediction tools use available information, such as the degree of conservation at the variant site, the type of alteration in protein composition, or association with regulatory features, to predict the likely deleteriousness of the variants in question [12]. As estimated earlier, a healthy person carries on average 281-515 missense substitutions predicted to be damaging and disease-causing, of which 40-85 are in the homozygous state [11]. Healthy individuals may carry such deleterious variants without apparent disease symptoms because the variants are heterozygous (particularly for autosomal recessive disorders), have low penetrance, or are associated with late disease onset. Genome-wide association studies (GWAS) have already attributed hundreds of common genetic variants to common CVDs such as hypertension, hypercholesterolemia, and coronary artery disease. Likewise, genetic screening has identified many rare variants associated with Mendelian CVDs such as cardiomyopathies and arrhythmias. Common variants impart a small cumulative risk of disease onset, whereas rare deleterious variants have been hypothesized to have a greater effect on these complex diseases [13]. Quantifying the mutational load for particular diseases provides a framework for understanding the overall effect of population-specific history on deleterious variation.
South Asia is one of the most densely populated regions, home to approximately one quarter of the world's population [14]. The region faces severe socioeconomic inequities leading to serious health care issues [15]. Large-scale ethnographic studies have shown that South Asians are at higher risk of cardiovascular diseases than other ethnicities [16,17]. CVDs account for an alarmingly high 27% of deaths in this region [18]. Age-standardized years of life lost due to CVDs have increased in South Asia compared with other regions. Acute myocardial infarction occurs about six years earlier in South Asians than in people in Western countries [19]. Likewise, the risk and prevalence of coronary artery disease are considerably higher in South Asians than in European populations [20].
Pakistan, the second most populous country in South Asia and the sixth most populous in the world (population 193.2 million) [14], also faces serious health care issues. Estimates show that one in five middle-aged adults may have subclinical coronary artery disease [21]. One study reported a prevalence of coronary artery disease of 11.2% in the local rural population [22]. Owing to socio-demographic factors, consanguineous marriages are quite common in this region [23] and are a possible cause of the high prevalence of genetic disorders, including cardiovascular diseases [24]. In this context, this study aims to estimate the underlying mutational burden of cardiovascular diseases in the Pakistani population. For this purpose, we used publicly available genomic data of the Pakistani population (Punjabi from Lahore; PJL) in the 1000 Genomes Project, and of South Asians (SAS) in ExAC, which predominantly contains samples from Pakistan as a cohort of the Pakistan Risk of Myocardial Infarction Study (PROMIS) [25]. To quantify the mutational load, we applied two approaches: filtering variants already reported to be associated with cardiovascular diseases in the ClinVar database, and predicting functionally deleterious variants with variant effect prediction tools. We determined the concordance of the mutational load of cardiac diseases between the two data sets, i.e., 1000 Genomes Project PJL and ExAC SAS, and compared the allele frequencies of variants associated with these diseases with those of other continental populations to understand their relevance for estimating cardiovascular genetic risk in the Pakistani population.

Materials and methods

i. Preparation of gene lists
The genes reported to be associated with common, Mendelian, and congenital cardiovascular diseases were obtained primarily from three databases: Online Mendelian Inheritance in Man (OMIM) [26], ClinVar [27], and the Disease Ontology Annotation Framework (DOAF) [28]. The complete lists of diseases in these databases were accessed and filtered for cardiovascular diseases using multiple CVD-related terms, such as 'cardio', 'cardiac', 'heart', 'coronary', 'cardiomyopathy', 'myocardial', 'aneurysm', 'arteriopathy', 'atherosclerosis', 'septal defect', 'septal noncompaction', 'tetralogy of fallot', 'atrial', 'arterial', 'hypertension', 'QT syndrome', 'hypercholesterolemia', and 'hyper triglyceridemia', together with some manually selected cardiac diseases. These terms were also compared with those in the Human Phenotype Ontology [29] and the WHO International Classification of Diseases (ICD-10) database. After manual curation through a literature survey and refinement through the GeneCards database [30], three gene lists were prepared for three categories of CVDs: common CVDs (n = 895 genes) such as hypertension, atherosclerosis, coronary heart disease, and heart failure; Mendelian CVDs (n = 320 genes) such as cardiomyopathies, cardiac arrhythmia, QT syndromes, and atrial fibrillation; and congenital CVDs (n = 62 genes) such as congenital heart disease and atrioventricular septal defects. The lists of selected genes are given in S1 Table. A few genes overlap between the three categories (Fig 1). The Gene Ontology terms of the shortlisted genes were determined with the UniProt Gene Ontology Annotation database for human, version 2.0 [31], and plotted using the online BGI WEGO Gene Ontology tool [32].
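The term-based screening of disease names described above can be sketched as a case-insensitive substring match. This is a minimal illustration under our own assumptions: the function names are hypothetical, the term list simply copies the quoted search terms, and the authors' exact matching rule is not specified. Substring hits of this kind still need the manual curation described above to remove false positives.

```python
# Hypothetical sketch: flag disease names containing any CVD search term.
# Matching is a plain case-insensitive substring test (an assumption).
CVD_TERMS = [
    "cardio", "cardiac", "heart", "coronary", "cardiomyopathy", "myocardial",
    "aneurysm", "arteriopathy", "atherosclerosis", "septal defect",
    "septal noncompaction", "tetralogy of fallot", "atrial", "arterial",
    "hypertension", "qt syndrome", "hypercholesterolemia",
    "hyper triglyceridemia",
]


def is_cvd_disease(name: str, terms=CVD_TERMS) -> bool:
    """Return True if the disease name contains any CVD search term."""
    lowered = name.lower()
    return any(term in lowered for term in terms)


def filter_cvd(diseases):
    """Keep only disease names matching at least one term, in input order."""
    return [d for d in diseases if is_cvd_disease(d)]


if __name__ == "__main__":
    sample = ["Long QT syndrome 1", "Cystic fibrosis", "Dilated cardiomyopathy 1A"]
    print(filter_cvd(sample))  # ['Long QT syndrome 1', 'Dilated cardiomyopathy 1A']
```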

ii. Data sets
Two data sets were used to estimate the mutational load of cardiovascular diseases in the Pakistani population: the 1000 Genomes Project phase 3 data and the ExAC release 0.3 data. Variant data for the Pakistani population PJL (n = 96 individuals) were extracted from the 1000 Genomes Project data using VCFtools [33]. These data, from healthy individuals, were used to estimate the mutational load of common, Mendelian, and congenital CVDs. From the ExAC database, genetic variants of South Asians (n = 8,276) were extracted and used as Pakistani data because this group predominantly comprises Pakistani individuals (n = 7,078) enrolled in the Pakistan Risk of Myocardial Infarction Study. These data were used to estimate the mutational load of Mendelian and congenital CVDs only, because, apart from healthy controls, the cohort includes cases of common cardiac diseases such as hypertension, hypercholesterolemia, and coronary artery disease [25], which would bias estimates for common CVDs.

iii. Analysis pipeline
We developed a computational pipeline, based on common bioinformatics tools, to predict the deleterious effects of genetic variants from functional annotations and to assess their prevalence (Fig 2). Coordinates of the selected CVD genes were obtained from GENCODE release 19 (gencode.v19.annotation.gtf), the final GENCODE build mapped to the human GRCh37 reference assembly [34]. To cover promoter and flanking regions, 2,000 bp was subtracted from each gene's start coordinate and 2,000 bp was added to its end coordinate, so that both flanks were included whichever strand the gene lies on. Variants in the relevant genes were subset with bcftools-1.2.1, and only SNVs were used for prioritization in the current analysis. To determine the functional impact of these variants on protein structure and function, three widely used tools were employed: Combined Annotation Dependent Depletion (CADD) [35], PolyPhen-2 [36], and Sorting Intolerant from Tolerant (SIFT) [37]. These tools predict variant effects from a number of factors, including protein multiple sequence alignments, sequence- and structure-based features, and conservation across available homologous sequences [38]. We prioritized missense (non-synonymous) variants, preferably those of low and rare allele frequency, because studies have shown that low- and rare-frequency variants tend to have greater functional impact on proteins and are associated with complex phenotypes/disorders by altering protein composition [39]. CADD annotation was performed with an in-house Perl script (Supporting Information Script 1), while SIFT and PolyPhen-2 annotation was performed with ANNOVAR [40].
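A minimal sketch of the coordinate step, assuming a GENCODE GTF in which gene records carry a gene_name attribute. The helper names and the clamping of padded starts at position 1 are our assumptions, not part of the published pipeline.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

FLANK_BP = 2000  # bp added to each side of the gene, as described in the text


def parse_gene_record(line: str) -> Optional[Tuple[str, str, int, int]]:
    """Return (gene_name, chrom, start, end) for a GTF 'gene' line, else None."""
    if line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9 or fields[2] != "gene":
        return None
    chrom, start, end, attributes = fields[0], int(fields[3]), int(fields[4]), fields[8]
    # GTF attributes look like: gene_id "ENSG..."; gene_name "SCN5A"; ...
    for attr in attributes.split(";"):
        attr = attr.strip()
        if attr.startswith("gene_name "):
            return attr.split(" ", 1)[1].strip('"'), chrom, start, end
    return None


def padded_gene_regions(
    gtf_lines: Iterable[str], genes: Set[str], flank: int = FLANK_BP
) -> Dict[str, Tuple[str, int, int]]:
    """Map each requested gene name to (chrom, start - flank, end + flank), 1-based."""
    regions: Dict[str, Tuple[str, int, int]] = {}
    for line in gtf_lines:
        record = parse_gene_record(line)
        if record is None or record[0] not in genes:
            continue
        name, chrom, start, end = record
        regions[name] = (chrom, max(1, start - flank), end + flank)  # keep start >= 1
    return regions
```

The resulting intervals could be written as tab-delimited CHROM, BEG, END lines (1-based, inclusive) and passed to bcftools view -R. GENCODE v19 names chromosomes with a "chr" prefix while 1000 Genomes phase 3 VCFs do not, so the prefix may need to be stripped before subsetting.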
We applied stringent criteria to filter harmful variants: an SNV was considered 'functionally deleterious' if its PolyPhen-2 HDIV score was > 0.957, its SIFT score was < 0.05, and its CADD phred-like score was 15 or higher (i.e., among roughly the top 3% of scores of all possible single-nucleotide substitutions in the reference genome). We refer to such filtered SNVs as 'predicted deleterious SNVs' (dSNVs). The ancestral and derived states of the deleterious variants were retrieved from the online CADD annotation tool, which uses the human-chimpanzee ancestral genome from the Ensembl EPO multiple alignments [41].
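The three-way filter can be expressed as a single predicate. This is a hedged sketch: the record type and function names are hypothetical, and treating a variant with any missing score as not deleterious is our assumption. The helper cadd_top_fraction follows the definition of the CADD scaled score, PHRED = -10·log10(rank fraction), under which a cutoff of 15 corresponds to roughly the top 3% (10^-1.5 ≈ 0.032) and a cutoff of 20 to the top 1%.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds as stated in the text.
POLYPHEN_HDIV_MIN = 0.957  # score must be strictly greater
SIFT_MAX = 0.05            # score must be strictly lower
CADD_PHRED_MIN = 15.0      # score must be equal or higher


@dataclass
class ScoredSNV:
    name: str
    polyphen_hdiv: Optional[float]
    sift: Optional[float]
    cadd_phred: Optional[float]


def is_predicted_deleterious(snv: ScoredSNV) -> bool:
    """True only if all three scores exist and every threshold is met (a dSNV)."""
    scores = (snv.polyphen_hdiv, snv.sift, snv.cadd_phred)
    if any(s is None for s in scores):
        return False  # assumption: missing annotation means "not prioritized"
    return (
        snv.polyphen_hdiv > POLYPHEN_HDIV_MIN
        and snv.sift < SIFT_MAX
        and snv.cadd_phred >= CADD_PHRED_MIN
    )


def cadd_top_fraction(phred: float) -> float:
    """Fraction of all possible substitutions scoring at least this CADD PHRED."""
    return 10 ** (-phred / 10)
```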

iv. Comparison of the variants across the world populations
Population-wise allele frequencies of the predicted deleterious variants were retrieved by filter-based annotation with ANNOVAR using the 1000 Genomes and ExAC frequency files. Allele frequencies in the two data sets (1000 Genomes and ExAC) were compared independently because of differences in their data structure. The derived allele frequencies of predicted deleterious variants for cardiovascular diseases in Pakistani individuals of the 1000 Genomes Project were compared with those of all five major population groups, i.e., South Asian (SAS), European (EUR), Admixed American (AMR), African (AFR), and East Asian (EAS), and with the Southeast Asian Malay population [42]. Likewise, the derived allele frequencies of predicted deleterious variants in the South Asian population of the ExAC data were compared with those of all four other populations in the same data, i.e., Non-Finnish Europeans (NFE), Latino (AMR), African/African American (AFR), and East Asian (EAS).
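Because population frequency files typically report the frequency of the alternate (ALT) allele, the derived allele frequency has to be re-oriented using the ancestral state before populations are compared. The sketch below assumes an ancestral base is available per site (EPO-based ancestral sequences use lower case for low-confidence calls, which we simply upper-case here); the function names and the twofold comparison helper are hypothetical illustrations of the comparison described above.

```python
from typing import Optional


def derived_allele_frequency(alt_af: float, ref: str, alt: str, ancestral: str) -> Optional[float]:
    """Re-orient an ALT allele frequency to the derived allele.

    Returns None when the ancestral base matches neither REF nor ALT
    (e.g. '-', 'N' or '.'), since the derived allele is then undetermined.
    """
    anc = ancestral.upper()  # lower case marks low-confidence ancestral calls
    if anc == ref.upper():
        return alt_af          # ALT is the derived allele
    if anc == alt.upper():
        return 1.0 - alt_af    # REF is the derived allele
    return None


def is_twofold_higher(daf_focal: float, daf_other: float) -> bool:
    """True if the focal DAF is non-zero and at least twice the comparison DAF."""
    return daf_focal > 0 and daf_focal >= 2 * daf_other
```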
To assess population-wise genetic differentiation with respect to cardiovascular diseases, pairwise Weir and Cockerham FST [43] values were calculated for the 1000 Genomes data using VCFtools. Two approaches were used: FST was calculated for all SNVs in the genes harboring predicted deleterious SNVs in this analysis, and for the prioritized deleterious SNVs only. Likewise, the relatedness of the populations based on the deleterious variation they harbor for cardiovascular diseases was assessed by principal component analysis (PCA) using PLINK (v1.90b3).
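For reference, the Weir and Cockerham estimator that VCFtools computes (via its --weir-fst-pop option) can be written per biallelic site from sample sizes, allele frequencies, and observed heterozygosities. The sketch below is our own transcription of the 1984 variance-component formulas, not VCFtools code; the multi-site value is the ratio of summed components, a common convention. The fst_category helper mirrors the moderately/highly/severely differentiated labels used in the Results: only the 0.15-0.25 "high" band is stated there, and the other cut-offs follow Wright's commonly used guidelines (an assumption). All function names are hypothetical.

```python
from typing import Sequence, Tuple

# One population at one biallelic site:
# (individuals sampled n, allele frequency p, observed heterozygote proportion h)
PopSample = Tuple[int, float, float]


def wc_components(pops: Sequence[PopSample]) -> Tuple[float, float, float]:
    """Weir & Cockerham (1984) variance components (a, b, c) for one site."""
    r = len(pops)
    if r < 2:
        raise ValueError("at least two populations are required")
    n_total = sum(n for n, _, _ in pops)
    n_bar = n_total / r
    n_c = (n_total - sum(n * n for n, _, _ in pops) / n_total) / (r - 1)
    p_bar = sum(n * p for n, p, _ in pops) / n_total
    s2 = sum(n * (p - p_bar) ** 2 for n, p, _ in pops) / ((r - 1) * n_bar)
    h_bar = sum(n * h for n, _, h in pops) / n_total
    pq = p_bar * (1 - p_bar)
    a = (n_bar / n_c) * (s2 - (pq - (r - 1) / r * s2 - h_bar / 4) / (n_bar - 1))
    b = (n_bar / (n_bar - 1)) * (pq - (r - 1) / r * s2 - (2 * n_bar - 1) / (4 * n_bar) * h_bar)
    c = h_bar / 2
    return a, b, c


def wc_fst_site(pops: Sequence[PopSample]) -> float:
    """Per-site FST = a / (a + b + c); NaN for a monomorphic site."""
    a, b, c = wc_components(pops)
    total = a + b + c
    return a / total if total != 0 else float("nan")


def wc_fst_weighted(sites: Sequence[Sequence[PopSample]]) -> float:
    """Multi-site FST as the ratio of summed components across sites."""
    comps = [wc_components(s) for s in sites]
    num = sum(a for a, _, _ in comps)
    den = sum(a + b + c for a, b, c in comps)
    return num / den if den != 0 else float("nan")


def fst_category(fst: float) -> str:
    """Qualitative differentiation class (cut-offs partly assumed, see above)."""
    if fst < 0.05:  # includes negative estimates
        return "little"
    if fst < 0.15:
        return "moderate"
    if fst <= 0.25:
        return "high"
    return "severe"
```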

v. Searching the variants in ClinVar database
Variants in the cardiovascular disease gene set were annotated using ClinVar data release 20160104 [27]. The allele frequencies of ClinVar variants present in Pakistani individuals were retrieved by ANNOVAR annotation for both the 1000 Genomes and ExAC populations, as described above. For comparison of allele frequencies among populations, only variants with ClinVar significance 'Pathogenic' or 'Likely_pathogenic' were selected.
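Selecting 'Pathogenic' and 'Likely_pathogenic' records can be sketched as a match on the clinical significance string. We assume that multiple significance values for one record are separated by '|', ',', '/' or ';' and that spacing and capitalization vary; the exact layout of the ClinVar 20160104 annotation as used in this study is not reproduced, and the function names are hypothetical. A record listing several significances passes if any one of them is pathogenic or likely pathogenic.

```python
import re
from typing import Iterable, List, Tuple

KEEP = {"pathogenic", "likely_pathogenic"}


def _normalize(label: str) -> str:
    """Lower-case and join words with underscores: 'Likely pathogenic' -> 'likely_pathogenic'."""
    return re.sub(r"[\s\-]+", "_", label.strip().lower())


def is_pathogenic_or_likely(clnsig: str) -> bool:
    """True if any listed significance is Pathogenic or Likely_pathogenic."""
    parts = re.split(r"[|,/;]", clnsig)
    return any(_normalize(p) in KEEP for p in parts if p.strip())


def select_clinvar(records: Iterable[Tuple[str, str]]) -> List[str]:
    """Return variant IDs whose significance passes the filter, in input order."""
    return [vid for vid, sig in records if is_pathogenic_or_likely(sig)]
```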

Results

i. Gene ontology
The genes under study were grouped by biological role using the UniProt Gene Ontology Annotation database [31]. Most genes were primarily involved in binding, catalysis, and molecular transduction in a number of biological processes, such as biological regulation, anatomical structure formation, cellular component organization or biogenesis, developmental processes, metabolic processes, and organismal processes (S1 Fig). Many genes were also annotated to structural processes of the heart, reflecting the anatomical nature of cardiac diseases.

ii. The mutational load of CVDs
All SNVs in intronic, exonic, and flanking regulatory regions of the genes under study, as extracted from 1000 Genomes Project PJL and ExAC SAS data, were analyzed for mutational load using our analysis pipeline (Table 1). We calculated the proportions of synonymous, nonsynonymous, deleterious nonsynonymous, and homozygous deleterious SNVs in the two data sets. The proportions of nonsynonymous exonic SNVs (nonsynonymous SNVs/exonic SNVs) and of deleterious nonsynonymous SNVs (deleterious nSNVs/nonsynonymous SNVs) were higher in ExAC SAS than in 1000 Genomes Project PJL (0.64 vs. 0.51, and 0.26 vs. 0.16, respectively). Conversely, the proportions of synonymous SNVs and of homozygous deleterious SNVs were higher in 1000 Genomes Project PJL than in ExAC SAS (0.45 vs. 0.35, and 0.12 vs. 0.04, respectively) (S2 Fig). After applying the prediction tools described in the analysis pipeline, 561 SNVs jointly predicted to be deleterious by all three tools were prioritized for common, Mendelian, and congenital CVDs from the 1000 Genomes Project PJL data, and 7374 jointly predicted deleterious SNVs for Mendelian and congenital CVDs from the ExAC SAS data (Fig 3). Based on these findings, the mutational load was higher for common CVDs than for Mendelian and congenital CVDs in the Pakistani population. Ranked by the number of contributing deleterious variants, the leading common CVDs were hypertension, atherosclerosis, heart failure, aneurysm, and coronary heart disease, and the leading Mendelian and congenital CVDs were cardiomyopathies (dilated and hypertrophic), cardiac arrhythmias, and atrioventricular septal defects.
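The proportions compared above are simple quotients of SNV counts. A minimal sketch follows; the count fields and function names are hypothetical, and the denominators for the synonymous proportion (exonic SNVs) and the homozygous deleterious proportion (deleterious SNVs) are our assumptions, since the text states only the numerators for these two.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SNVCounts:
    exonic: int
    synonymous: int
    nonsynonymous: int
    deleterious_nonsynonymous: int
    homozygous_deleterious: int


def _ratio(numerator: int, denominator: int) -> float:
    """Quotient, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def load_proportions(c: SNVCounts) -> Dict[str, float]:
    """Proportions used to compare data sets (two denominators are assumed)."""
    return {
        "nonsynonymous/exonic": _ratio(c.nonsynonymous, c.exonic),
        "synonymous/exonic": _ratio(c.synonymous, c.exonic),  # assumed denominator
        "deleterious/nonsynonymous": _ratio(c.deleterious_nonsynonymous, c.nonsynonymous),
        # assumed denominator: all deleterious nonsynonymous SNVs
        "homozygous/deleterious": _ratio(c.homozygous_deleterious, c.deleterious_nonsynonymous),
    }
```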

iii. Filtration of variants in ClinVar
Filtering our variant set by pathogenicity in the ClinVar database identified several variants associated with Mendelian and congenital cardiovascular disorders. There were 3 variants with ClinVar significance 'Pathogenic' and 2 with 'Likely_pathogenic' significance for CVDs in the 1000 Genomes Project PJL population (S2 Table, […] 8 (5.19%) to different forms of atrioventricular defects. It was also noted that 31 SNVs had multiple significances for more than one type of Mendelian or congenital CVD. Comparison of the allele frequencies of the filtered variants highlighted 11 variants with significantly higher allele frequency in SAS than in other populations (Table 2). Functional consequence annotation with the online VEP tool showed 13 variants with a loss-of-function (LoF) effect and 23 regulatory region variants (S2 Table, sheet B). The genomic locations of genes harboring ClinVar variants associated with common, Mendelian, and congenital CVDs in the Pakistani population are shown in Fig 4. The loci of genes such as SCN5A on chromosome 3, KCNQ1 and MYBPC3 on chromosome 11, MYH6 and MYH7 on chromosome 14, and KCNE1 and KCNE2 on chromosome 21 were enriched for clinically significant variants.

iv. Comparison of derived allele frequencies of predicted deleterious variants across continental populations
The derived allele frequency spectra of all SNVs and of deleterious SNVs in our CVD gene set, filtered from 1000 Genomes Project PJL data and ExAC South Asian data, revealed that the majority of deleterious variants were rare. The proportion of common deleterious SNVs (AF > 5%) was 11.59% in 1000 Genomes Project PJL for common, Mendelian, and congenital CVDs, but only 0.62% for Mendelian and congenital CVDs in ExAC SAS (Fig 5).
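Partitioning SNVs into rare, low-frequency, and common DAF classes can be sketched as below. The 5% common cutoff follows the text; the 1% rare cutoff is our assumption. With 192 PJL chromosomes the lowest non-zero DAF is 1/192 ≈ 0.52%, so the 0.5% rare-variant cutoff quoted in the Introduction could not populate a rare class in PJL. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

RARE_MAX = 0.01    # assumption: rare = DAF < 1%
COMMON_MIN = 0.05  # common = DAF > 5%, as in the text


def daf_bin(daf: float) -> str:
    """Assign a derived allele frequency to 'rare', 'low' or 'common'."""
    if not 0.0 <= daf <= 1.0:
        raise ValueError(f"DAF out of range: {daf}")
    if daf < RARE_MAX:
        return "rare"
    if daf <= COMMON_MIN:
        return "low"
    return "common"


def bin_proportions(dafs: Iterable[float]) -> Dict[str, float]:
    """Share of SNVs in each DAF class; empty input gives an empty dict."""
    counts = Counter(daf_bin(d) for d in dafs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: counts[label] / total for label in ("rare", "low", "common")}
```

Running bin_proportions separately on the deleterious SNVs and on all SNVs gives the kind of rare/low/common comparison between pools that is discussed later in the paper.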

v. Functional consequences of deleterious variants
We then grouped the deleterious variants of both data sets by functional consequence to identify LoF variants, including 'stop_gained', 'stop_lost', 'start_lost', 'frameshift […], whose 6 out of 9 transcripts were affected by the LoF mutation, and which is associated with many disease conditions, including hypertension [50, 51]. The SNP rs371316552 lies in cathepsin B (CTSB), whose increased expression has been reported to pose a risk for atherosclerosis and myocardial infarction in rat models [52]. The third LoF SNP, rs117054298, disrupts the splice site of one transcript (ENST00000457280) of insulin-like growth factor (IGF) binding protein-1 (IGFBP1), which contributes to atherosclerosis [53]. Likewise, 30 LoF variants were found in ExAC South Asians, 2 of which were in the homozygous state (S4 Table).
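Flagging LoF consequences and counting affected transcripts can be sketched from VEP-style consequence strings. The term list in the text is truncated after 'frameshift'; we use the Sequence Ontology term frameshift_variant, and adding splice_donor_variant and splice_acceptor_variant is our assumption (suggested by the IGFBP1 splice-site example). We also assume multiple terms are joined by '&' or ','. Function names are hypothetical.

```python
import re
from typing import Iterable, Tuple

# Terms named in the text, plus two splice terms added as an assumption.
LOF_TERMS = frozenset({
    "stop_gained",
    "stop_lost",
    "start_lost",
    "frameshift_variant",
    "splice_donor_variant",     # assumption
    "splice_acceptor_variant",  # assumption
})


def is_lof(consequence: str) -> bool:
    """True if any term in a consequence string (joined by '&' or ',') is a LoF term."""
    return any(term.strip() in LOF_TERMS for term in re.split(r"[&,]", consequence))


def lof_transcript_count(annotations: Iterable[Tuple[str, str]]) -> Tuple[int, int]:
    """For one variant's (transcript_id, consequence) pairs, return
    (transcripts with a LoF consequence, total transcripts)."""
    all_tx, lof_tx = set(), set()
    for transcript, consequence in annotations:
        all_tx.add(transcript)
        if is_lof(consequence):
            lof_tx.add(transcript)
    return len(lof_tx), len(all_tx)
```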

vi. Differentiation of deleterious variants
Data from whole-genome/exome sequencing projects can be used to measure the extent of differentiation among populations from differences in the allele frequencies of nonsynonymous variants. Variants with highly differentiated frequencies among populations provide a direction for fine-mapping signals of local adaptation as well as disease susceptibility [54]. In this study, differentiation was determined by calculating Weir and Cockerham FST in two ways: (1) FST for PJL versus the other South Asian (SAS) populations of the 1000 Genomes Project, using all SNVs in genes harboring the filtered deleterious […] (Fig 7). The derived allele frequency of rs560826688 is 0.031; it lies in LRP5, which is involved in hypertension [55]. The derived allele frequency of rs563254260 is 0.026; it lies in SERPINF1, which relates to obesity and hypertension [56]. In addition, one highly differentiated SNV (FST 0.15-0.25), rs539962979, with FST 0.16597, was also observed […] Fig). Besides this, 8 highly differentiated and 2 severely differentiated deleterious SNVs were also observed (S5 Table).

The observed differences in allele frequencies and FST values of functionally predicted deleterious SNVs between PJL and the rest of the global populations suggested that world populations might be stratified by their mutational burden for cardiovascular diseases. Principal component analysis was therefore performed for the deleterious SNVs and for all SNVs of our gene set from the 1000 Genomes Project data. The analysis with all low- and rare-frequency SNVs of our gene set (DAF ≤ 5.0%) grouped all populations together, with African populations forming a distinct group (Fig 8A). The analysis with low- and rare-frequency deleterious SNVs also grouped all populations together, with PJL individuals scattered away from them (Fig 8C).
Likewise, PCA with all common SNVs (DAF > 5.0%) of our gene set suggested three distinct groups of world populations: South Asian, European, and American populations formed one group, while African and East Asian populations each grouped separately (Fig 8B). In the PCA with common deleterious SNVs, the aforementioned groups appeared to merge (Fig 8D).
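The PCA above was run with PLINK; purely to illustrate the idea, the sketch below computes principal component scores from a genotype matrix of derived-allele counts with NumPy's SVD, after standardizing each SNV by its expected binomial variance. This standardization is one common choice and may differ in detail from PLINK v1.90's implementation; the function name is hypothetical.

```python
import numpy as np


def genotype_pca(genotypes, n_components: int = 2) -> np.ndarray:
    """Return individual scores on the top principal components.

    genotypes: individuals x SNVs array of derived-allele counts (0, 1 or 2).
    """
    G = np.asarray(genotypes, dtype=float)
    p = G.mean(axis=0) / 2.0                  # derived allele frequency per SNV
    polymorphic = (p > 0) & (p < 1)           # monomorphic SNVs have zero variance
    if not polymorphic.any():
        raise ValueError("no polymorphic SNVs")
    G, p = G[:, polymorphic], p[polymorphic]
    Z = (G - 2 * p) / np.sqrt(2 * p * (1 - p))  # center and scale each SNV
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_components] * S[:n_components]
```

The sign of each component is arbitrary, so only the relative placement of individuals (for example, whether one population separates from the others) is meaningful.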

Mutation load of cardiovascular diseases
Using the same gene set, the burden of common, Mendelian, and congenital cardiovascular diseases was also determined for one population from each of the five major population groups of the 1000 Genomes Project, i.e., Yoruba in Ibadan (YRI) from Africa, Southern Han Chinese (CHS) from East Asia, Gujarati Indians in Houston (GIH) from South Asia, Puerto Ricans (PUR) from the Americas, and Finnish (FIN) from Finland, as well as for the Southeast Asian Malay population, which is not part of the 1000 Genomes Project. This empirical estimate showed an excess of deleterious derived rare variants (singletons) in the YRI and Malay populations, while the FIN and PJL populations harbored the fewest deleterious derived singletons (Fig 9A). Furthermore, the proportion of homozygous deleterious derived SNVs was second highest in PJL, after the Finnish population (PJL 12.30%, FIN 12.79%; Fig 9B).
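Counting derived singletons and homozygous deleterious SNVs per population can be sketched from a genotype matrix of derived-allele counts (rows are individuals, columns are deleterious SNVs; missing genotypes are not handled). The definitions are our assumptions: a singleton is an SNV whose derived allele appears exactly once in the population sample, and the homozygous proportion is the fraction of segregating deleterious SNVs observed as homozygous-derived in at least one individual. Function names are hypothetical.

```python
from typing import List, Sequence


def _column_sums(genotypes: Sequence[Sequence[int]]) -> List[int]:
    """Derived-allele count per SNV (column) across all individuals."""
    return [sum(col) for col in zip(*genotypes)]


def derived_singletons(genotypes: Sequence[Sequence[int]]) -> int:
    """Number of SNVs whose derived allele is seen exactly once."""
    return sum(1 for total in _column_sums(genotypes) if total == 1)


def homozygous_derived_fraction(genotypes: Sequence[Sequence[int]]) -> float:
    """Fraction of segregating SNVs with at least one homozygous-derived (2) genotype."""
    segregating = [col for col in zip(*genotypes) if sum(col) > 0]
    if not segregating:
        return float("nan")
    return sum(1 for col in segregating if 2 in col) / len(segregating)
```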

Discussion
In this study, we quantified the mutational burden for common, Mendelian, and congenital cardiovascular diseases in the Pakistani population and compared it with other populations of the world. Quantifying the mutational load through functionally deleterious SNVs offers a possible explanation for the high prevalence of common CVDs in this region [58]. The higher mutational load observed for common than for Mendelian and congenital CVDs can be explained by the fact that common CVDs are polygenic, with a large number of deleterious variants of modest-to-weak effect contributing to them, whereas Mendelian CVDs are monogenic or oligogenic, with a few rare variants exerting a greater effect on the phenotype [6]. These deleterious variants of modest-to-weak effect spread through populations and rose in allele frequency along with neutral variants during rapid population expansion [59]. However, the allele frequencies of deleterious variants contributing to particular human diseases may differ among populations according to their historical modes of expansion, the role of evolutionary forces, and bottlenecks. Highly deleterious variants are purged from populations by purifying selection and therefore remain rare [6,58,60]. In our data, more deleterious variants fell in the common DAF bin (>5%) for 1000 Genomes Project PJL than for ExAC SAS (Fig 5). This was further evaluated by calculating the proportions of rare-, low-, and common-DAF deleterious SNVs in both data sets. The proportion of common-DAF deleterious variants was 11.03% in 1000 Genomes PJL but only 0.54% in ExAC SAS (S6 Fig). The higher proportion of common-DAF deleterious SNVs in 1000 Genomes Project PJL can also be explained by previous findings that variants with very small detrimental effects on complex disorders can persist in populations for thousands of years without being removed by purifying selection [61], or that such variants contribute to late-onset disease.
Further, genes contributing to Mendelian disorders are under strong selection, whereas those contributing to complex disorders show an interplay of negative and positive selection owing to some balancing effect [62]. For example, the high-frequency deleterious SNV rs2228570 (a start-lost variant in VDR) has been reported to contribute to hypertension [50, 51] and also to protect against intervertebral disc degeneration [63]. This comparison also revealed that the proportion of rare-DAF SNVs was higher in the deleterious pool than in the total SNV pool for both data sets (S6 Fig). The comparatively higher proportion of rare SNVs in the deleterious pool (62.80% in 1000 Genomes Project PJL and 98.63% in ExAC SAS) is consistent with earlier studies [64] and can be interpreted in light of population demography: genes involved in cardiovascular diseases have acquired such rare deleterious SNVs in the Pakistani population because of rapid population expansion in recent times [65]. The effect of these neutral forces is further supported by the larger proportion of private deleterious SNVs (Table 3), because the most recently emerged SNVs also tend to be private to a population, and such population-specific rare variants are even more likely to be deleterious for particular diseases [66].
The presence of ClinVar pathogenic and likely pathogenic CVD variants in 1000 Genomes Project PJL and ExAC SAS also reflects the underlying burden of these diseases in the Pakistani population. The variants filtered in PJL and SAS were associated with Mendelian and congenital CVDs only. The major proportion of filtered variants was related to cardiomyopathies (47.8%), long QT syndrome (23.9%), cardiac arrhythmia (8.8%), and atrioventricular septal defects (5.0%). Among the 11 SNVs with higher allele frequency in SAS than in other populations (Table 2), 8 were related to cardiomyopathies. In addition to SNVs, we filtered a 25-bp deletion (rs36212066) in intron 32 of MYBPC3 (cardiac myosin binding protein C), which has been reported to be associated with cardiomyopathies and to be present in populations of Indian origin with MAF ~4% [67]. In this analysis, the deletion was found with MAF 3.1% in both 1000 Genomes PJL and ExAC SAS. In PJL it was present only in the heterozygous state, whereas in ExAC SAS 11 carriers were homozygous and 495 heterozygous.
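The ExAC SAS frequency of this deletion can be checked from its genotype counts. Assuming, for illustration only, that all 8,276 South Asian individuals were called at this site (the per-site number of called individuals is not given here), 11 homozygotes and 495 heterozygotes correspond to 2 × 11 + 495 = 517 deletion alleles out of 16,552, a frequency of about 3.1%, in line with the value above. The function name is hypothetical.

```python
def minor_allele_frequency(n_hom_alt: int, n_het: int, n_called: int) -> float:
    """MAF from genotype counts of diploid individuals called at a site."""
    if n_called <= 0:
        raise ValueError("n_called must be positive")
    alt_alleles = 2 * n_hom_alt + n_het  # each homozygote carries two copies
    af = alt_alleles / (2 * n_called)
    return min(af, 1.0 - af)


print(round(minor_allele_frequency(11, 495, 8276), 4))  # 0.0312
```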
Given the current understanding that the genetic burden of common diseases may differ among populations according to their past histories [58], we hypothesized that deleterious variants contributing to cardiovascular diseases in the Pakistani population may have differentiated from those of other South Asian populations relatively recently. However, the pairwise FST values were consistent with previous findings that variants contributing to common diseases are not well differentiated [68]. The two deleterious SNVs rs560826688 (DAF 0.0312) and rs563254260 (DAF 0.0260), which are moderately differentiated from other South Asians, are also severely differentiated from all populations of the 1000 Genomes Project. They lie in LRP5 (encoding Low Density Lipoprotein Receptor-Related Protein 5) and SERPINF1 (encoding Pigment Epithelium Derived Factor (PEDF), a member of the Serpin Peptidase Inhibitor superfamily), respectively, and both contribute to hypertension [55,56]. Their rise to comparatively higher frequencies in the Pakistani population may be due to genetic drift, which could mask their bona fide role in hypertension in this region [69]. The severely and highly differentiated SNVs relative to all 1000 Genomes Project populations (S5 Table) are also consistent with the higher calculated burden of CVDs in Pakistan, i.e., hypertension, atherosclerosis, heart failure, cardiomyopathy, and septal defects. Overall, deleterious SNVs were comparatively less differentiated from South Asian, European, and American populations (S5 Fig), suggesting limited divergence of the genetic factors underlying susceptibility to cardiovascular diseases, whereas the high differentiation from African and East Asian populations reflects their diversity or differential susceptibility to cardiac diseases, consistent with the influence of geography, language, and ethnicity on genetic variation in those regions [70].
PJL also grouped together with other South Asians, Europeans, and Americans based on the genetics of cardiovascular diseases examined in this analysis (Fig 8). This pattern is also consistent with the route of expansion of modern humans after their migration out of Africa. In the future, the prioritized variants could be assessed and validated empirically by sequencing these genes in a large cohort of relevant cardiac patients.

Web resources
The URLs for data used and tools presented herein are: International