Most frequent South Asian haplotypes of ACE2 share identity by descent with East Eurasian populations

The human angiotensin-converting enzyme 2 (ACE2) has been shown to be the receptor of the novel coronavirus SARS-CoV-2, and variation in this gene may affect the susceptibility of a population. We therefore analysed ACE2 sequence data from 393 samples worldwide, focusing on South Asia. Genome-wide, South Asians are more closely related to West Eurasian than to East Eurasian populations. For ACE2, however, we observed that the majority of South Asian haplotypes are closer to those of East Eurasians than to those of West Eurasians. Phylogenetic analysis suggested that the haplotypes South Asians share with East Eurasians involve two unique-event polymorphisms (rs4646120 and rs2285666). Both SNPs have broadly similar frequencies in East Eurasians and South Asians, in contrast with European and American populations. Therefore, host susceptibility to SARS-CoV-2 among South Asians is likely to be more similar to that of East Eurasians than to that of Europeans.


Introduction
The novel coronavirus SARS-CoV-2, the causative agent of the ongoing COVID-19 pandemic, presents one of the major challenges to humanity today [1]. Recent studies have demonstrated that angiotensin-converting enzyme 2 (ACE2), encoded by a gene on the X chromosome, is the host receptor for the virus [1,2]. A decreased level of ACE2 expression mitigates the severity of the disease. Over-expression of the receptor, or a unique genetic polymorphism in it, among Asians has been ruled out in recent studies [3,4]. ACE2 also maintains cardiovascular homeostasis and electrolyte balance, and protects against lung injury caused by acid aspiration [5]. A comprehensive understanding of ACE2 variation among ethnic groups has so far been lacking. The South Asian subcontinent harbours diverse, endogamous ethnic groups [6]. Most South Asian genomes are autochthonous but show considerable sharing with both East and West Eurasia [7]. However, when overall genome sharing with East and West Eurasia is compared, South Asians show greater genetic affinity with West Eurasia [8][9][10]. The exception is the Tibeto-Burman-speaking populations, who share a large amount of ancestry with East Eurasia [11]. The genetic structure of ACE2 haplotypes among South Asian populations is not known. We therefore analysed ACE2 in published whole-genome data from South Asians and various world populations [12,13] (S1 Table).

Materials and methods
The research was approved by the Institutional Ethical Committee of Banaras Hindu University, Varanasi, India. To analyse ACE2 across populations, we extracted the relevant sequences from published datasets [12,13] using PLINK 1.9 [14]. The 1000 Genomes dataset has been shown not to capture the full extent of South Asian variation, mainly because Austroasiatic populations were not sampled [15]. Hence, we used the data of Pagani et al. [12] as the primary dataset and confirmed the results with the 1000 Genomes data [13]. We extracted the 447 samples designated as the diversity set panel in the Pagani et al. data [12]. After excluding samples from Africa and Sahul, as well as relatives up to the second degree, we used 393 samples in all analyses (S1 Table). A total of 248 polymorphisms were observed in the Pagani et al. data [12] (S2 Table). LD maps for each group were generated with Haploview [16] (S1 Fig). For both datasets, we converted PLINK files to FASTA files (ped to IUPAC) using a customised script. DnaSP v6 [17] was used to phase the data, calculate population-wise genetic distances, and generate the Arlequin and Network input files. The neighbour-joining (NJ) tree was constructed in MEGA-X [18] (Fig 1A). Nei's genetic distances and pairwise differences were calculated in Arlequin 3.5 [19] and plotted with R v3.1 [20] (Fig 1B and S2 Fig). Network v5 [21] and Network Publisher were used to construct the median-joining (MJ) networks (Fig 2 and S3 Fig). Spatial maps of rs4646120 and rs2285666 were drawn using the PGG toolkit [22] (S4 Fig).
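The ped-to-IUPAC conversion step is described only briefly. As a rough illustration, the minimal Python sketch below assumes the standard PLINK .ped layout (six leading columns, then two allele columns per variant, with "0" marking a missing allele) and collapses each diploid genotype into a single IUPAC ambiguity character. The function names and example rows are hypothetical; this is not the customised script used in the study.

```python
"""Sketch: convert PLINK .ped genotype rows into IUPAC-coded FASTA records.

Assumes the standard .ped layout: FID, IID, PID, MID, SEX, PHENOTYPE,
then two allele columns per variant, with "0" for a missing allele.
Hypothetical helper, not the authors' script.
"""

# IUPAC ambiguity codes for unordered pairs of distinct bases.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("AC"): "M", frozenset("GT"): "K",
    frozenset("AT"): "W", frozenset("CG"): "S",
}


def genotype_to_iupac(a1: str, a2: str) -> str:
    """Collapse one diploid genotype into a single IUPAC character."""
    if a1 == "0" or a2 == "0":
        return "N"  # missing call
    if a1 == a2:
        return a1  # homozygote (male X calls are assumed stored this way)
    return IUPAC.get(frozenset((a1, a2)), "N")  # non-ACGT pairs become N


def ped_line_to_fasta(line: str) -> str:
    """Turn one .ped row into a FASTA record named by the individual ID."""
    fields = line.split()
    sample_id = fields[1]
    alleles = fields[6:]
    if len(alleles) % 2:
        raise ValueError(f"odd number of allele columns for {sample_id}")
    seq = "".join(
        genotype_to_iupac(alleles[i], alleles[i + 1])
        for i in range(0, len(alleles), 2)
    )
    return f">{sample_id}\n{seq}"


if __name__ == "__main__":
    # Hypothetical two-sample, four-variant .ped content.
    ped = [
        "POP1 S1 0 0 2 -9 A G C C T T 0 0",
        "POP1 S2 0 0 1 -9 A A C T T T G G",
    ]
    for row in ped:
        print(ped_line_to_fasta(row))
```

Because ACE2 lies on the X chromosome, male calls stored as homozygotes pass through unchanged in this sketch, so each male contributes an unambiguous base at every called site.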

Results and discussion
Our pooled data yielded 248 high-quality polymorphisms (S2 Table). In the linkage disequilibrium (LD) plot analysis, significant LD blocks of different sizes were present among Caucasus, Central Asian, South Asian, mainland Southeast Asian, insular Southeast Asian and Siberian populations (S1 Fig). Europeans showed the lowest level of LD. We used a haplotype-based approach for the comparison. In contrast with genome-wide analyses [8][9][10], the neighbour-joining (NJ) tree based on Fst distances clustered South Asians with insular and mainland Southeast Asian populations (Fig 1A). This unexpected result suggests a closer genetic affinity of South Asians with East Eurasians for ACE2. The pairwise difference analysis indicated lower diversity in South Asian, Southeast Asian and Siberian populations (Fig 1B). Similarly, among the 1000 Genomes populations, East Asian populations showed the lowest diversity (S2 Fig).
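Both the Fst-based clustering and the pairwise-difference comparison rest on counting differences between haplotypes. The sketch below, assuming phased haplotypes encoded as equal-length strings, computes the mean number of pairwise differences within and between two populations and combines them into Hudson's Fst (1 - Hw/Hb). This is a simpler estimator swapped in for illustration; the AMOVA-based Fst and Nei's distances computed in Arlequin are calculated differently. The toy haplotypes are hypothetical.

```python
from itertools import combinations, product
from statistics import mean


def n_diffs(h1: str, h2: str) -> int:
    """Count the sites at which two equal-length haplotypes differ."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must have equal length")
    return sum(a != b for a, b in zip(h1, h2))


def mean_within(haps: list[str]) -> float:
    """Mean pairwise differences within one population (needs >= 2 haplotypes)."""
    return mean(n_diffs(a, b) for a, b in combinations(haps, 2))


def mean_between(pop1: list[str], pop2: list[str]) -> float:
    """Mean differences over all cross-population haplotype pairs."""
    return mean(n_diffs(a, b) for a, b in product(pop1, pop2))


def hudson_fst(pop1: list[str], pop2: list[str]) -> float:
    """Hudson, Slatkin and Maddison (1992): Fst = 1 - Hw / Hb.

    Hw is the average of the two within-population means and Hb the
    between-population mean; raises ZeroDivisionError if Hb is 0.
    """
    h_w = (mean_within(pop1) + mean_within(pop2)) / 2
    h_b = mean_between(pop1, pop2)
    return 1 - h_w / h_b


if __name__ == "__main__":
    # Hypothetical phased haplotypes over four biallelic sites (0/1 coded).
    pop_a = ["0011", "0011", "0111"]
    pop_b = ["1100", "1100", "1000"]
    print(f"within A: {mean_within(pop_a):.3f}")            # 0.667
    print(f"between:  {mean_between(pop_a, pop_b):.3f}")    # 3.556
    print(f"Fst:      {hudson_fst(pop_a, pop_b):.4f}")      # 0.8125
```

A matrix of such distances for every pair of populations could then be handed to a neighbour-joining routine, which is the role MEGA-X played in the analysis.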
Redcliffe Life Sciences Pvt Ltd. India provided support in the form of salaries for author AR. The specific roles of this author are articulated in the 'author contributions' section. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

PLOS ONE

Competing interests: One of our authors, AR, is a full-time employee of Redcliffe Life Sciences Pvt Ltd. India. This does not alter our adherence to PLOS ONE policies on sharing data and materials.

The phylogenetic analysis of haplotypes among the studied populations helped to identify the SNPs responsible for the affinity of South Asians with East Eurasians (Fig 2 and S3 Fig). Three major distinct haplotypes were observed. Haplotype 1 (ht1) was more common in West Eurasians, including Central Asian populations, whereas haplotype 2 (ht2) was frequent among East Eurasians, South Asians and Americans (Fig 2 and S3 Fig). Haplotype 3 (ht3) was carried mainly by East Eurasians and South Asians. Haplotype 2 (ht2) originated from SNP rs4646120, whereas ht3 was derived from SNP rs2285666. Phylogenetically, both of these SNPs play a key role in the distinction between East and West Eurasian populations (Fig 2 and S3 and S4 Figs). Interestingly, the most frequent haplotypes of South Asia involve these SNPs.
A recent study has also highlighted the highest frequency of this SNP (rs2285666) among Chinese populations (0.5), as well as significant frequency differences among the 1000 Genomes populations (S4 Fig) [4]. In our study, we likewise found a high frequency (0.6) of this SNP among South Asians (S2 Table and S4 Fig). Moreover, a synonymous coding-region variant, rs35803318, was most frequent among Americans (0.15), followed by Europeans (0.055), Caucasus populations (0.051) and Central Asians (0.021), whereas this site was not polymorphic in West Asians, South Asians, Southeast Asians or Siberians (S2 Table). Phylogenetic analysis suggests that the majority of South Asian samples share the monophyletic haplotypes 2 and 3 with East Eurasians through the unique polymorphism events rs4646120 and rs2285666. Recent studies have suggested that the reference allele is associated with up to 50% lower ACE2 expression, resulting in greater severity of SARS-CoV-2 infection [23][24][25]. Additionally, rs35803318 was significantly more polymorphic among Americans and Europeans than among South Asians. Hence, it is likely that host susceptibility to SARS-CoV-2 among South Asians more closely resembles that of East/Southeast Asians than that of Europeans or Americans.
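Frequencies such as the 0.6 reported for rs2285666 among South Asians are allele frequencies, and because ACE2 lies on the X chromosome, males carry only one copy of each site. The minimal sketch below, assuming per-sample genotypes given as PLINK-style allele pairs with male X calls stored as homozygotes, counts one allele per male and two per female. The input layout, function name and sample data are hypothetical and not taken from the study's pipeline.

```python
from collections import defaultdict


def x_allele_frequency(samples, allele):
    """Per-population frequency of `allele` at one X-linked site.

    samples: iterable of (population, sex, (a1, a2)) with sex "M" or "F".
    Males are hemizygous, so only their first allele is counted (this
    assumes male X calls are stored as homozygotes). Alleles coded "0"
    are missing and skipped.
    """
    hits = defaultdict(int)
    total = defaultdict(int)
    for pop, sex, (a1, a2) in samples:
        observed = [a1] if sex == "M" else [a1, a2]
        for a in observed:
            if a == "0":
                continue
            total[pop] += 1
            if a == allele:
                hits[pop] += 1
    return {pop: hits[pop] / total[pop] for pop in total}


if __name__ == "__main__":
    # Hypothetical genotypes at a single biallelic X-linked site.
    samples = [
        ("SAS", "F", ("A", "G")),
        ("SAS", "M", ("A", "A")),
        ("EUR", "F", ("G", "G")),
        ("EUR", "M", ("A", "A")),
        ("EUR", "F", ("0", "0")),  # missing call, ignored
    ]
    for pop, freq in x_allele_frequency(samples, "A").items():
        print(pop, round(freq, 3))
```

Counting males diploidly here would inflate the apparent sample size and bias frequencies towards whatever allele is common in males, which is why the sex-aware count is the safer default for an X-linked gene.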