ACE2 polymorphisms as potential players in COVID-19 outcome

The clinical condition COVID-19, caused by SARS-CoV-2, was declared a pandemic by the WHO in March 2020. Currently, there are more than 5 million cases worldwide, and the pandemic has increased exponentially in many countries, with different incidences and death rates among regions/ethnicities and, intriguingly, between sexes. In addition to the many factors that can influence these discrepancies, we suggest a biological aspect, the genetic variation at the viral S protein receptor in human cells, ACE2 (angiotensin I-converting enzyme 2), which may contribute to the worse clinical outcome in males and in some regions worldwide. We performed exomics analysis in native and admixed South American populations, and we also conducted in silico genomics databank investigations in populations from other continents. Interestingly, at least ten polymorphisms in coding, noncoding and regulatory sites were found that can shed light on this issue and offer a plausible biological explanation for these epidemiological differences. In conclusion, there are ACE2 polymorphisms that could influence epidemiological discrepancies observed among ancestry and, moreover, between sexes.


Introduction
At the end of 2019, a new outbreak caused by SARS-CoV-2 (a coronavirus) started in Hubei Province, China. The clinical condition, COVID-19, probably arose from natural selection in bat reservoirs [1]. There are currently more than five million cases worldwide, and the pandemic has been increasing exponentially in many countries since the disease was deemed a pandemic by WHO in March 2020 [2].
The lethality rate is influenced by the speed of contagion, idiosyncrasies of the affected populations according to the containment policies adopted, socioeconomic conditions and the absorption limit of the health system [3]. Due to the high rate of transmission of the virus by air and the novelty of the infection to humans, the disease has become a global emergency a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 problem, forcing periods of social confinement to contain the pandemic, in addition to hygiene habits [4].
Although mortality in men is significantly higher [5], representing an increase from 20% to 70% in European countries, approximately 65% in some Asian countries, and, even more peculiarly, Dominican Republic citizens showed three times more deaths in men than women [2,6,7]. It is unknown whether this is due to biological differences between genders, differences in behavioral habits, or comorbidities rates [8].
Studies indicate that cell-virus interaction is mediated by the connection of the transmembrane glycoprotein spike (S), present in the form of trimers on the viral surface, to angiotensin I-converting enzyme 2 (ACE2, also called hACE2) [9].
The level and expression pattern of ACE2 in different tissues and cells can be critical to the susceptibility and symptoms resulting from SARS-CoV-2 infection [10]. Zhou and collaborators [11], using scRNA-seq datasets, classified organs vulnerable to infection as high and low risk based on their expression levels of ACE2. At the clinical level, the symptoms of COVID-19 may be related to the entry and affinity of the virus in these organs, as observed in heart failure disease and increased ACE2 expression, in which viral infection is related to a higher risk of heart attack and worse ill condition [12]. For Li and collaborators [13], ACE2 genetic variations could be crucial to the susceptibility in different cohorts and to clinical outcomes of  Currently, investigations of potential genetic variations that may favor or hinder interactions between the virus and the host have been conducted [14][15][16], showing a high number of codons that can, if altered, interfere with the complexity of the virus-cell interaction, as experimentally demonstrated [17]. It is noteworthy that the ACE2 is located on the X chromosome, causing the impossibility of heterozygosity in men. Therefore, polymorphisms in their single copy could be related to the worst outcomes observed in males [18].
Considering the above, we sought explanations for an intrinsic factor that differed between sexes and populations that may justify the differences observed in the incidence and lethality of SARS-CoV-2 infection among the different regions of the world, as well as between sexes. We analyzed global data in the 1000 Genomes Database and, in addition, we conducted studies of exomes in two population groups in the Brazilian Amazon (Indians and miscegenated), without description in public genomic banks, and we compared this information with a public databank from a population in southeastern Brazil. These comparisons are important because Brazil has a continental size and an admixed population in the North (more Amerindians among all regions), Northeast (more Africans), and South and Southeast (more Europeans) [19].

Analyses in the 1000 Genomes project
The analysis was performed on data from the 1000 Genomes Phase 3 database (1000G), which comprises 84,4 million variants in 2,504 individuals from 26 different populations [20]. These populations were concentrated in five large groups: African (AFR), Ad Mixed American (AMR), East Asian (EAS), European (EUR), and South Asian (SAS).
Through the complete sequence of the X chromosome, a region between nucleotides 15620281 and 15512643 was selected in revision GRCh37.p13 (15602158 and 15494520 in GRCh38.p13) since the angiotensin I-converting enzyme 2 gene (ACE2, Gene ID: 59272, updated on 22-Mar-2020) is located on the complementary strand, covering 107639 bp. Additionally, a region of ten thousand base pairs upstream to the gene was included in the analysis so that the initial search site became 15630281 (GRCh37.p13) [20,21]

Analyses in Amazon natives and admixed population
For the allelic comparison between the populations cataloged in the 1000 Genomes database and the population not described in the respective project, we investigated a population composed of 64 Amerindians and 82 admixed individuals from the Amazon region of northern Brazil (data available at S4 Table). This study was approved by the National Committee for Ethics in Research (CONEP) and the Research Ethics Committee of the UFPA Tropical Medicine Center under CAAE number 20654313.6.0000.5172. The participants signed an informed consent form. The Amerindians represent 10 different Amazonian ethnic groups that were grouped together as the Native American (NAM) group. Tribe names and geographic coordinates of the Brazilian Amazon Indian populations are presented in S5 Table. Compared with other Native Americans (from different countries and even different regions of Brazil), the Amazonian indigenous groups are similar to each other. Commonly, there are quite differences between distinct ethnic groups of native Americans. However, in comparison to populations of other ancestry backgrounds (Europeans or Africans), the Native Americans of the Brazilian Amazon may be grouped as a homogeneous population [22][23][24].
The 82 admixed individuals (Brazilian Admixed Population-BAP) live in Belém city, located in northern Brazil, where, due to the colonization process, are characterized by three ancestral genetic components: European, Native American and African. This sample group is also enrolled in a broad project. Furthermore, we also compared our findings to a database of variants analyzed in a Southeast Brazilian population, the Online Archive of Brazilian Mutations (ABraOM, we represent here as ABM) [25].
The Brazilian population is one of the populations with the broadest genetic diversity in the world due to the high degree of miscegenation in its formation [26][27][28]. Different regions of Brazil have significant genetic variations in the contribution of their ancestry. The population of southeastern Brazil, which forms the ABM database, has a high contribution from European ancestors [29]. Our data (BAP) were generated based on the population of Northern Brazil of formation with high Amerindian ancestry, resulting in relevant genetic differences between regions [30].

DNA extraction and Exome library
The DNA was extracted as described by Sambrook and collaborators [31]. The genetic material was quantified using a Nanodrop spectrophotometer (Thermo Fisher Scientific Inc., USA). The libraries were prepared using the Nextera Rapid Capture Exome (Illumina) and SureSelect Human All Exon V6 (Agilent) Kits. Sequencing reactions were run using the NextSeq 500 High-output v2 300 Cycle Kit (Illumina1, USA) on the NextSeq 5001 platform (Illumina1, USA).

Databank analysis
Information from the databank and exomes were analyzed using descriptive statistics, considering the values of allele frequencies of populations and subpopulations as the data explored comparatively. Genotypic differences between sexes in the homo/hemizygous state were calculated based on the premise that populations are in Hardy-Weinberg equilibrium.

Results
Analyzing the polymorphisms contained in the ACE2 locus, in addition to ten thousand base pairs upstream, we found 2266 polymorphisms, of which 199 were contained in the region 5ú pstream of the gene, 85 were located in exonic regions, and the others were located in the introns.

Polymorphisms in exonic regions that might influence ACE2 protein structure
In the exonic region, 15 SNPs of the 85 polymorphisms found in 1000G present differences greater than 1% between some of the populations (all between exons 17 to 21). Another three polymorphisms (rs889263894, rs1027571965, rs147464721) appear mainly in the Brazilian population. Nine of these exonic polymorphisms are present in the most common isomorphs (v1 and v2) and show important differences in populational frequency (Fig 1, additional data on S1 Table).
As noted, some of these polymorphisms have very significant differences between MAFs from different populations. rs35803318 is absent in Asians (virtually absent in AFR) and has an average MAF of 0.05 in EUR and ABM, increased to 0.074 in BAP, and has the highest allele frequency in the indigenous population among all populations (MAF = 0.121).
On the other hand, rs4646179 is absent in indigenous, miscegenated people from the Amazon, Asians and Europeans and is found in the population of southeastern Brazil and Americans, with a MAF = 0.023 and an even greater frequency in Africans (MAF = 0.074).
Interestingly, rs1027571965 and rs889263894 presented allele frequencies exclusively in indigenous people and are not being found in any other world population of 1000G, neither in BAP nor in the Brazilian ABM database, with MAF = 0.095 and 0.034, respectively. rs147464721 has MAF = 0.014 in the miscegenated population of the Amazon (BAP); however, they did not present any allelic frequency among the indigenous population, as well as in Asians and Europeans, presenting a slightly lower MAF difference when compared to AFR and AMR.

Polymorphisms in upstream regions that might influence in ACE2 expression
Differences greater than 1% of MAF in the region 10,000 base pairs 5´upstream to the gene were observed in 57 polymorphisms (Fig 2). It is also worth mentioning that the rs2097723 SNP presents a very heterogeneous distribution, oscillating between 7% in Africans, 32% in Americans, 42% in East Asians, 28% in Europeans and 22% in South Asians.

Polymorphisms in intronic regions that might modify ACE2 regulation
In intronic polymorphisms, many of them present a very relevant MAF interpopulational difference (up to 0.46). Two of them deserve to be highlighted, rs2285666 and rs4646140, because they are near exons.
It is important to mention that rs2285666 has the highest frequency of the rarest allele (MAF = 0.71) in the indigenous population, with very important MAF differences of 0.17 (EAS), 0.23 (SAS), 0.36 (BAP), and 0.37 (AMR) and an average difference of 0.48 for the others (EUR, AFR and ABM). In contrast, rs4646140 has a MAF ranging from zero in Indians to 0.13 in Africans through EUR, AMR, BAP, EAS, ABM and SAS (Fig 3). Furthermore, considering the possibility of influence in determining isoforms v2, it is worth noting rs190614788 on intron 1 (with a difference of more than 0.11 between EUR and EAS).
The main findings of our study are concatenated in Table 1 (additional information in S3  Table), and they are discussed below.

Polymorphisms of the ACE2 gene related to the protein binding region to the viral particle
Considering the regions where the virus commonly binds to ACE2 [15,16], our results point to the absence of relevant polymorphisms at these sites because many of those located in coding regions have MAFs close to or less than 0.001, as well as a very low possibility of conferring any global (or population) impact on the destination of the disease. Therefore, there does not seem to be any direct mechanism (at the sites of interaction) that could confer some form of resistance or greater propensity to contagion in humans. In view of this finding, other molecular modifications with potential functional repercussions were investigated, including 1) changes in sites distant to the viral binding locus, but which may bring about some structural protein change with the potential to influence the cell-virus interaction; 2) changes in translation regulation regions at 3´UTR sites, at points of interaction with miRNA; and 3) modifications in transcription regulation zones in 5´upstream regions and intragenic promoters.

Polymorphisms with the potential to cause changes in the protein structure of ACE2 that may impact virus-cell interactions
The ACE2 gene is mainly composed of two isoforms with 18 or 19 exons (v1 and v2) that encode the same protein (805 amino acids) and three other smaller variants: x1, x2 and x3, which have rarely been studied [21,33]. Thus, we searched for genotypic information in these exons that could allow us to infer a disruption with the potential to culminate in impacts on the disease process.
Fifteen SNPs showed MAF differences greater than 1% among the studied populations, mainly belonging to exons 17, 18, 19, 20 and 21; the last two are terminal exons belonging only to rare isoforms [21].
Among these, rs41303171 is a missense SNP, causing the replacement of an asparagine (neutral amino acid) with aspartic acid (electronegative amino acid) at codon 720, which can culminate in a conformational disorder of this protein that, directly or indirectly, can change viral interactions. This polymorphic variant C is practically exclusive to Europeans (MAF = 0.018), a fact corroborated by Cao and collaborators [10], mainly in British individuals (MAF = 0.03). Across Asia, this allele is not found, and is only present in a single Asian resident of the UK [20]. In Brazil, ABraOM data point to MAF<0.01. Thus, the possible biological implications of this change may have some clinical-epidemiological consequences in a small niche of European patients with COVID-19 when compared to other regions of the world.
The change from leucine to phenylalanine in codon 731 (rs147311723) results in the exchange of two nonpolar amino acids that are structurally different (with the presence of an aromatic ring in this last), which may culminate in functional modifications in the ACE2 protein, with the prediction that this will occur equally to 0.941, according to the PolyPhen algorithm [32]. This polymorphism is absent in Asians and Europeans, with low frequency in Americans and Southeast Brazilians, and it has MAF = 0.017 in Africans, mainly in Nigerians, with MAF = 0.043 of allele A. In a study by Cao and colleagues [10], this polymorphism was described as a low frequency SNP in the 1000G database, but without any mention in the China Metabolic Analysis Project (ChinaMAP) database, reinforcing its absence in this population group.
In a context focused on the Brazilian Amazon, unprecedented data showed the presence of two SNPs absent in all populations of the 1000G. One of them is rs1027571965, which is characterized by an exchange of G>C in exon 16, leading to a substitution of alanine for glycine at codon 673 (MAF = 0.095), and the other, rs889263894, an exchange of T>A in exon 13 (MAF = 0.034), causing an alteration of lysine to isoleucine in codon 541; thus, this last SNP should cause structural differences by the exchange of a polar and electropositive amino acid with a hydrophobic amino acid, possibly resulting in functional changes in ACE2, with a probability of 0.958 that this event will occur [32]. Both SNPs had uncertain significance until now.
It is also noteworthy that in codons 351 (rs147464721), 690 (rs4646179) and 749 (rs35803318), there are just synonymous alterations, without any study of modifications of this enzyme in a genotype-dependent manner.

Polymorphisms with the potential to accentuate ACE2 gene expression or translation that may impact virus-cell interactions
Our data point to a series of polymorphisms in the region upstream of the ACE2 gene, which oscillate markedly among populations and, therefore, in individuals, although some of them may not have relevant frequency at a global level. Among them, there are rs2097723 and rs5934250, with population allele differences of up to 0.35 and 0.47, respectively.
The lower frequency allele (C) of rs2097723 has a normalized effect size (NES) of up to 0.36 [34] for increased expression of the ACE gene in brain tissue. The T allele of rs5934250 has an NES of 0.64 for lower ACE2 expression in this tissue, among others [34]. Thus, when looking at population data, it can be inferred that, according to these two variants, populations in East Asia would have the most related scenario regarding increased gene expression. Europeans or Africans have a lower frequency of ACE2 higher-expression-related SNPs in the pre-genic region.
Another interesting observation involves the exon 19 polymorphisms, which are contained in the 3´UTR region of the two most important isoforms, the canonical miRNA binding site for translation control dependent on this epigenetic mechanism [21,35]. In this regard, observing the regions of interaction between the miRNA and nucleotide exchange sites, it can be noted that rs182366225 is included in the site of attachment to the seed region of miR-140-3p.1 and miR-483-3p.2, both with a match of 7mer-1A and Context ++ score percentile of 90 and 74, respectively. Thus, the proportion of translation regulation that depends on these miR-NAs will be upregulated in the East Asian population, especially in Vietnamese and Chinese individuals, who have an average MAF of 0.032 of the C allele, which is not observed in any other population in the world [36]. A reporter assay for miR-483-3p predicted targets by Kemp and collaborators [37] showed that the regulation of ACE2 is dependent on this microRNA.
The rs142017934 site includes the connection site for miR-610 and miR-3646, both with 8mer of match in ACE2 (context ++ score percentile of 98 and 77, respectively), in addition to the seed regions of miR-3609 and miR-548ah-5p, both with a match of 7mer-m8 and scores of 84 and 74, respectively [35]. This polymorphism occurs exclusively in the population of African origin (MAF = 0.013), mainly in Nigerians and individuals from Barbados with African ethnicity, with an average of MAF = 0.026. This higher frequency among people of African ethnicity residing on different continents is probably due to the strong population ancestry of Barbados being from West Africa, a region containing Nigeria [38].
Among the intronic polymorphisms, rs2285666 draws much attention because it presents the highest frequency of the rarest allele (MAF = 0.71) in the indigenous population, with differences ranging from 0.48 to 0.17 for the other populations, with the Asians being the most similar to the Indians, mainly the Chinese. The high MAF observed in the Chinese population in the present study corroborates the data of Cao and collaborators [10] using the ChinaMAP database. The substitution of C for T in intron 4, to only four nucleotides of exon 3 (located in the splicing region), influences the gene expression in brain tissues and tibial nerve in some way so that the T allele is related to a significant increase in the expression of ACE2 [34], thus may be a determinant in clinical differences in this naturally more vulnerable population.
Another intronic polymorphism, rs4646140, has no MAF in the indigenous population, reaching 0.13 in Africans, mainly in Nigerians (0.17). Some studies show the influence of these two intronic SNPs with hypertension [39,40], since there is a possibility that they will interfere in the ACE2 protein product.
In conclusion, considering this genetic aspect involving ACE2 in the complex relationship between SARS-CoV-2 and humans, we emphasize the following: There are genetic markers that influence in ACE2 alterations and potentially to the unequal rates of aggravation and death observed between men and women. If considering some of the polymorphisms as harmful when in homozygosity (genotypic frequency equal to allelic frequency squared) or hemizygosity (genotypic frequency equal to allelic frequency), men would have this conditions from 1�4 to 77 times more (uniallelic) than women (Table 1). Thus, a relevant contribution to the understanding of higher mortality in males is presented, as reported in the various populations affected by COVID-19.
The rates of contagion and death fluctuate greatly; in this sense, ACE2 polymorphisms could contribute to these differences. The rs182366225 and rs2097723 polymorphisms that potentially may increase the expression of the ACE2 are more frequent in the East Asian population. These allele frequencies are even higher in Chinese and Vietnamese populations. Such markers are on the order of 30% to 180% more frequently in East Asians than in other populations.
Indigenous populations from Amazon have exclusive genetic polymorphisms (rs1027571965 and rs889263894) or with higher frequencies (rs2285666 and rs35803318) than other populations. These polymorphisms are related to increased expression of the ACE2 gene in brain tissues, among others. As supported previously [10][11][12], this is a relevant finding because they can input some influence on the outcome in these populations, whose involvement by COVID-19 was recently reported. This population group, due to its genetic peculiarity and less previous exposure to viral infections, represents a major challenge in understanding and handling this pandemic.
Africans have higher rates of three relevant polymorphisms (rs147311723, rs142017934 and rs4646140). Polymorphism in rs142017934 is exclusive to this population and can influence the translation regulation of the ACE2 gene, thus enhancing the expression of this gene in individuals who carry this variant. However, Europeans and some Africans have a higher frequency of an allele (rs5934250) that seems to reduce the expression of ACE2 in some tissues.
Therefore, the study highlights the importance of considering this genetic factor as facilitating or restricting infection and, especially, in potential clinical manifestations and outcomes. Investigation of these polymorphisms in patients affected by COVID-19, with different clinical conditions and outcomes, could potentially advance our understanding of this pandemic disease.
Supporting information S1 Table. The