Comparative GWAS using global and Indian Reference Panels reveals non-coding drivers of COVID-19 severity and mortality

doi:10.1371/journal.pntd.0014020

Fig 1.

Clinical characteristics, Principal Component Analysis (PCA), and summary of GWAS findings in COVID-19.

A) Boxplots displaying clinical parameters (Age, SpO₂, Respiratory Rate, and percentage of male and female patients) across phenotypic groups: ‘Control (Mild) vs Case (Moderate + Severe)’ (top row) and ‘Recovered vs Deceased’ (bottom row). All comparisons were made using the t-test in GraphPad Prism v8.0.2 (licensed version) and were statistically significant (p-value < 0.0001); B) PCA of COVID-19 genotyped samples: i) Ancestry PCA showing clustering with the South Asian (SAS) population and partial overlap with the East Asian (EAS) population, confirming the expected ancestry; ii) PCA before outlier removal, in which two outliers were found; iii) PCA after outlier removal, showing more homogeneous clustering across phenotypic groups; C) Summary of GWAS findings in COVID-19: i) Lollipop plot comparing the number of suggestive SNPs identified in the two phenotype groups, ‘Control (Mild) vs Case (Moderate + Severe)’ and ‘Recovered vs Deceased’, across the two reference panels (1KGenomes and IndiGen); ii) Heat map illustrating the distribution and classification (risk or protective) of the suggestive SNPs across the phenotype subgroups and reference panels, highlighting differences in variant contribution to COVID-19 severity and outcome.

Table 1.

Summary of clinical characteristics of patients in the study.

Table 2.

Summary of top suggestive SNPs associated with COVID-19 severity and disease outcomes in the 1KGenomes and IndiGen imputed data.

Fig 2.

GWAS results for COVID-19 severity and mortality using 1KGenomes and IndiGen-imputed datasets.

A) Results from the ‘Control (Mild) vs Case (Moderate + Severe)’ comparison group using 1KGenomes and IndiGen-imputed data, showing Manhattan plots (right) and corresponding QQ plots (left); B) Results from the ‘Recovered vs Deceased’ group, with Manhattan plots (right) and QQ plots (left). In each Manhattan plot, the -log₁₀(p-value) of each SNP association is plotted against genomic position, with alternating chromosome colours. The horizontal line marks the suggestive significance threshold (p-value = 1 × 10⁻⁵). The QQ plots depict observed versus expected p-value distributions, with deviation from the diagonal red line indicating potential true associations. Both reference panels showed signs of potential genetic associations, and some phenotype comparisons displayed stronger, more clustered signals, suggesting stronger links to COVID-19 outcomes.
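The quantities plotted in these panels can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' plotting code: the function names are hypothetical, the comparison is assumed to be strict (p < 1 × 10⁻⁵), and the expected QQ quantiles use the common (rank - 0.5)/n convention, which the caption does not specify.

```python
import math

# Suggestive significance threshold quoted in the caption.
SUGGESTIVE_P = 1e-5


def neg_log10(p):
    """Return -log10(p), the y-axis value of a Manhattan plot."""
    return -math.log10(p)


def suggestive_hits(pvalues, threshold=SUGGESTIVE_P):
    """Map SNP id -> -log10(p) for SNPs below the threshold.

    `pvalues` is a dict of SNP id -> association p-value.
    Assumption: a strict comparison (p < threshold) is used.
    """
    return {snp: neg_log10(p) for snp, p in pvalues.items() if p < threshold}


def qq_points(pvalues):
    """Return (expected, observed) -log10(p) pairs for a QQ plot.

    `pvalues` is any iterable of p-values. Under the null hypothesis they
    are uniform on (0, 1), so the i-th smallest of n values (i from 0) is
    expected near (i + 0.5) / n. This plotting position is an assumption.
    """
    observed = sorted(pvalues)
    n = len(observed)
    return [
        (neg_log10((i + 0.5) / n), neg_log10(p))
        for i, p in enumerate(observed)
    ]


if __name__ == "__main__":
    # Hypothetical SNP ids and p-values, for illustration only.
    demo = {"snp_a": 3e-6, "snp_b": 0.04, "snp_c": 9e-6, "snp_d": 0.5}
    print(suggestive_hits(demo))
    print(qq_points(demo.values()))
```

Points that lie above the diagonal in the upper tail of `qq_points` correspond to the deviations described in the caption.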

Fig 3.

Genomic landscape of SNPs in strong LD with suggestive SNPs associated with COVID-19 severity and outcome using 1KGenomes and IndiGen-imputed data.

This figure shows regional plots of SNPs in strong LD (r² ≥ 0.8) with suggestive SNPs identified from GWAS in two phenotypic comparisons: A) Control (Mild) vs Case (Moderate + Severe), and B) Recovered vs Deceased. Results are displayed separately for the 1KGenomes and IndiGen reference panels. Each sub-panel highlights the genomic region surrounding a suggestive SNP (green triangles), with nearby SNPs in strong LD (red triangles). The x-axis indicates the genomic position (base pair), and the y-axis represents the -log₁₀(p-value) from the association tests. The figure highlights the added value of a population-matched reference panel (IndiGen) in capturing LD structure more accurately, thereby improving GWAS resolution in underrepresented populations. These regional plots also reveal loci that may harbour causal variants influencing COVID-19 severity and survival, underscoring the importance of fine-mapping in diverse populations.
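The r² ≥ 0.8 criterion can be illustrated with a minimal sketch. Here r² is taken as the squared Pearson correlation between genotype vectors coded as 0/1/2 allele counts, a common approximation for unphased data; the caption does not state how LD was computed, so this choice, the function names, and the SNP ids are all assumptions.

```python
def ld_r2(g1, g2):
    """Squared Pearson correlation between two genotype vectors (0/1/2).

    This is a genotype-based approximation of LD r²; a haplotype-based
    estimate from phased data may differ.
    """
    n = len(g1)
    if n == 0 or n != len(g2):
        raise ValueError("genotype vectors must be non-empty and equal length")
    m1 = sum(g1) / n
    m2 = sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (v1 * v2)


def strong_ld_partners(lead, genotypes, min_r2=0.8):
    """List SNPs whose r² with the lead SNP is at least `min_r2`.

    `genotypes` maps SNP id -> list of 0/1/2 allele counts per sample.
    """
    lead_g = genotypes[lead]
    return [
        snp
        for snp, g in genotypes.items()
        if snp != lead and ld_r2(lead_g, g) >= min_r2
    ]


if __name__ == "__main__":
    # Hypothetical genotypes for six samples, for illustration only.
    demo = {
        "lead": [0, 1, 2, 1, 0, 2],
        "snp_x": [0, 1, 2, 1, 0, 2],  # identical to the lead SNP
        "snp_y": [0, 0, 1, 2, 2, 1],  # uncorrelated with the lead SNP
    }
    print(strong_ld_partners("lead", demo))
```

In the regional plots, the lead SNP would be drawn as the green triangle and the SNPs returned by `strong_ld_partners` as red triangles.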

Fig 4.

Genomic landscape of suggestive SNPs in Control (Mild) vs Case (Moderate + Severe).

Genomic context of suggestive variants associated with COVID-19 severity (‘Control (Mild) vs Case (Moderate + Severe)’) from the 1KGenomes and IndiGen panels: genomic regions surrounding the lead suggestive SNPs identified in this GWAS are shown, highlighting nearby genes and non-coding elements. Panels (top to bottom) display rs35575084, rs17024964, and rs34607367, identified using the 1KGenomes panel and located near the MIR4432HG and MIR4432 lncRNA regions on chromosome 2; and rs10096505, a severity-associated suggestive SNP detected only with IndiGen-imputed data and located near BMP1 and SFTPC on chromosome 8.

Fig 5.

Genomic landscape of suggestive SNPs in Recovered vs Deceased.

Genomic context of suggestive variants associated with COVID-19 outcomes (‘Recovered vs Deceased’) from the 1KGenomes and IndiGen panels: regional gene maps surrounding suggestive SNPs associated with mortality outcomes across reference panels are shown. Panels (top to bottom) show rs9547631 and rs1850535, identified using 1KGenomes, near SERTM1 and LOC102723049 on chromosome 13. rs78554880, rs112982286, rs111390553, and rs79900659 were detected only in IndiGen-imputed data and cluster on chromosome 2 near lncRNAs such as AC116609.1 and LINC01115. rs9547631 was also replicated in the IndiGen-imputed data, reinforcing its potential as a cross-population mortality-associated variant. Each plot marks the SNP (red dashed line) and visualises adjacent genes categorised by biotype. These findings highlight distinct regulatory loci identified through population-specific reference panels.

Fig 6.

Summary of the key genomic loci identified from comparative GWAS points towards plausible regulatory mechanisms linked to COVID-19 severity.

The study stratifies patients based on COVID-19 symptoms and clinical outcomes, and performs genotyping followed by imputation using both a global (1KGenomes) and an indigenous (IndiGen) reference panel. Post-imputation, GWAS identified population-specific variants linked to COVID-19 severity (‘Control (Mild) vs Case (Moderate + Severe)’) and mortality (‘Recovered vs Deceased’). Key loci are highlighted: rs35575084 near MIR4432HG, linked to endothelial dysfunction; rs10096505, associated with alveolar collapse and fibrotic remodelling; rs9547631, potentially implicated in cell signalling; and rs78554880, linked to splenic immune dysfunction. The approach reveals how genetic diversity across populations may shape different host responses to SARS-CoV-2 infection. Created using a licensed version of BioRender.com in the lab.

Fig 7.

Schematic workflow illustrating comparative GWAS with 1KGenomes and IndiGen reference panels.

The pipeline begins with patient stratification based on COVID-19 symptoms, followed by genomic DNA extraction, amplification, and genotyping. Post-genotyping quality control includes checks for missingness, sex discrepancies, HWE, relatedness, and MAF. Unphased genotype data are then phased, and outlier samples are identified via PCA. Imputation is performed separately with the 1KGenomes and IndiGen reference panels using BEAGLE 5.5, and imputed variants are filtered on imputation accuracy (DR² ≥ 0.8). Downstream association analyses identify population-specific loci and variants in strong LD, which are then functionally annotated using eQTL and gene mapping tools. This approach enables cross-panel comparisons of the genetic architecture associated with COVID-19 severity and outcomes. Created using a licensed version of BioRender.com in the lab.
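The post-imputation accuracy filter (DR² ≥ 0.8) can be sketched as a pass over VCF records. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes the imputed VCF carries the estimated accuracy in a `DR2` INFO key, it drops records without a `DR2` value, and it requires every allele of a multi-allelic record to pass. The function names and demo records are hypothetical.

```python
def parse_info(info):
    """Parse a VCF INFO string such as 'DR2=0.93;AF=0.12;IMP' into a dict.

    Flag entries without '=' (for example 'IMP') are stored as True.
    """
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def passes_dr2(vcf_line, min_dr2=0.8):
    """Return True if a VCF data line meets the DR2 threshold.

    Assumptions: INFO is the 8th tab-separated column, records lacking
    DR2 are rejected, and multi-allelic records list one comma-separated
    DR2 value per alternate allele, all of which must pass.
    """
    cols = vcf_line.rstrip("\n").split("\t")
    dr2 = parse_info(cols[7]).get("DR2")
    if dr2 is None or dr2 is True:
        return False
    return all(float(v) >= min_dr2 for v in dr2.split(","))


def filter_vcf_lines(lines, min_dr2=0.8):
    """Yield header lines unchanged and data lines that pass the filter."""
    for line in lines:
        if line.startswith("#") or passes_dr2(line, min_dr2):
            yield line


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    demo = [
        "##fileformat=VCFv4.2",
        "2\t100\trs_demo1\tA\tG\t.\tPASS\tDR2=0.95;AF=0.10;IMP\tGT\t0|1",
        "2\t200\trs_demo2\tC\tT\t.\tPASS\tDR2=0.42;AF=0.03;IMP\tGT\t0|0",
    ]
    for kept in filter_vcf_lines(demo):
        print(kept)
```

In practice the same filter is usually applied with a dedicated VCF tool; the sketch only makes the threshold logic explicit.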
