Table 1.
Genome-wide SNP and INDEL load comparison in VIKING vs LBC (number of alleles per individual).
Fig 1.
Significant differences in variant load in coding and related regions for ultra-rare (upper panel) and very rare (lower panel) variants.
Circles represent the ratio of the median number of variants in a VIKING individual to the median number of variants in an LBC individual; whiskers are 95% CIs based on 10,000 randomly selected LBC subsets (n = 269, with replacement). Significance: at least 95% of the 10,000 subsets have a p-value ≤ 8 × 10⁻⁴ (Bonferroni corrected) and there is no overlap between the 95% CIs for the LBC and the VIKING median values (for full results see S4 Fig). The higher variance in the 5’UTR and lower variance in ncRNA regions could be explained by their relatively small sizes (9.3 Mb and 7.3 Mb, respectively).
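The resampling behind the whiskers can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `median_ratio_ci`, the use of a percentile interval, and taking the point estimate from the full LBC sample are all assumptions made for illustration. The p-value part of the significance criterion is not covered, because the caption does not name the test.

```python
import random
import statistics

def median_ratio_ci(viking_counts, lbc_counts, n_subsets=10_000,
                    subset_size=269, seed=0):
    """Return (point_ratio, ci_low, ci_high) for the VIKING/LBC median ratio.

    Hypothetical helper. Assumes per-individual variant counts are positive,
    so no subset median is zero.
    """
    rng = random.Random(seed)
    viking_median = statistics.median(viking_counts)

    # Assumption: the plotted dot uses the median of the full LBC sample.
    point_ratio = viking_median / statistics.median(lbc_counts)

    ratios = []
    for _ in range(n_subsets):
        # Draw one LBC subset of VIKING's size, sampling with replacement.
        subset = rng.choices(lbc_counts, k=subset_size)
        ratios.append(viking_median / statistics.median(subset))

    # Assumption: 95% interval taken as the 2.5th and 97.5th percentiles.
    ratios.sort()
    ci_low = ratios[int(0.025 * (n_subsets - 1))]
    ci_high = ratios[int(0.975 * (n_subsets - 1))]
    return point_ratio, ci_low, ci_high
```

A ratio above 1 with an interval that excludes 1 would correspond to a higher per-individual load in VIKING for that region.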
Fig 2.
Ultra-rare SNP variant loads in functionally annotated non-coding regions.
Circles represent the ratio of the median number of variants in a VIKING individual to the median number of variants in an LBC individual; whiskers are 95% CIs based on 10,000 randomly selected LBC subsets (n = 269, with replacement). Significance: at least 95% of the 10,000 subsets have p ≤ 2 × 10⁻⁴ (Bonferroni corrected) and there is no overlap between the 95% CIs for the LBC and the VIKING median values. The red vertical lines represent the median genome-wide load for ultra-rare SNPs and its 95% CI. The higher variance in the Insulator region estimates could be explained by the relatively small size of these regions (17.4 Mb). Gm12878: B-lymphoblastoid cells; H1hesc: embryonic stem cells; Hepg2: hepatocellular carcinoma cells; Hmec: mammary epithelial cells; Hsmm: skeletal muscle myoblasts; Huvec: umbilical vein endothelial cells; K562: erythrocytic leukemia cells; Nhek: normal epidermal keratinocytes; Nhlf: normal lung fibroblasts; union: an aggregated comparison between the two cohorts for this chromatin state, using the union of the state’s regions annotated in any of the 9 cell types.
Table 2.
Variants observed in the VIKING cohort but not in gnomAD are often specific to Shetland.
Table 3.
Tajima’s D captures demography and suggests relaxation of purifying selection in VIKING.
Fig 3.
Distribution of ultra-rare SNPs in functional regions.
Fraction of ultra-rare variants (FUV) = number of ultra-rare variants / (number of ultra-rare + known variants). Values for regulatory regions are computed as the average over the 9 cell types. Non-coding = mappable genome − 5’UTR − exon − intron − 3’UTR − ncRNA. Coloured horizontal and vertical lines represent the genome-wide averages for the two cohorts. Dashed black lines represent the distribution shifts between LBC and VIKING for each of the considered genomic regions. A strictly vertical shift would indicate a proportional increase in the numbers of ultra-rare and known variants from LBC to VIKING, whereas a strictly horizontal shift (no change in the ultra-rare variant density between the two cohorts) would represent a decrease in the number of known variants in VIKING.
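The FUV definition in the caption above is a simple proportion, sketched below in Python. The function names are hypothetical, and `regulatory_fuv` assumes that "average over the 9 cell types" means the mean of the per-cell-type FUV values (rather than an FUV computed from pooled counts), which the caption does not specify.

```python
def fraction_ultra_rare(n_ultra_rare, n_known):
    """FUV = ultra-rare / (ultra-rare + known), as defined in the caption."""
    total = n_ultra_rare + n_known
    if total == 0:
        raise ValueError("region contains no variants")
    return n_ultra_rare / total

def regulatory_fuv(counts_per_cell_type):
    """Mean FUV over cell types.

    counts_per_cell_type: iterable of (n_ultra_rare, n_known) pairs,
    one pair per cell type. Assumption: FUVs are averaged, not counts.
    """
    values = [fraction_ultra_rare(u, k) for u, k in counts_per_cell_type]
    return sum(values) / len(values)
```

For example, a region with 25 ultra-rare and 75 known variants has an FUV of 0.25.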
Fig 4.
Allelic shift bias (ASB) suggests loss of constraint at VIKING exonic and promoter regions.
MAF shifts for very rare SNPs (MAF_NFE ≤ 1%) between non-functional intergenic regions (NFIG), considered as baseline, and non-synonymous SNPs in exonic regions, SNPs with CADD score ≥ 10 in promoter regions, and intronic SNPs, for each of the cohorts. These MAF differences are calculated using 1,000 randomly selected LBC subsets of 269 individuals each (matching the VIKING size; with replacement), considering only variants shared between VIKING and the LBC subset being evaluated; for these variants we computed each cohort’s mean MAF in exonic, promoter, intronic and non-functional intergenic regions (see S10 Fig). Black horizontal lines represent mean values. The differences in MAF shifts between the two cohorts are statistically significant for all three comparisons (p < 2.2 × 10⁻¹⁶, one-sided Wilcoxon rank sum test).
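As a rough illustration of the quantity plotted, the Python sketch below computes one MAF shift for one cohort and one LBC subset. It assumes the shift is the mean MAF in the functional category minus the mean MAF in the NFIG baseline; the sign convention and the function name `maf_shift` are assumptions, not taken from the source.

```python
from statistics import mean

def maf_shift(region_mafs, nfig_mafs):
    """Mean MAF in a functional region relative to the NFIG baseline.

    Both inputs are per-variant MAFs within a single cohort, restricted to
    variants shared between VIKING and the LBC subset being evaluated.
    Assumption: shift = region mean minus baseline mean.
    """
    return mean(region_mafs) - mean(nfig_mafs)
```

Repeating this for each of the 1,000 LBC subsets, and for both cohorts, would give two distributions of shifts per functional category that can then be compared with a rank sum test.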