Fig 1.

Virus search from 3,332 WGS.

The heatmap shows the read depth of HERV-K113 and of seven viruses that had abundant reads in at least one dataset. The column colors show the human populations in the two databases. See Supplementary Fig 1 for the full names of the indicated populations. (1kGP: 1,000 Genomes Project; HGDP: Human Genome Diversity Project; AFR: African; AMR: American; EAS: East Asian; EUR: European; SAS: South Asian).

Fig 2.

Chromosomal integrations of SMRV and HTLV-1.

A. Depth of WGS reads mapping to the primer binding site (PBS) of the SMRV-H genome. Seventeen datasets with at least one read mapping to the PBS are shown. One dataset did not have any read mapping to the PBS. The PBS of SMRV-H is shown in red characters. In all WGS datasets, the SMRV reads lack the guanosine present at the 468th nucleotide of the SMRV-H genome. B. Depth of HGDP01156 reads mapping to SMRV-H. The upper panel shows the depth of all reads in the dataset mapping to the SMRV-H genome. The lower panel shows the depth of reads mapping to the SMRV-H genome whose mate is not mapped to the SMRV-H genome. The virus genome structure is shown as gray bars. LTRs are shown as dark gray rectangles. C. Mapping positions of SMRV-chromosome hybrid reads. Read-1 and Read-2 of a read pair are connected with a line. All LTR-mapped reads are shown on the left LTR. The lower panel shows the predicted SMRV integration site on chromosome 10. The gray bar at the top of the upper panel represents the virus genome structure. Dark gray rectangles represent LTRs. Reads mapping in the forward and reverse directions are shown as blue and red arrows, respectively. D. Detection of SMRV DNA in 1kGP LCLs by PCR. Genomic DNA extracted from the indicated LCLs was used as the template for PCR. The WGS datasets from NA12399 and NA11920 are positive for SMRV, while that of NA18998 is negative. E. Depth of HG01918 reads mapping to HTLV-1. The upper panel shows the depth of all reads in the dataset mapping to HTLV-1. The lower panel shows the depth of reads whose mate is not mapped to the HTLV-1 genome. F. Mapping positions of HTLV-1-chromosome hybrid reads. Read-1 and Read-2 of a read pair are connected with a line. When a read multi-mapped to both the left and right LTRs, the mapping to the left LTR was kept. Reads mapping only to the right LTR were repositioned onto the left LTR.
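The lower panels of B and E plot depth only for virus-mapped reads whose mate is not mapped to the virus genome. The following is a minimal sketch of that filter, not the authors' pipeline. The `Alignment` record, its field names, and the `depth` function are hypothetical, and each alignment is assumed to be a single gapless block.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Alignment:
    """Hypothetical, simplified record for one mapped read."""
    ref: str                 # reference the read maps to
    start: int               # 0-based start on that reference
    length: int              # aligned length in bp (gapless, by assumption)
    mate_ref: Optional[str]  # reference of the mate, None if unmapped


def depth(alignments: List[Alignment], virus: str, virus_len: int,
          hybrid_only: bool = False) -> List[int]:
    """Per-base depth on the virus genome.

    With hybrid_only=True, keep only reads whose mate is NOT mapped to
    the virus (mate on a chromosome or unmapped), as in the lower panels.
    """
    cov = [0] * virus_len
    for aln in alignments:
        if aln.ref != virus:
            continue
        if hybrid_only and aln.mate_ref == virus:
            continue
        # Clip the aligned interval to the virus genome boundaries.
        for pos in range(max(aln.start, 0),
                         min(aln.start + aln.length, virus_len)):
            cov[pos] += 1
    return cov
```

Calling `depth(alns, "SMRV-H", genome_len)` would correspond to an upper panel, and the same call with `hybrid_only=True` to a lower panel. Peaks that remain after filtering are expected near the LTRs when the virus is integrated into a chromosome.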

Fig 3.

Detection and phylogenetic analysis of endogenous HHV-6.

A. Depth of reads mapping to HHV-6 in the five WGS datasets from 1kGP and HGDP. B. Phylogenetic trees inferred from the U regions of HHV-6A and HHV-6B. Publicly available sequences of endogenous and exogenous HHV-6, as well as those reconstructed in the present study, were used. C. Phylogenetic tree inferred from the DR regions of HHV-6B. Publicly available sequences of endogenous and exogenous HHV-6B, as well as those reconstructed in the present study, were used. B, C. Clade names defined in the phylogenetic analysis by Aswad et al. are shown.

Table 1.

Summary of integrated HHV-6 identified from 1kGP and HGDP.

Fig 4.

HERV-K k-mer detection from 1,000 Genomes Project WGS.

A. Schematic representation of k-mer counting from WGS reads mapping to HERV-K113. The HERV-K113 genome was split into 50-bp bins with a 10-bp sliding window, and the sequences of the mapped reads corresponding to each 50-bp bin were listed. The lower table shows the number of k-mers detected from 2,504 WGS datasets from the 1,000 Genomes Project. B. Hierarchical clustering of k-mers based on their frequencies in 26 populations. The heatmap shows the normalized k-mer count averaged over each population. K-mers arising from positions 8401 to 8500 were excluded from this analysis to remove any potential k-mers arising from SVA retrotransposons. C. Clustering of presence-absence type k-mers by Pearson correlation coefficient. Clustering was performed by Ward's method (upper heatmap) and DBSCAN (lower color bar). The heatmap shows the Pearson correlation coefficient between k-mers, and the lower color bar shows the clusters. Neighboring k-mer clusters are shown in alternating dark and light blue. Orange represents k-mers that were not clustered. D. Correlation between the occurrence of HERV-K k-mers and previously reported HERV-K polymorphisms. The heatmap shows the Pearson correlation coefficient between the presence of k-mers and the polymorphic HERV-K reported in two previous studies. (C, D) The insets in panels C and D show that the occurrence of k-mers in cluster282 is highly correlated with the presence of the known polymorphic HERV-K at 19p12a. E. GWA using the occurrence of k-mers detects a known polymorphic HERV-K. Manhattan plots show SNVs associated with the occurrence of k-mers in cluster282. SNVs with p-values lower than 8.33e-11 are shown as blue dots. The red solid line in the right panel shows the position of the known non-reference HERV-K.
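The binning in panel A can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: bins are taken as 50-bp windows starting every 10 bp, reads are assumed to align without gaps, and the function names (`bin_starts`, `read_kmers`, `count_kmers`) are hypothetical.

```python
from collections import Counter

BIN_SIZE = 50  # bin length in bp, from the caption
STEP = 10      # sliding-window step in bp, from the caption


def bin_starts(ref_len, bin_size=BIN_SIZE, step=STEP):
    """0-based start coordinates of every full-length bin on the reference."""
    return list(range(0, ref_len - bin_size + 1, step))


def read_kmers(read_seq, ref_start, ref_len, bin_size=BIN_SIZE, step=STEP):
    """Yield (bin_start, k-mer) for each bin fully covered by one read.

    Assumes a gapless alignment, so the offset into the read is simply
    bin_start - ref_start.
    """
    ref_end = ref_start + len(read_seq)
    # First bin start on the step grid at or after the read start.
    first = -(-ref_start // step) * step
    for start in range(first, ref_len - bin_size + 1, step):
        if start + bin_size > ref_end:
            break
        offset = start - ref_start
        yield start, read_seq[offset:offset + bin_size]


def count_kmers(alignments, ref_len):
    """Count (bin_start, k-mer) pairs over (ref_start, read_seq) tuples."""
    counts = Counter()
    for ref_start, read_seq in alignments:
        counts.update(read_kmers(read_seq, ref_start, ref_len))
    return counts
```

Keying each k-mer by its bin start keeps identical sequences from different positions apart, which matters for a repetitive element. A real pipeline would also need to handle indels and soft clipping, which this sketch ignores.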

Fig 5.

Genome positions associated with HERV-K k-mers.

The blue dots in the left clustermap show the genome positions associated with k-mer clusters by GWA analysis. The blue lines beneath the representation of the autosomes show the 79 genome loci associated with k-mer clusters. The dark green lines show the 8 known non-reference HERV-K on autosomes. The light green lines show the 7 k-mer cluster-associated genome loci overlapping with the known non-reference polymorphic HERV-K on autosomes. The dark orange lines show the 16 known reference polymorphic HERV-K on autosomes. The light orange lines show the 16 k-mer cluster-associated genome loci overlapping with the known reference absent HERV-K on autosomes.

Fig 6.

K-mer-based method detects a previously unreported HERV-K polymorphism in a subtelomere.

A. SNVs associated with k-mers in cluster352. The right panel shows the region near the end of the p-arm of chromosome 4. SNVs with p-values lower than 8.33e-11 are shown as blue dots. The red solid lines in the right panel show two reference HERV-K solo-LTRs in the subtelomere region. B. Presence and absence of k-mers of cluster352 in the public high-coverage short-read WGS of the Chinese trio. C. HG00733 contains a non-reference provirus. The left panel shows the dot matrix between the reference human genome and a PacBio read of HG00733. The right panel shows the dot matrix between the reference HERV-K113 and the PacBio reads of HG00733. Light blue and light red rectangles represent the HERV-K provirus and solo-LTR, respectively.

Fig 7.

Potential interlocus gene conversion in HERV-K localized by LDfred.

A. Manhattan plot showing SNVs associated with k-mer cluster100. SNVs with p-values lower than 8.33e-11 are shown as blue dots. The green line shows the reference HERV-K provirus. B. Amplification of the HERV-K provirus by PCR. The HERV-K provirus with adjacent sequence was amplified, and the PCR products were separated by gel electrophoresis. DNA extracted from LCLs originating from NA18998, NA18999, and NA12878 was used as the template. C. Upper panel: IGV view of long-read sequencing reads mapping to HERV-K. The PCR amplicons were sequenced using an Oxford Nanopore Flongle flow cell and mapped to GRCh38. The k-mers detecting the HERV-K were also mapped to the PCR target regions. Lower panel: UCSC Genome Browser view showing the Multiz Alignment of 100 Vertebrates track. D. Enlarged images of panel C. NA12878 carries two alleles of a non-reference HERV-K haplotype (which is not observed elsewhere in the reference genome) that is also present as a single allele in NA18998 and NA18999.
