Tumor-specific changes in Kaposi sarcoma-associated herpesvirus genomes in Ugandan adults with Kaposi sarcoma

Intra-host evolved tumor virus variants have provided insights into the risk, pathogenesis and treatment responses of associated cancers. However, the intra-host variability of Kaposi sarcoma-associated herpesvirus (KSHV), the etiologic agent of Kaposi sarcoma (KS), has not been explored at the whole viral genome level. An accurate and detailed description of KSHV intra-host diversity in whole KSHV genomes from matching tumors and oral swabs from Ugandan adults with HIV-associated KS was obtained by deep, short read sequencing, using duplex unique molecular identifiers (dUMI) – random double-stranded oligonucleotides that barcode individual DNA molecules before library amplification. This allowed suppression of PCR and sequencing errors down to ∼10−9/base. KSHV genomes were assembled de novo, and identified rearrangements were confirmed by PCR. 131-kb KSHV genome sequences, excluding major repeat regions and averaging 2.3 × 104 reads/base, were successfully obtained from 23 specimens from 9 individuals, including 7 tumor-oral pairs. Sampling more than 100 viral genomes in at least one specimen per individual showed that KSHV genomes were virtually homogeneous within samples and within individuals at the point mutational level. Heterogeneity, if present, was due to point mutations and genomic rearrangements in tumors. In 2 individuals, the same mutations were found in distinct KS tumors. The K8.1 gene was inactivated in tumors from 3 individuals, and all KSHV genomic aberrations retained the region surrounding the first major internal repeat (IR1). These findings suggest that lytic gene alterations may contribute to KS tumorigenesis or persistence. Author summary Kaposi sarcoma (KS) is a leading cancer in sub-Saharan Africa and in those with HIV co-infection. Infection by Kaposi sarcoma-associated herpesvirus (KSHV) is necessary for KS, yet why only few KSHV infections develop into KS is largely unknown. While strain differences or mutations in other tumor viruses are known to affect the risk and progression of their associated cancers, whether KSHV genetic variation is important to the natural history of KS is unclear. Most studies of KSHV diversity have characterized only ∼4% of its 165-kb genome and may have been impacted by PCR or cloning artifacts. Here, we performed highly sensitive, single-molecule sequencing of whole KSHV genomes in paired KS tumors and oral swabs from 9 individuals with KS. We found that KSHV genomes were virtually identical within individuals, with no evidence of quasispecies formation nor multistrain infection. However, KSHV genome aberrations and inactivating mutations appeared to be a common, tumor-associated phenomenon, with some mutations shared by distinct tumors within an individual. Certain regions of the KSHV genome featured prominently among tumor-associated mutations, suggesting that they are important contributors to the pathogenesis or persistence of KS.


55
Kaposi Sarcoma (KS) is one of the most common cancers of HIV-infected individuals [1,2], with the 56 burden of disease disproportionately borne by people in sub-Saharan Africa [3]. A gamma herpesvirus, 57 Kaposi sarcoma-associated herpesvirus (KSHV), is the etiologic agent of KS and consistently detected in 58 tumor tissues [4,5]. KSHV also can be shed in saliva, thought to be a primary mode of transmission [6][7][8]. 59 Only a small fraction of KSHV infections progress to KS, and the factors contributing to KS pathogenesis 60 are poorly understood. The development of KS is associated with HIV infection and immunosuppression 61 [9], but others factors, including KSHV genome variation, may contribute to differential outcomes of KSHV 62 infection.

63
Studies of other human oncogenic viruses reveal that viral genetic variation or de novo mutations 64 may be important to their pathogenicity, as is the case for cancers associated with human papilloma sickle pe [63] with a quality threshold of 30 and a window size 10% of read length. Filtered reads were 203 aligned to a human genome (GRCh38 p12, GenBank GCA_000001405.27) using bwa mem [64]. Unmapped 204 reads were used as input into the de novo assembler SPAdes v3.11.1 [65], with the setting -k 205 21, 35,55,71,81. This oftentimes yielded 3 to 4 scaffolds that together encompassed the entire 131-kb 206 unique sequence regions of KSHV, bounded by the major repeat regions: Internal Repeat 1 (IR1), Internal 207 Repeat 2 (IR2), LANA central repeat and Terminal Repeats (TR) (Fig 1A). Next, all scaffolds over 500 bp 208 were aligned using bwa mem to the genome of reference KSHV strain GK18. From the aligned scaffolds a 209 draft genome was generated in Geneious (Biomatters, Ltd) with manual correction as needed. To finish 210 the assembly, GapFiller v1.1 [66] was used, setting bwa as the aligner and filtered paired-end reads as the 211 input library. The genomes were annotated in Geneious from the GK18 reference, also adding the T1.4 212 annotation based on [67]. The major repeat regions were masked with Ns since they were poorly resolved 213 by assembly of short reads that can map to multiple locations within the repeat regions.

215
Variant identification from dUMI-consensus reads. Paired-end reads, including their dUMI sequence tags, 216 were mapped by bwa to sample-consensus genomes (Fig 2)  in a sample was determined by ddPCR analysis of KSHV genes vIL-6, vBCL-2, RTA and LANA (Fig 1A) and 273 provided as the percentage "on-target" KSHV DNA. These levels ranged from 0.03% to 1.35% (median 274 0.17%) in tumors, while most oral swab samples were below 0.01% on-target (

289
Very few point mutations were found in dUMI-consensus reads from either tumors (Fig 3A) or oral 290 swabs (Fig 3B). Excluding the major repeat regions, the number of positions with a detectable intrasample 291 variant base ranged from 2 -218 (<0.01 -0.17%) ( Table 1). These frequencies were lower or comparable 292 to those in the BCBL-1 cell line ( Intra-individual single nucleotide differences between tumor and oral swab samples ranged in 308 number from 0 -2 across the entire ~131-kb genomes, not counting the major repeat regions. Notably, 309 there were almost no intra-individual polymorphisms in the KSHV hypervariable gene K1 (Fig 3A-B). Hence, 310 no evidence for minor KSHV variants or multi-strain infections was found in these individuals.

311
KSHV genomes were distinct across the 9 participants, with sequence differences ranging from 3. Tumor 003-C. Read coverage in U003-C was high (average of 15,635) and uniform across the KSHV 331 genome except for a 6-bp gap within the K8.1 gene intron up to the first base of the second K8.1 exon (Fig   332  4A). No read indicated a deletion, nor was any read found with its mate pair located across the 6-bp gap.

333
This region was PCR amplified from unsheared U003-C tumor DNA using conserved primers flanking the 334 gap (Fig 4B), and no PCR product was detectable. In contrast, an intact K8.1 intron sequence was amplified 335 and sequenced from the oral swab of the same participant (Fig 4C).

336
De novo assembly revealed that the reverse complement of TR sequences continued from the 337 deleted K8.1 intron sequences of U003-C (Fig 4B). The K8.1-TR junctions were confirmed by PCR with 338 primers flanking the junctions (Fig 4D) and Sanger sequencing. Inversion of the 60-kb 3' half of the U003-C 339 genome, starting inside K8.1, is a parsimonious explanation for the breakpoints.

340
Tumor U004-D.  Fig 5D). This was confirmed in the unsheared tumor DNA extract by PCR and Sanger sequencing 352 using primers spanning the breakpoint (Fig 5D & E, lanes 4 & 5). Other primer pair combinations were 353 tested to see if there were DNA species with the 14.8-kb segment inverted, deleted in place, duplicated 354 in tandem or rearranged in other ways. None generate detectable PCR products except for primer pairs showing that the 14.8-kb segment also exists in the native configuration (Fig 5E). Thus, the 14.8-kb 356 segment was copied into IR2 but had not been deleted from its original location.

357
In a parallel study of viral transcriptomes [80], abundant expression of a chimeric Kaposin transcript 358 fused to the 14.8-kb segment was found in tumor U008-B, consistent with the viral genome structure we 359 observed. Another tumor from the same participant, U008-D (Fig 5B), had 100% nucleotide identity and 360 was confirmed to have the same duplication and breakpoint junctions (Fig 5F).

361
Tumor U020-B. Read coverage abruptly dropped 12.8-fold over the last ~90 kb of the KSHV genome in 362 this tumor (Fig 6A). This was consistent with ddPCR quantitation, with vIL-6 and vBCL-2 gene amplicons 363 having 9.0 -12.3-fold higher levels than RTA and LANA ( Table 2). The coverage shifted before the end of 364 ORF25 (GK18 position 46,615) and reads at this breakpoint continue into TR sequences ~90 kb 365 downstream (Fig 6C). Thus, U020-B appeared to have KSHV genome variants with a ~90-kb deletion, or 366 formally, a 12.8X amplification of a 46-kb subgenomic region. No U020-B tumor DNA remained to allow 367 confirmation of this breakpoint.

368
Tumor U020-C. This tumor from participant U020 had (30-fold) shift in read coverage and different 369 breakpoints inside ORF11 and ORF18 (Fig 6B). ddPCR quantitation demonstrated gene copy numbers of 370 vIL-6 and vBCL-2 amplicons to be 25 -36-fold higher than for RTA and LANA ( Table 2). The spike in read 371 coverage occurred over a 16.2 kb region (GK18 positions 16,942 to 33,011). Again, chimeric reads were 372 found at either ends of this region continuing into TR sequences (Fig 6D), indicating fusion with the TR 373 (Fig 6E). Junction fragment-specific PCR and Sanger sequencing confirmed the 3' junction (Fig 6G, lane 5).

374
No PCR product was produced from the other putative breakpoint junction, TR-ORF11 (Fig 6F,  available tumors from the same individual. In the case of U008-B and U008-D, full-length genome 391 sequencing showed that they had the same 14.8 kb subgenomic sequence duplicated in IR2 (Fig 5F). These 392 two tumors were biopsied from distinct lesions on the left leg (S6 Fig; S2 Table). Nested PCR screening for 393 this breakpoint junction sequence in 6 other distinct lesions (S2 Table) from this individual showed that 394 no other tumors had this duplication (not shown).

395
In contrast, four additional tumors tested from participant U003 had the same inversion breakpoints 396 as tumor U003-C (Fig 7). Moreover, no intact K8.1 sequences were detected in 2 of these 4 tumors by 397 nested PCR of the region spanning the K8.1 intron gap (Fig 7). These biopsies came from distinct lesions 398 in the left leg (S2 Table). Lastly, in participant U020, the ORF18-TR junction sequences found spanning the 399 U020-C genomic deletion was not detected in the 2 other tumors tested.

436
In contrast, EBV-sequences were detected in only 5 of 12 tumors, and in all cases were sequences flanking 437 the EBNA-1 repeat (S8 Fig). The proportion of reads mapping to EBV in oral swabs ranged from 0 -33%, 438 median 1.8%, whereas in tumors the range was from 0 -0.5%, median 0.002% (S4 Table).   (Tables 3 & S3), with many truncating protein coding sequences. These mutations 468 may have been selected for, if for instance expressing these proteins exposed host cells to immune 469 targeting. Mutations that disrupt genes may also indicate that those genes were not necessary for 470 sustaining tumorigenic growth. Conversely, regions of the KSHV genome that were duplicated, 471 conspicuously intact or translocated next to strong promoters (as in 008-B and 008-D) may point to KSHV 472 genes that are influential in driving tumor cell proliferation.

473
The genomic region around IR1 featured prominently in genomic rearrangements in 4 tumors, 474 potentially leading to their over-expression relative to other KSHV genes. For example, tumors U008-B 475 and U008-D had a 14.8-kb portion of their genomes, from inside K3 to ORF19, duplicated into within IR2 (Fig 5). In a parallel RNAseq study, tumor U008-B had been found to abundantly express a chimeric 477 transcript of the 14.8 kb section fused to IR2 sequences transcribed from a strong latency-associated 478 promoter [80]. Distinct deletions were observed in tumors U020-B and U020-C from another participant, 479 but the genomic regions retained, aside from TR sequences, again included the IR1 region (Fig 6) inversion breakpoint at the K8.1 intron, U004-D had a 28-bp deletion ending at 4 bases upstream of the 502 first K8.1 exon, and U020-C had a nonsense mutation at the start of the second K8.1 exon (Fig 8).