Genomic changes in Kaposi Sarcoma-associated Herpesvirus and their clinical correlates

Kaposi sarcoma (KS), a common HIV-associated malignancy, presents a range of clinicopathological features. Kaposi sarcoma-associated herpesvirus (KSHV) is its etiologic agent, but the contribution of viral genomic variation to KS development is poorly understood. To identify potentially influential viral polymorphisms, we characterized KSHV genetic variation in 67 tumors from 1–4 distinct sites from 29 adults with advanced KS in Kampala, Uganda. Whole KSHV genomes were sequenced from 20 tumors with the highest viral load, whereas only polymorphic genes were screened by PCR and sequenced from 47 other tumors. Nine individuals harbored ≥1 tumors with a median 6-fold over-coverage of a region centering on K5 and K6 genes. K8.1 gene was inactivated in 8 individuals, while 5 had mutations in the miR-K10 microRNA coding sequence. Recurring inter-host polymorphisms were detected in K4.2 and K11.2. The K5-K6 region rearrangement breakpoints and K8.1 mutations were all unique, indicating that they arise frequently de novo. Rearrangement breakpoints were associated with potential G-quadruplex and Z-DNA forming sequences. Exploratory evaluations of viral mutations with clinical and tumor traits were conducted by logistic regression without multiple test corrections. K5-K6 over-coverage and K8.1 inactivation were tentatively correlated (p<0.001 and p = 0.005, respectively) with nodular rather than macular tumors, and with individuals that had lesions in ≤4 anatomic areas (both p≤0.01). Additionally, a trend was noted for miR-K10 point mutations and lower survival rates (HR = 4.11, p = 0.053). Two instances were found of distinct tumors within an individual sharing the same viral mutation, suggesting metastases or transmission of the aberrant viruses within the host. To summarize, KSHV genomes in tumors frequently have over-representation of the K5-K6 region, as well as K8.1 and miR-K10 mutations, and each might be associated with clinical phenotypes. Studying their possible effects may be useful for understanding KS tumorigenesis and disease progression.


DNA preparation, PCR and KSHV gene copy number quantification
Genomic DNA was extracted using the QIAGEN DNeasy Blood & Tissue Kit (50) (Catalog # 69504) following manual single column extraction protocol per guidelines. All specimens were quantified using a Thermo Fisher Qubit Flurometer. Methods for droplet digital PCR (ddPCR) for gene copy number quantification, PCR screening of KSHV genes and for confirmation of breakpoint junctions have been described previously [30]. S1 Table lists the PCR primer sequences used in this study.

Library preparation and sequencing
To obtain approximately 500-bp DNA fragments for sequencing, 10-20 ng/μL of DNA in 100 μL chilled TLE buffer (10mM Tris, pH8.0, 0.1mM EDTA) was sheared using a Covaris S2 Sonicator set to duty cycle 5%, 200 cycles per burst for a total of 30 seconds. Sheared DNA was purified using 1X volume of Agencourt AMPure XP Beads (Beckman Coulter Cat. # A63880) and eluted in 50 μL water. Library preparation (including end repair, A-tailing and adapterligation) was performed using the KAPA HyperPrep Library Preparation Kit (Cat. # KR0961/ KK8503). Subsequently, the DNA was again purified with 1X volume beads and eluted in 50 μL of water. Samples were then divided into aliquots of <240 ng, if needed, prior to the next PCR step.
DNA libraries were subjected to pre-enrichment amplification with KAPA HyperPrep adapter primers (Cat# KR1317). The pre-enrichment PCR conditions were: 95˚C 4 mins; 5 cycles of 98˚C 20 sec, 60˚C 45 sec, 72˚C 45 sec; 72˚C 3 mins, 4˚C hold. Products were then pooled and purified as above with 1.2X volume beads and elution in 100 μL water, quantified with Nanodrop, and sizes assessed using a Bio-analyzer (Agilent DNA 7500).
KSHV sequences were enriched with biotinylated RNA baits [30], with the following modifications: Sample indexes for multiplexed sequencing were from the KAPA Single-Indexed Adapter Kit (Cat. # KR1317), primers for library amplification were mws13 and mws15 (S1 Table), and each DNA-RNA bait hybridization mixture was split into 2 PCR reactions and then amplified for up to 16 cycles. Products were cleaned with 1X volume XP beads. Sequencing was done on Illumina HiSeqX to obtain 150 bp paired-end reads.

De novo assembly and analysis of KSHV genomes
Draft KSHV genomes were assembled de novo using SPAdes v3.11.1 [33] as reported previously [30], but omitting the initial step of trimming read ends. The setting -k 21,35,55,71,81,127 was used in SPAdes. Scaffolds produced from SPAdes were aligned to the KSHV reference genome GK18 (Genbank AF148805) for organization and orientation, inside the alignment viewer of Geneious (R11.1.5 for Mac OS, https://www.geneious.com). GK18 sequences were not used for refinement of draft genomes. Instead, sample read libraries were mapped to their respective draft genomes. The resulting consensus genome sequence was finished with GapFiller v1.1 using the sample paired-end read library [34]. For multiple sequence alignments of genomes, MAFFT (FFT-NS-i x1000, scoring matrix 1PAM/k = 2) implemented in Geneious was used. Most coding sequences (CDS) and non-coding RNA annotations were transferred from the GK18 genome via Geneious. Annotations for K8 and the variable genes K1 and K15 were transferred from the most homologous sequences among KSHV genomes GK18, Japan1 (Genbank LC200589) and ZM095 (Genbank KT271456). T1.4 RNA annotation was based on [35]. All CDS were inspected for indels.
Twenty new whole KSHV genomes sequenced from this study were uploaded to Genbank

Structural variation and integration breakpoint detection
The KSHV reference genome GK18 was appended as an extra chromosome to the human genome reference GRCh38 p12. Paired-end reads from each sample library was mapped to the appended human genome reference. Split reads and discordant read pairs extracted into separate alignment files were generated for each sample using SpeedSeq [36]. All potential rearrangement breakpoint locations, along with the number of supporting split and discordant paired reads, were tabulated from the SpeedSeq output files using LUMPY [37] and imported into Geneious as VCF annotations of genome GK18. Supporting reads of breakpoint positions in which the corresponding breakpoint have been identified in LUMPY were manually inspected in the Geneious alignment viewer for confirmation. In cases where LUMPY did not output a breakpoint location for a region that had an abrupt read coverage change, split reads were identified using the Geneious read-mapping tool.
Potential breakpoints detected in LUMPY were accepted for analysis using the following criteria. 1) Breakpoints had to have at least 6 supporting split reads and at least 1 supporting paired read. For our earlier sequencing dataset that utilized duplex unique molecular identifiers [30], the threshold was lowered to 3 supporting split reads, the lowest that had PCR validation. 2) Breakpoints had to be at least 1 kb apart, approximately twice library insert sizes. 3) Identical breakpoint locations, which were found only in tumors from the same person, were counted as one. 4) Breakpoints of rearrangements wholly inside KSHV repeat regions were filtered out. 5) The coordinates of remaining breakpoints, along with the number of supporting reads and other output information from LUMPY and Geneious, were exported as a table.tsv file for analysis of the DNA sequence context in R (S2 Table).

Analysis of DNA secondary structures and motifs
The following analyses were implemented with custom scripts in R with the Biostrings package [38] (https://github.com/MullinsLab/HHV8-non-B-DNA). Five kb sequences centered at the observed breakpoints (S2 Table) were taken from the KSHV genomes sequenced as part of this study. A permuted control dataset was generated by shuffling the sequences of the same 5 kb fragments (referred to as "control permuted" in figures). A randomized control dataset was generated from 5 kb sequences centering on an equivalent number of random positions along the GK18 genome, excluding TR sequences ("control random"). For each extracted window, the probabilities of cruciform, denaturation and Z-DNA formation at every base position were computed using perl software package SIST (Stress-Induced Structural Transitions) [39], set to the algorithm that considers competition among the three structures. Potential triplex and Gquadruplex formation was analyzed separately using R packages triplex [40] and pqsfinder [41], respectively. Both algorithms output a score at each position, which was normalized to a maximum of 300 [41] and 50, respectively. Non-B DNA probabilities and relative scores of only the central 1 kb were then plotted in R package ggplot for each observed breakpoint (Supplementary Graphs). Means of probabilities and relative scores were taken from equivalent positions across all breakpoint windows to create a plot of averages.
Comparisons of non-B DNA in observed breakpoints and control, through summed values and shortest distances, were done as follows. For each non-B DNA, probabilities or relative scores at positions ±200 bp from the observed breakpoints were summed. In addition, for each non-B DNA, the shortest distance of the observed breakpoint to a position with predicted non-B DNA probability or score >0.20 was taken from either direction, including 0. The summed values and shortest distances were calculated for the set of random breakpoints, and the differences between means of observed and random sets were tested using a Welch's Ttest. To estimate the probability of attaining by random chance the observed mean G-quadruplex summed scores or higher, and the observed mean distances to G-quadruplexes or shorter, G-quadruplex summed scores and shortest distances were calculated from 1,000 simulations produced in R of sampling 31 random positions along the GK18 genome.

Statistical analyses of clinical trait associations
Participant characteristics were summarized by median, interquartile range (IQR), or range as appropriate, or categorized and summarized by proportion. Tumor characteristics and viral genetic polymorphisms observed were classified into binary categories. Analyses were conducted at the tumor level. Logistic regression was used to estimate odds ratios (OR) with 95% confidence intervals (95% CI). Standard Errors (SE) were calculated using a cluster-robust estimator to correct for correlation between tumors from the same participant. Exact logistic regression was used when logistic regression failed due to categories with no participants.
For analysis of participant survival at 1 year from study enrollment (OS), individuals were classified by whether they have at least one tumor with the observed mutation at baseline, and analysis was conducted at the participant level. Hazard ratios (HRs) for mortality with 95% CIs were estimated using Cox regression with robust standard errors.
Results were considered significant with two-tailed P<0.05, with no multiple tests corrections applied.

Participant characteristics
A total of 67 KS tumor biopsies from 29 participants were included ( Table 1). Twelve participants had only one tumor examined in this study, while 17 had more than one tumor

KSHV genomes in KS tumors often have higher representation of a subgenomic region near IR1
Our previous study described 5 unique tumor-associated KSHV genome aberrations, with 3 corresponding a 2 to 30-fold greater read coverage of a sharply delineated region close to IR1, compared to the rest of the viral genome [30]. To better determine the frequency of discordant read representation in the IR1 region in tumors, a total of 67 tumors from 29 individuals, including the 12 from [30], were screened for relative copy number elevation. Four short segments of the viral genome, within the K2, ORF16, ORF50 and ORF73 genes, were quantified by ddPCR. K2 and ORF16 are located near IR1, while ORF50 and ORF73 are located near the midpoint and 3' end of the KSHV genome, respectively (Fig 1A).
Of the 65 tumors with positive PCR results, 13 had ORF50 and ORF73 copy numbers that were a median of 6-fold less than K2 and ORF16, or below the limit of detection, consistent with an elevation in the IR1 region ( Table 2). In contrast, only 2 tumors had K2 and ORF16 that were 2-fold less than ORF50 and ORF73 levels or below the limit of detection. Two tumors [30] had no DNA left for ddPCR (U003-C and U030-C), but their previous whole genome sequencing results did not reveal IR1 region read over-coverage [30]. In the remaining 48 tumors, copy numbers of the 4 genes were within a 1.5-fold range. Notably, when copy number variation was detected, it was not the case for all tumors sampled from an individual.

PLOS PATHOGENS
Genomic changes in Kaposi Sarcoma-associated Herpesvirus and their clinical correlates

PLOS PATHOGENS
Genomic changes in Kaposi Sarcoma-associated Herpesvirus and their clinical correlates Twenty consensus KSHV genome sequences were obtained from 10 individuals with >2-fold higher copy numbers of K2 and/or ORF16 compared to ORF50 and/or ORF73, including 1-3 additional tumors from the same individuals and 1 tumor each from 6 individuals with no discordant ddPCR results, choosing tumors with the highest viral DNA load. Greater than 99.9% of the KSHV genome aside from the 3 repeat regions was successfully sequenced from 20 of 24 tumors, including those from as low as 769 average genome copies per μL as estimated by ddPCR ( Table 2). Eight of the 20 tumors had elevated read coverage that encompassed a region between genome positions 26,000 and 32,000 (Fig 1B). In 5 of these (U099-D, U108-B, U156-D, U156-E, U210-B) the spike in read coverage was sharply confined to this region. Read over-coverage was also found in 2 tumors (U108-B and U156-D) in which copy numbers of the K2 and ORF16 regions were not elevated in the ddPCR screen ( Table 2). This was due to the amplicons used in ddPCR being just outside the regions of over-coverage. Taking the ddPCR copy number variation and whole genome sequencing results together, a total 16 of 65 tumors, from 10 of 29 individuals, were determined to have IR1 region over-representation.
The precise boundaries of the overrepresented regions, as well as any genomic aberrations not apparent from read coverage changes, were determined by mapping supporting split reads (see Methods). Two tumors from participant U048 (U048-D and U048-E) were found to have the same breakpoint. Two of 4 tumors from U156 (U156-D and U156-E) also had identical breakpoints, while 2 of this participant's other tumors had relatively even read coverage across the genome (Fig 1B). No breakpoint was found joining KSHV and human sequences together, which would have indicated integration of KSHV genomes into human chromosomes.
Putative breakpoint junctions in the original tumor DNA from U048-D were validated by PCR across breakpoints followed by sequence confirmation. Breakpoint junctions were found that joined IR1 to TR sequences, K6 to TR sequences (Fig 2) and connecting IR1 to K6 sequences in the opposite orientation (S1A-S1C Fig). The breakpoint junction sequences suggest multiple rearrangements of the K5-K6 region with at least one inversion, but the structure was not further characterized due to the length of the repeats and high GC content in IR1.

A 2.2-kb region encompassing K5 and K6 corresponds to the minimal region of overrepresentation
In the 32 tumor-derived KSHV genomes sequenced between this and our previous study [30], nine unique KSHV genome aberrations had an abrupt, �1.5-fold read coverage over-representation in a subgenomic region near IR1. Strikingly, all encompassed a 2.2kb segment downstream of IR1 that included the K5 and K6 genes (Fig 3). The recombinant KSHV strain BAC36 had been reported to have read overrepresentation in the same region [42] (Fig 3). The T1.4 long non-coding RNA was found in 6, and the PAN long non-coding RNA was included in 8 of 9 cases.
The average length of the genome segments with read coverage overrepresentation was 17.6 kb (Fig 3), or~13% of the 137 kb KSHV genome excluding the terminal repeats (TR). Given ten random 17.6-kb windows of the KSHV genome sequence, that all 10 would include the 2.2-kb K5-K6 region is highly improbable (2.3 x 10 −38 ). No other non-repeat region >3 kb in size had >1.5-fold read over-coverage in any of the 32 tumors in our cohort with whole KSHV genomes sequences determined. Through either ddPCR screening or whole viral genome sequencing, the K5-K6 region overrepresentation was found in at least one tumor from 10 individuals ( Table 2), approximately one-third of the cohort of 29 individuals. Of note, participant U020 had two tumors with different breakpoints [30] yet both contained the minimal overrepresented region (Fig 3).

PLOS PATHOGENS
Genomic changes in Kaposi Sarcoma-associated Herpesvirus and their clinical correlates

KSHV rearrangement breakpoints are associated with potential Gquadruplexes
Chromosomal rearrangement breakpoints are prone to cluster at fragile regions, typically sequences that form non-B DNA structures [43]. Genomes of gamma-herpesviruses are enriched, more than expected from nucleotide composition, with GC-rich DNA motifs known to induce double-stranded breaks that undergo repair and recombination in B-cells [44,45]. We therefore assessed whether KSHV rearrangement breakpoints were associated with these DNA features.
To identify associations between KSHV rearrangement breakpoints and potential non-B DNA sequences, sequences ±500 bp from all 31 breakpoints identified here and in other studies [30,42,46] (S2 Table) were scanned for sequence motifs associated with the following 5 non-B DNA structures: cruciforms, Z-form DNA, AT-rich local melt regions, mirror repeats or triplexes, and G-quadruplexes (G4) (S3 Fig). This window length was chosen because breakpoints have previously been associated with G4 DNA up to 500 bp distant [47]. Control windows were generated by permuting the sequences in the observed breakpoint windows.
Breakpoints were most common at or near predicted Z-DNA and G4 sequences (Fig 4A  and 4B). The 2 breakpoints 3 bp apart in ORF19 were within a high probability Z-DNA sequence. On average, Z-DNA probabilities had a local peak at the observed breakpoint although a nearby peak was also generated from the random control dataset (Fig 4C). G4 scores also peaked at the breakpoint position but not in the control data (Fig 4D). Cruciform DNA was not predicted to occur in any of the breakpoint or control windows.
To quantify their association with breakpoints, non-B DNA probabilities or relative scores within 200 bp in both directions from the 31 observed breakpoints were summed, and the mean summation in this set was compared to that of 31 random breakpoints (S2 Table) using a Welch 2-sample T-test ( Table 3). Among the 5 non-B DNA motifs analyzed, G4 had the largest difference between the means of the observed and random datasets (p = 0.004). As another measure, distances in either direction from each breakpoint to the nearest position with >0.20 non-B DNA probability or relative score were counted, and the mean distances were compared with that of control breakpoints ( Table 3). The closest distances to local melt regions and regions with relative G4 scores >0.20 were significantly smaller for the observed breakpoints than for the random dataset (p = 0.024, p = 0.0007, respectively).
To get a more robust assessment of association with predicted G4 structures, 1000 simulations of 31 randomly sampled points along the GK18 genome were made to generate a normal distribution of means. The probability of attaining equal or higher means than the observed mean G4 summed scores was 1.4 x 10 −6 , and the probability of attaining equal or lower than the observed mean distances to G4 was 0.0009.

Potentially inactivating mutations are common in the K8.1 gene in tumors
We previously observed inactivating mutations (truncation, inversion, transcription start site deletion) in the K8.1 gene in KS tumors [30]. In the current study, 5 of 20 whole KSHV genomes sequenced had inactivating mutations in K8.1 (tumors U108-B, U156-D, U156-E, U210-B and U216-D). In contrast, none of the 85 other KSHV coding segments had intra-host differences in more than one person, even though three-quarters of the KSHV coding sequences are larger than K8.

PLOS PATHOGENS
Genomic changes in Kaposi Sarcoma-associated Herpesvirus and their clinical correlates A total of 9 unique K8.1 mutations were detected in 8 individuals (Table 2, Fig 5A), including five tumors with nonsense mutations in the second exon, and three with 28-32 bp deletions between the K8.1 core promoter and the K8.1 coding sequence, including the transcription start site (Fig 5A). These deletions were all unique, and one included the first base of the K8.1 coding sequence (U020-E in Fig 5B). Finally, participant U003 had a genomic inversion interrupting the second K8.1 exon and extending to TR sequences [30]. U020 had 2 tumors with different K8.1 mutations-a nonsense mutation in U020-C and a transcription start site deletion in U020-E (Fig 5A).

Polymorphisms in miR-K10, K4.2 and K11.2
Promoter region deletions, indels and nonsense mutations in the 85 KSHV genes other than K8.1 were compiled from all 20 tumor-derived KSHV whole genomes sequenced here and the 12 from our previous study [30]. No other promoter region deletions or nonsense mutations were found. Seven different indels were identified, 4 of which were in-frame ( Table 4). Only

PLOS PATHOGENS
Genomic changes in Kaposi Sarcoma-associated Herpesvirus and their clinical correlates one, an A insertion in an A homopolymer run in K11.2, could be identified as an intra-host mutation, as it was found in only 1 of 4 tumors from U156 ( Table 4).
Intra-host differences between matching sample-consensus genomes were determined for 10 individuals who had whole KSHV genomes sequenced from more than one sample, including from the oral swabs sequenced previously [30]. Six nonsynonymous point mutations were Table 3

PLOS PATHOGENS
Genomic changes in Kaposi Sarcoma-associated Herpesvirus and their clinical correlates found in 6 different genes ( Table 4). Three instances of synonymous point mutations were detected, including 2 that were the same K12 G126A nucleotide substitution found in 2 different individuals. This mutation was found in 1 of 4 tumors from U156, and in all 4 tumors ( Table 4) but none of the 3 oral swabs (S3 Table) from participant U003. This mutation impacted the overlapping microRNA K10 sequence and is designated miR-K10 G16A (position C118082T in the GK18 genome). To determine the extent of miR-K10 diversity in this cohort, PCR and sequencing was conducted on all 65 tumors ( Table 2) and 18 oral swabs (S3 Table). Five of 29 individuals had either a miR-K10 C15T or G16A mutation, all in tumors, while the remaining sequences had the database consensus nucleotide in those positions. Truncations of the K4.2 coding sequence were reported in 12 of 16 adults with KS in Zambia [29]. In our studies, truncations of K4.2 were found in 16 of 22 individuals, resulting from frameshifts and premature stop codons (Fig 6A). However, no intra-host differences were detected. A 174-bp duplication in the central domain of K11.2 was present in 8 of the 22 individuals we studied (Fig 6B) and in 12 of the 16 genomes from Zambia [29] (a partially overlapping set with those that had a truncated K4.2). To determine the overall extent of K4.2 and K11.2 diversity in our cohort, both genes were sequenced following gene-specific PCR in all 65 tumors ( Table 2). K4.2 was truncated in 23 of the 29 individuals, while the central 174-bp duplication in K11.2 was found in 9 of 29. No intra-host differences were found.
Among all 96 KSHV genomes sequenced to date, 3 predominant length polymorphisms of K4.2 were found: 31 were full-length at 549 or 546 bp, 47 had a truncation to 378 or 369 bp, and 18 had a truncation to 237 bp or shorter (S4 Table). Of the 94 KSHV genomes with a  Table), including 31 with 1 or 2 nucleotide differences in the duplicated sequence. A phylogenetic network was created from all sequenced KSHV genomes to date (Fig 7). Notably, the K4.2 and K11.2 polymorphisms did not coincide fully with major KSHV phylogenetic groupings [20] (Fig 7; S4 Table). Since data assessing K5-K6 over-coverage and K8.1 inactivation is not available from most published sequences, we are unable to assess linkage of these characteristics with KSHV phylogenetic groupings.

KSHV mutations associated with disease course and tumor characteristics
An exploratory statistical analysis was conducted for associations between the observed mutations or genetic polymorphisms and clinical traits (S5 Table) or tumor characteristics (S6 Table). The mutations observed here were evaluated as genetic markers: whether K5-K6 region had read over-coverage over 1.5X, whether K8.1 gene was inactivated, whether

PLOS PATHOGENS
Genomic changes in Kaposi Sarcoma-associated Herpesvirus and their clinical correlates miR-K10 had point mutations relative to database consensus, whether K4.2 was truncated, whether K11.2 174 bp central domain was duplicated (Fig 8, S7 Table), whether any of these mutations significantly occurred together (S8 Table), and whether these mutations were associated with any clinical or individual tumor traits (S7 and S9 Tables).

Discussion
This study is the most extensive screening and whole genome sequencing to date of KSHV in tumors. From a total of 65 KS tumors from 29 individuals, the K8.1 mutations, the over-

PLOS PATHOGENS
Genomic changes in Kaposi Sarcoma-associated Herpesvirus and their clinical correlates representation of the K5-K6 gene region, and to a lesser extent, the mutations in the miR-10 sequence, occurred at highly significant frequencies than would be expected by chance. Viral genomic structures within tumor biopsies were heterogeneous, although in multiple instances, genomes with aberrant structures were more abundant than apparently intact viral genomes. Additionally, these variations were associated with certain clinical manifestations of KS tumors.

Recurring KSHV Genome Aberrations in KS Tumors
KSHV sequences derived from tumors and tumor cells often have large genomic aberrations. The first whole KSHV genome sequenced had a 33-kb region duplicated in the TR region [48], and KSHV genomes with major deletions have also been found by PCR screening in some KS tumors and cell lines [49]. In a cohort of 16 individuals with KS in Zambia, 4 harbored KSHV genomes with uneven read coverages indicative of genomic aberrations [29]. Our previous study characterized large inversions, deletions and duplications present in KS tumors from 4 of 8 individuals [30].
Rearranged KSHV genomes involving read coverage over-representation of regions 3.3 to 53 kb in length was present in 10 of 29 individuals, in all cases including the K5 and K6 genes. Read over-coverage ranged from 1.5 to 6,000-fold with a median of 6-fold. The real frequency of K5-K6 region overrepresentation could be higher, since only one or a minority of tumors from each participant were available for study, and the ddPCR screen quantified genomic segments outside the minimal high coverage region. Indeed, whole genome sequencing identified over-coverage of the K5-K6 region in 2 tumors that was not identified in the ddPCR screen. It would be useful to directly quantify via a targeted amplification approach such as ddPCR to

PLOS PATHOGENS
Genomic changes in Kaposi Sarcoma-associated Herpesvirus and their clinical correlates compare K5/K6 gene copy numbers to the rest of the KSHV genome when looking for this phenomenon in the future.
The high-coverage regions found in 9 whole KSHV genomes sequenced encompass at minimum a 2.2-kb sequence containing the K5 and K6 genes. However, they are not among the known KSHV oncogenes [50] and thus a direct role in tumorigenesis is not likely. K5 is a viral ubiquitin E3 ligase best known to ubiquitinate and degrade MHC1 from the cell surface [51]. It is a membrane protein expressed upon primary infection and reactivation [52,53] and at low levels during latency [54]. Another KSHV ubiquitin E3 ligase, K3, is far more efficient in degrading MHC1 [51], but K5 modulates more immunoreceptors than K3, targeting also adherins, ligands and co-stimulators for cytotoxic T and NK cells, cytokine receptors, restriction factors and members of the SNARE, ephrin, plexin and other receptor families [51,55,56]. One downregulated ephrin receptor of note is EphA2 [55,56], one of the receptors employed by KSHV for entry into endothelial cells [57]. Amplification of K5 might suggest that cells may become less visible to cytotoxic T and NK cells and refractory to KSHV superinfection as EphA2 expression on host cell surface is depleted by K5. Lytic gene K6 encodes vMIP-I, a viral homolog of cellular cytokine MIP1α (CCL3). vMIP-I is a selective, high affinity agonist for chemokine receptor CCR8 and has angiogenic and chemotactic effects in vitro [58][59][60]. The inclusion of K6 in the minimum region is puzzling, given that vMIP-I is not readily detectable in KS lesions by immunohistochemistry [61].
Other rearranged KSHV genomes may have been present but could not be confirmed since low numbers of split reads connecting non-adjacent regions of the KSHV genome were indistinguishable from PCR chimera artifacts. Additionally, while another gamma-herpesvirus, EBV, can be occasionally found to be integrated into its host's chromosomes [62], no evidence of integration of KSHV sequences to human ones was found.

Possible role of defective viral genomes
It is striking that genome fragments containing K5 and K6 were amplified or retained in nearly all genome rearrangements that were detected, becoming the majority or only detected genome form in some tumors. Nearly all K5-K6 subgenomic fragments connect to IR1, a KSHV origin of lytic replication [63,64], and to TR sequences, which have cleavage signals involved in genome packaging [65]. These 2 DNA elements together could be sufficient for amplification in situ and for transmission to other cells if complemented by a 'helper' KSHV. Defective or deleted viral genomes in primary tumors have been reported previously for gamma-herpesviruses EBV [66,67] and KSHV [49], although their role in a natural infection is unclear. In KSHV, cells harboring defective KSHV genomes that had 82 kb deleted from the 5'end (including K5 and K6) had more malignant phenotypes than their parental BCBL-1 cell line but were lytic replication incompetent [49]. Methods such as RFLP or Southern Blot could be used to determine whether the elevated K5-K6 genome fragments detected here exist as defective viral genomes.
Two persons had identical KSHV genome rearrangements found in multiple tumors. Taken together with our earlier finding of such cases in 2 more individuals [30], this constitutes strong evidence that tumor-associated, rearranged KSHV genomes can spread by metastases, helper virus activity or residual infectivity of the rearranged genome. Identical KSHV genome rearrangements in distinct tumors implies that the viruses were clonally related, because independently acquiring the same breakpoint sites is highly improbable.

Association of breakpoints with GC-rich DNA features
Certain patterns or compositions of DNA sequences that are conducive to non-B DNA structure formation can be hotspots of genomic fragility and rearrangement [43]. Here, the

PLOS PATHOGENS
Genomic changes in Kaposi Sarcoma-associated Herpesvirus and their clinical correlates positions of KSHV genome rearrangement breakpoints were associated with predicted non-B-DNA structures. There was clustering of KSHV rearrangement breakpoints in K4.1-K4.2, IR1, the minor repeat region, and in ORF18-ORF19. In the last cluster, 4 unique breakpoints were within 1.2 kb, with 2 breakpoints only 3 bp apart. IR1 and the minor repeat region are composed of tandem repeats with poly-G or C runs. Such repeats are prone to forming G-quadruplexes (G4), stable structures formed by 4 consecutive guanines folding into a tetrad and stacking with other guanine tetrads on the same strand [47,68,69]. ORF19 has a high-probability Z-DNA-forming sequence, precisely at the 2 breakpoints that were 3 bases apart. Z-DNA refers to a left-handed helix in a zigzag pattern formed by alternating purines and pyrimidines [69]. Z-DNA and G4 can form during replication, transcription and other events that generate negative supercoiling and expose single-stranded DNA [69]. When left unresolved, they can interfere with the progression of replication forks, leading to double-stranded DNA breaks [69].
G4 were the most common among the five non-B DNA structures predicted within 500 bp of the 31 KSHV rearrangement breakpoints identified here and in other studies [30,42,49]. G4 had significantly higher summed scores and shorter distances to the observed breakpoints, compared to 1,000 simulations of 31 random points on the GK18 genome. This result is consistent with G4 being associated with recombination breakpoints in other herpesviruses [45,70]. GC-rich DNA motifs that facilitate double-stranded breaks for recombination-dependent repair are enriched in gamma-herpesvirus genomes beyond what is expected from their nucleotide composition, and compared to alpha-and beta-herpesviruses [44].

Novel Intra-host and Inter-host Variations
K8.1 is an envelope glycoprotein necessary for infecting primary B-cells but not primary endothelial cells [71,72]. Here, nine different K8.1 inactivating mutations were found in 8 of 29 individuals-including nonsense mutations, promoter region deletions and rearrangement breakpoints. The nonsense mutations observed here and in other studies truncate the K8.1 coding sequence, removing its C-terminal transmembrane anchor domain [73].
K8.1 was the only KSHV gene with inactivating mutations found in more than one individual. To date, inactivating mutations in K8.1 have been observed only in KSHV genomes derived from KS tumor biopsies (GK18, ZM124 and Miyako1), and not in matching oral swab samples [20,30]. K8.1 mutations were also often tumor site-specific since one person had 2 different K8.1 mutations and other tumors from the same individuals often had intact K8.1. Hence, selection against K8.1 expression may be local.
K8.1 is highly immunogenic [73][74][75][76]. However, unlike the lytic membrane protein K1 that has hypervariable extra-cellular domains, full-length K8.1 sequences are highly conserved across individuals. Additionally, the C-terminal region of K8.1 that is often truncated are not among known antigenic hotspots [77,78]. Given that the K8.1 mutations observed were inactivating, tumor-associated, and unique to every individual, they must arise de novo. Characterization of the immune-reactivity of individuals harboring KSHV with an inactivated K8.1 gene will be needed to substantiate the hypothesis of immune driven selection.
KSHV microRNA-K10 is weakly transforming [79], and a single nucleotide change is known to alter its tumorigenicity [80]. Its 23-nt UAGUGUUGUCCCCCCGAGUGGCC sequence is tightly conserved in the vast majority of KSHV genomes evaluated to date [81]. The C15T and G16A mutations observed in this study (both bolded above) were found outside the miR-K10 seed sequence (underlined). These mutations were also tumor-specific and are predicted to result in a slightly more stable stem loop [30]. The miR-K10 mutations were also almost always the only intra-host synonymous mutation detected in the~131 kb of the KSHV genomes sequenced. The full-length K4.2 gene expresses an immediate-early protein that inhibits endoplasmic reticulum chaperone protein pERP1, enhancing KSHV glycoprotein maturation among other effects [82]. All K4.2 truncations delete its putative transmembrane domain, which abrogates localization of K4.2 to the ER [82]. It has been reported that completely deleting K4.2 diminishes the expression of KSHV glycoproteins K8.1 and gB, dampening infectious virus titer by 4-fold [82]. This effect on K8.1 production is noteworthy in the context of our findings, because K4.2 truncation and K8.1 inactivation were inversely correlated, i.e., they tended to not co-occur in this cohort (p = 0.013). Suggestively, truncated K4.2 may have the same effect as a K8.1 inactivation in depressing K8.1 production. This can be tested by immunohistochemistry for surface expression of intact K8.1 in KS tumors with and without the K4.2 truncations. Furthermore, according to this hypothesis, K8.1 levels would be diminished or absent in individuals with truncated K4.2.

PLOS PATHOGENS
K11.2 encodes a viral homolog of cellular interferon regulatory factor 2 (vIRF-2), which modulates the antiviral signaling of interferon [83,84] and regulates KSHV lytic replication by suppressing KSHV early lytic gene expression [85]. While the C-and N-terminal domains of vIRF-2 has been characterized [85,86], the function of the central 174-bp repeat has not been elucidated. The repeated sequence may contain internal translation initiation sites that produce the shorter vIRF-2 isoforms that have been observed [85].

Clinical phenotypes of mutations
K5-K6 region overrepresentation and K8.1 inactivation were more likely in nodular rather than macular tumors (p<0.001 and p = 0.010 respectively), while it was the opposite for K4.2 truncations (p = 0.028). The latter argues against the hypothesis above, that the effect of K4.2 truncation and K8.1 inactivation may have the same biological effect. Persons with more widespread lesions (>4 of 8 anatomic areas) were less likely to have K5-K6 region overrepresentation (p = 0.01) and K8.1 inactivation (p = 0.024). miR-K10 mutations were rarer in tumors <1 cm in diameter (p = 0.010) and with a trend to lower survival rate (p = 0.053). However, since most study participants were HIV-positive, the strong effect that active HIV replication has on KS development would likely override any weak association that could exist between KSHV genome alterations and disease characteristics. A similar study on adults with HIV-negative endemic and classic KS may strengthen this association.
Study participants entered the study at varying stages of their illness, so there is heterogeneity in the tumor stages sampled at baseline as well as in the clinical course following admission. While we conducted a large sampling, we examined only a minority of tumors found in many individuals, implying that we are providing only a minimum estimate of tumor-associated intra-host mutation frequencies. Nonetheless, even the most common tumor-associated viral mutations observed, K5-K6 region overrepresentation and K8.1 inactivations were still only found in a minority of KS tumors sampled. As such it follows that the tumor associated KSHV mutations were not required for tumorigenesis, and instead may have been a common outcome of KSHV replication in tumor cells. However, their frequency and clinical associations suggest that these mutations could be influential during KS tumor development. Our findings raise some hypotheses on the potential of these genes to contribute to KS pathology in vivo, a study of which can inform clinical treatments and management of the disease.
Supporting information S1 Fig. PCR and sequencing confirmation of rearrangement connecting IR1 and K6 sequences in U048-D. A. Primers (blue arrows) used to PCR across breakpoint junctions in the tumor DNA extract is shown below a KSHV genome section (GK18 numbering) from K4.2 through IR1 and K6. ORFs are in yellow, repetitive sequences in orange. B. PCR products from indicated primers separated on a 0.8% agarose gel. Visible bands were extracted and sequenced. DNA from the KSHV-infected BCBL1 cell line was used as control. C. Results from sequencing the numbered bands in B. Sequencing of bands 2a and 2c using IR1 primers from both ends unexpectedly show reads mapping to K6 sequences. Sequencing for band 2b failed repeatedly, perhaps due to GC content. Bands 3 and 4b, which are not present in BCBL1, show that K6 sequences are connected directly to IR1 sequences at~1 kb or less, suggesting an insertion. Band 4a shows the normal expected size between IRcomplex-F and K6-R, as seen in BCBL1.  (ZIP) S1 Graphs. 30 graphs of probabilities and scores of predicted non-B-DNA structures 500 bp before and after individual breakpoints examined in the manuscript. The coordinate numbers in the figures of some breakpoints differ slightly from those in Column D of S2