A Two-tiered functional screen identifies herpesviral transcriptional modifiers and their essential domains

While traditional methods for studying large DNA viruses allow the creation of individual mutants, CRISPR/Cas9 can be used to rapidly create thousands of mutant dsDNA viruses in parallel, enabling the pooled screening of entire viral genomes. Here, we applied this approach to Kaposi’s sarcoma-associated herpesvirus (KSHV) by designing a sgRNA library containing all possible ~22,000 guides targeting the 154 kilobase viral genome, corresponding to one cut site approximately every 8 base pairs. We used the library to profile viral sequences involved in transcriptional activation of late genes, whose regulation involves several well characterized features including dependence on viral DNA replication and a known set of viral transcriptional activators. Upon phenotyping all possible Cas9-targeted viruses for transcription of KSHV late genes we recovered these established regulators and identified a new required factor (ORF46), highlighting the utility of the screening pipeline. By performing targeted deep sequencing of the viral genome to distinguish between knock-out and in-frame alleles created by Cas9, we identify the DNA binding but not catalytic domain of ORF46 to be required for viral DNA replication and thus late gene expression. Our pooled Cas9 tiling screen followed by targeted deep viral sequencing represents a two-tiered screening paradigm that may be widely applicable to dsDNA viruses.


Introduction
Human oncogenic viruses are a major cause of cancer, with recent estimates that 15% of all cancers are associated with a viral infection [1]. Gammaherpesviruses are a class of doublestranded DNA viruses that include both the oncogenic Epstein-Barr virus (EBV)-a ubiquitous infection associated with a number of malignancies [2]-and Kaposi's sarcoma-associated herpesvirus (KSHV), a major cause of cancer in AIDS and other immunocompromised patients [3,4]. While many of the viral factors involved in replication and transcription of the KSHV genome have been identified, numerous non-coding regions, transcripts, and ORFs have no ascribable function [5][6][7][8].
During the lytic stage of KSHV replication, the virus expresses its genes in kinetically regulated waves, culminating in the formation and release of infectious virions. These include immediate early genes that regulate reactivation of the virus, early genes that include components required for viral DNA replication, and finally late genes, which encode for factors such as capsid proteins and glycoproteins involved in virion formation. Immediate early and early genes are driven by host-like promoters, while late gene transcription in KSHV and related gamma-as well as betaherpesviruses has several unique features including truncated promoters and modified TATA boxes [9][10][11][12][13][14][15]. They also require at least six virally encoded transcriptional activators (vTA) [16,17] and, perhaps most notably, their transcription is dependent on viral DNA replication [16,18,19].
The genomes of herpesviruses are double-stranded DNA that replicate in the nucleus and are thus targetable by CRISPR/Cas9. Indeed, CRISPR-based targeting of herpesviruses has been explored as a way to suppress infection [20][21][22][23][24][25]. CRISPR has also been used to make functional insertions and knockouts on the viral genome [26][27][28] or to perform pooled functional screens [29,30] in much the same way as on the mammalian genome. The use of CRISPR to identify essential domains of proteins [31,32] has yet to be applied to dsDNA viruses, but has particular potential given that viral proteins are frequently multifunctional.
Here, we applied a CRISPR approach to probe for functional regulators of KSHV late gene expression by creating a library of sgRNAs that tiles the KSHV genome, corresponding to one sgRNA for every 8 base pairs. Using this tiling library, we systematically screened for regulators of viral late gene transcription. In addition to capturing the majority of known components of this system, we identify novel viral regulators of late gene expression along with higher resolution data highlighting essential domains of these proteins. In particular, we validate that KSHV ORF46 is required for viral DNA replication and thus also for late gene transcription. Targeted deep sequencing of the ORF46 locus provided additional mechanistic insight by identifying the mutations created by Cas9 and implicating an essential role for its DNA binding but not its catalytic domain, which we confirmed through reverse genetics experiments. Collectively, this pipeline provides a framework to create and phenotype thousands of viral mutants to gain high-resolution functional information on KSHV and other dsDNA viral genomes.

BAC16 K8.1pr-mIFP2 infected iSLK cells enables fluorescent readout of late gene activity
We first developed a reporter line that could be used to measure the expression from a late gene promoter. We infected the human renal carcinoma cell line iSLK [33,34] with a modified version of the BAC16 KSHV genome containing a far-red fluorescent protein (mIFP2) driven by the promoter of the viral late gene K8.1 as well as a constitutive EGFP (S1A Fig). The iSLK cell line contains a doxycycline-induced copy of ORF50 -the major lytic transactivator-allowing us to reactivate KSHV upon treatment with doxycycline and sodium butyrate. As expected, expression of the IFP late gene reporter peaked at late times post reactivation and was sensitive to inhibitors of viral DNA replication (Fig 1A, 1B, and 1C). We did note some reactivationdependent autofluorescence in this channel from wildtype BAC16 (Fig 1A, 1B, and 1C), which may limit the sensitivity of the assays.
Separately, we confirmed our ability to create functional mutants in KSHV using CRISPR. We lentivirally delivered Cas9 along with sgRNAs targeting various regions of the KSHV genome into a BAC16-infected iSLK line and used a supernatant transfer assay to evaluate their impact on virion production (S1B Fig). Although the guides targeting the essential viral loci elicited the strongest reduction in infectious virion production, sgRNAs targeting some nonessential regions of the viral genome did appear to have a functional effect above background, perhaps through DNA damage on the viral genome (Figs 1D and S1C).

Pooled tiling screen of KSHV genome for modifiers of late gene expression
Cas9 was stably introduced into to the iSLK line infected with the late gene reporter virus to screen an sgRNA library containing all possible~22,000 sgRNAs targeting the viral genome (S1 Data). After lentiviral delivery of the library, cells were chemically induced to enter the lytic cycle and sorted by FACS based on their reporter expression level (S1D Fig). Composition of high and low late gene expression fractions were quantified by next generation sequencing of the sgRNA locus (Fig 2A and S2 Data), where sgRNAs targeting viral ORFs essential for the expression of late genes were enriched in the late gene low fraction. Enrichment of each guide was reproducible between two replicates, representing a phenotype everỹ 8 bp across the KSHV genome (S2A and S2B Fig).
By comparing the distribution of sgRNA phenotypes targeting each KSHV ORF to the background of negative control sgRNAs, we reproducibly identified many ORFs required for late gene expression (Figs 2B, 2C, and S2C and S3 Data and S4 Data). These include all six known components of the viral DNA replication machinery (ORF6, ORF9, ORF40/41, ORF44, ORF56, and ORF59) along with 4 of the 6 vTA components (ORF18, ORF24, ORF34, and ORF66). In addition, we identified ORF57, which is known to be required for the efficient expression of ORF6 and other DNA replication components [35]. A viral tegument protein, ORF45 [36], showed mixed results in validation and was not pursued for further characterization (Fig 2D and 2E). Guides targeting ORF50, the viral reactivation factor, were unexpectedly enriched in late gene expression, but this phenotype was not reproduced in validation experiments, which showed a strong defect in reactivation as measured by expression of the Halo-Tag-fused ORF68 early gene and a moderate defect in late gene expression (S2D and S2E Fig). ORF46, a DNA repair enzyme [37] not previously associated with late gene transcription, also showed severe defects in late gene expression which we subsequently validated with individual guides (Fig 2D and 2E). Together, this demonstrates the identification of both known and novel viral modifiers of late gene expression. This includes both direct activators of late gene transcription (e.g., the vTAs) as well as potentially indirect regulators (e.g., DNA replication machinery).

Deep viral sequencing identifies ORF46 requirement for viral DNA replication
To further investigate the role of ORF46 in late gene expression, we deep sequenced several viral loci to identify and phenotype mutations created by Cas9 nuclease activity (Fig 3A). This approach enables us to distinguish between Cas9-created mutations that result in a knockout hours post-reactivation. iSLK cells infected with either the wildtype BAC16 or the late-gene reporter modified BAC16 were reactivated, and reporter activity was monitored every day for 96 hours. b) Sensitivity of late gene reporter to inhibitors of viral DNA replication phosphonoacetic acid (PAA) and cidofovir (CDV). Reporter activity was monitored by flow cytometry 72 hours after reactivation. Inhibitors of viral DNA replication restricted reporter activity. c) Virion production of late gene reporter virus after transfer of supernatant to uninfected HEK293T cells. 72 hours postreactivation, viral supernatant was filtered and transferred to naïve HEK293T cells; infection of HEK293T cells was monitored by expression of the BAC16-encoded, constitutive EGFP. Error bars are standard error from three technical replicates. d) Virion production after supernatant transfer to uninfected HEK293T cells. SAFE denotes sgRNAs targeting the host in regions without expected function [47]. Non-coding viral 1 and 2 indicates sgRNAs targeting two viral regions free of ORFs. Each bar represents an independent sgRNA, and error bars represent standard error from three independent reactivations.  Design of viral tiling screen for late gene expression. iSLK cells were infected with KSHV encoding a far-red reporter of late-gene activity. Cells were lentivirally infected with Cas9-blast and subsequently the KSHV tiling sgRNA library. After 1-2 weeks of latency to allow for editing, cells were reactivated into the lytic cycle by doxycycline and sodium butyrate. 48 hours post reactivation, cells were trypsinized, fixed, and sorted for high and low late-gene expression. The sgRNA locus in each population was then sequenced to calculate enrichment. b,c) To calculate significance, sgRNAs targeting each annotated region of the genome were grouped and a signed log Mann-Whitney p-value was calculated comparing each viral region to the negative control sgRNAs using an average enrichment from two replicates. Viral ORFs sorted by significance (b) or genome location (c). A positive log p-value indicates this region promotes late gene expression when disrupted, and a negative log p-value indicates this region is required for expression of late genes. d,e) Three independent sgRNAs from the screen targeting each indicated ORF were individually cloned and delivered to late-gene reporter iSLK cells. Virion production was measured by supernatant transfer to uninfected HEK293T cells (d) and late-gene reporter activity (e). Error bars are standard error from three independent reactivations. Activated, unreactivated, and parent samples treated with the viral DNA replication inhibitor cidofovir (CDV) were included

PLOS PATHOGENS
KSHV CRISPR tiling screen for late gene activators versus an amino acid change and to screen for functional defects associated with a given mutation using phenotypes such as viral DNA replication and virion production (S3A Fig). We amplified and sequenced ORF46 -along with one negative control locus, ORF21, and two components of the vTA complex, ORF18 and ORF34 -from cells in three different viral states. These included (1) latently infected cells (where mutations in these genes should have little effect), (2) cells 48 hours post reactivation, at which time the viral genome is being replicated, and (3) from supernatant 72 hours post reactivation when virions are released into the media (Fig 3A and S5 Data and S1 Table).
By comparing the frequencies of out-of-frame mutations between these conditions, we can directly measure the effect of each gene knockout on both viral production and viral DNA replication. We observed a depletion in ORF46 knockout alleles in cells undergoing viral DNA replication as well as a depletion of knockout alleles from viral DNA in the supernatant (Figs 3B and S3B). In contrast, knockout alleles of vTA components ORF34 and ORF18 were only depleted in supernatant samples, consistent with their specific requirement after viral DNA replication (Figs 3B and S3B). Knockout mutations in a nonessential control, ORF21, are mildly depleted in both samples (Figs 3B and S3B); this is consistent with the previously observed effect of Cas9 activity on virion production (Fig 1D). The relative depletion of ORF46 knockouts from replicating viral DNA suggests that ORF46 acts prior to late gene expression and is required for viral DNA replication.
To independently test the requirement of ORF46 for viral DNA replication, we delivered the panel of sgRNAs from Fig 2D into an independent KSHV-infected Cas9+ iSLK line; the sgRNA panel included targets for both ORF57 and ORF9, both of which are required for viral DNA replication, as well as for ORF46 and ORF45. None of the genes targeted, including ORF46, was required for early gene expression, as measured by expression of the HaloTagfused ORF68 early gene (Fig 3C). Guides targeting ORF46 interfered with incorporation of EdU, suggesting a viral DNA replication defect (Fig 3D), which was confirmed for the ORF46 sgRNAs by qPCR of viral DNA (Fig 3E). Guides targeting ORF45 showed inconclusive effects. Together, these data support a specific requirement of ORF46 for viral DNA replication, which is bolstered by data with the ORF46 homolog in EBV [38]. Notably, ORF46 disruption phenocopied disruption of the viral DNA polymerase ORF9, demonstrating the key role ORF46 plays in viral DNA replication.

ORF46 residues involved in DNA binding but not catalytic activity are required for viral DNA replication
Further analysis of the viral deep-sequencing data allowed us to identify in-frame mutations that do not cause knockouts but instead result in deletions or insertions at the amino acid level in each locus observed. By grouping these mutations by where they target along the gene body, we further refined what regions of these proteins are essential for their role in the viral life cycle, analogous to previous work on the human genome [31]. For ORF18 and ORF34, which are direct late gene transcriptional activators, we examined the relative depletion of in-frame alleles between cells undergoing viral DNA replication and cells releasing virions into the supernatant (Fig 4A and 4B). For ORF46, which likely indirectly activates late genes through its role in DNA replication, we examined the relative depletion of in-frame alleles between cells containing latent virus and those undergoing viral DNA replication (Fig 4C).
as controls, along with three vSAFE guides targeting an unexpressed region of the viral BAC. P values were calculated using a single-tailed, equal variance Student's t-test, with the least significant value used when compared to each individual vSAFE sgRNA. NS indicates non-significance (P>0.01). https://doi.org/10.1371/journal.ppat.1010236.g002

PLOS PATHOGENS
KSHV CRISPR tiling screen for late gene activators Four viral loci were amplified and deep-sequenced: ORF21, ORF18, ORF34, and ORF46. b) Median enrichment across the coding region of the gene of out-of-frame indels in replicating cells relative to latent cells (Viral DNA replication) or the supernatant relative to latent cells (Virion production). Error bars are standard error from two replicates. One replicate of ORF34 supernatant sample was excluded due to uneven coverage. c,d,e) Three individual sgRNAs were lentivirally delivered to KSHV-infected iSLK cells targeting the indicated viral ORF. Error bars are standard error from four independent reactivations. CDV-treated, reactivated, and unactivated parental cells are included as controls, along with three vSAFE sgRNAs targeting an ORF-free region of the BAC. P values were calculated using a single-tailed, equal variance Student's t-test, with the least significant value used when compared to each individual vSAFE sgRNA. NS indicates non-significance (P>0.01). c) Reactivation and early gene expression was measured using a KSHV virus containing a HaloTag fusion to a viral early gene, ORF68. d) Measuring viral DNA replication by percentage positive EdU staining. A 2-hour pulse of EdU was delivered 48 hours post reactivation. Cells were then fixed, and click chemistry was used to measure EdU incorporation by flow cytometry. Unreactivated cells were used to establish gates for flow cytometry, where subgenomic EdU incorporation was measured as viral DNA replication. e) DNA was extracted 48 hours post reactivation, and qPCR of the viral promoter of ORF57 was used to measure the amount of viral DNA. https://doi.org/10.1371/journal.ppat.1010236.g003

PLOS PATHOGENS
KSHV CRISPR tiling screen for late gene activators Indeed, depletion of in-frame alleles in ORF18 corresponded with previously identified residues required for interaction with other vTA complex members ORF30, ORF31, and ORF66 (Fig 4A and S6 Data) [39], and several regions observed in ORF34 correspond to previously phenotyped truncation mutants with defects in binding the vTAs ORF24 or ORF66 (Fig 4B) [40]. In ORF46, we noted that these in-frame mutations near the C-terminus were depleted from replicating DNA samples relative to the rest of the coding region (Fig 4C). This suggests  [39] and ORF34 [40]. c) Smoothed signal from in-frame mutations across the ORF46 loci in comparison between latent and replicating genomes. In-frame mutations were defined as insertions or deletions whose size where divisible by three. The number of mutations was pooled in a 100 bp window and enrichment was calculated relative to the enrichment of out-of-frame mutations. A negative value indicates that in-frame mutations were relatively depleted at the indicated region. Previously described domains and residues are indicated in red if they appear depleted and blue if they do not. d) vDNA replication was measured for iSLK cells infected with KSHV encoding either wildtype ORF46, a catalytic domain mutant of ORF46 (Q87L, D88N), or a DNAbinding domain mutant of ORF46 (H210L). Cells were reactivated, and after 48 hours DNA was extracted, and viral DNA quantity was measured by qPCR of the viral ORF57 promoter. Quantity is relative to unreactivated samples. Error bars are standard error from four independent replicates. https://doi.org/10.1371/journal.ppat.1010236.g004

PLOS PATHOGENS
KSHV CRISPR tiling screen for late gene activators that the DNA-binding domain encoded in the C-terminus is required for viral DNA replication but that the catalytic domain is not. We also observed signal both upstream and downstream of the coding region of ORF46, which may correspond to disruption of the adjacent ORF45 or perturbation to regulatory elements of ORF46.
To test the hypothesis that the role of ORF46 is dependent on its DNA binding activity, we individually mutated residues previously shown for EBV to be required for DNA binding (H210L) or catalysis (Q87L, D88N) in ORF46 using BAC mutagenesis [38]. Reactivation of these mutant viruses confirmed that the DNA binding domain but not the catalytic domain of ORF46 is required for replication of viral DNA and the production of virus (Figs 4D and  S3C). Importantly, we also observed that the mutants still express early genes, supporting a defect specific to vDNA replication (S3D Fig). Together these data orthogonally confirm our prediction that ORF46 and its DNA binding domain are required for viral DNA replication.

Discussion
Late gene expression in oncogenic gammaherpesviruses is a mechanistically unique, essential and potentially targetable viral process. Here we applied an exhaustive tiling approach to KSHV to identify viral proteins required for expression of viral late genes. We identified most of the known late gene regulators and confirmed that one additional protein, ORF46, is required for this process. Targeted deep sequencing of the viral mutants created by Cas9 suggested that the DNA binding domain of ORF46 is required for viral DNA replication, a prediction we validated through viral mutagenesis. This two-level screening approach is broadly applicable to connecting phenotype to genotype for a variety of processes involved in the KSHV lifecycle-as well as more broadly for other dsDNA viruses.
Compact viral genomes present both unique challenges and opportunities for genomicsbased approaches to study viral phenotypes. On one hand, the high functional density in a relatively small genome allows approaches such as ours to perturb every element. For example, here we tile all~80 ORFs and their cis regions with fewer guides than previous studies have used to tile just three human genes [41]. On the other hand, viral loci often encode multifunctional proteins and can also encompass overlapping ORFs and noncoding regulatory sequences, which means that some mutations may impact multiple viral features [5]. This is true for our study as well, as the ORF46 DNA binding domain also overlaps with the promoter and TSS of the adjacent gene ORF45. Thus, although the CRISPR nuclease is less likely to perturb non-coding elements [41], some of the signal we see may be from effects on ORF45 transcription (Fig 4C). For high-throughput experiments, screening for additional phenotypes may be required to separate the multiple functions encoded by the virus.
ORF46 encodes for a uracil-DNA N-glycosylase (UNG)-a DNA repair enzyme responsible for the removal of uracil from DNA [37]-and UNG homologs in EBV, human cytomegalovirus (HCMV), and vaccinia virus are all required for viral DNA replication [38,[42][43][44]. Notably, the DNA binding domain but not UNG activity is required for DNA replication in both EBV [38] and KSHV, suggesting that UNGs may play conserved structural rather than enzymatic roles in genome replication for multiple dsDNA viruses.
Beyond ORF46, our initial screen highlighted additional viral genes with more modest effects than the canonical late gene regulators (S4 Data). While their roles in late gene expression remain unvalidated, they may represent a deeper cast that either play a minor role or whose signal is limited technically. Technical limitations such as poor sgRNA performance or number may also explain the weak signal from smaller genes such as ORF30 and ORF31, two additional components of the vTA complex. An additional limitation is that less than 25% of cells detectably express the late gene reporter (Fig 1A); while this percentage is consistent with

PLOS PATHOGENS
KSHV CRISPR tiling screen for late gene activators previous reports [45,46], cell models allowing for more efficient late gene expression or detection should further boost the dynamic range and sensitivity. While individual sgRNAs targeting the major transactivator ORF50 demonstrated the expected early-stage block to lytic cycle progression (Figs 1D, S2D, and S2E), it was surprising that they were positively enriched in the late gene screen (Fig 2B and 2C). This may represent an artifact of the screen design, as they target both the ORF50 on the viral genome and the dox-inducible ORF50 integrated in the iSLK genome. It is also possible that ORF50 has bifunctional roles in promoting reactivation but suppressing late gene expression prior to DNA replication, although additional work would be needed to evaluate this hypothesis. Additionally, positive signals were detected for KSHV genes required for latency, such as ORF73 (LANA) and the GFP/HygroR cassette (Fig  2B and 2C), but their relevancy to late gene expression is unclear given that the screen was performed under strong selection for latency maintenance. We have previously observed that guides targeting ORF75 caused a similar defect in latency maintenance.
The tiling library is highly modular, as it can be used for example with base editors to create base pair mutations, CRISPRi to induce transcriptional repression or dCas9 to block transcription factor binding, as well as in combination with other reporters of viral activity. Overall, the tiling approach is a powerful method for functional interrogation of KSHV and other DNA viruses with the added potential to highlight essential domains and unlock a wealth of information about viral biology.

Creation and characterization of late gene reporter cell line
A late gene reporter was designed using 100 bp upstream of the K8.1 ORF driving a mIFP2 fluorescent cassette. 150 bp of homology to the BAC region of BAC16 was included along with a KAN resistance cassette to allow the insertion into BAC16 genome [33]. The viral genome was then purified using the Macherey-Nagel NucleoBond BAC 100 kit and transfected into HEK293T cells using PolyJet (SignaGen). Transfected cells were then cocultured with uninfected iSLK cells and reactivated using 0.33 mM sodium butyrate and 20 ng/mL PMA. Infected iSLK cells were then selected using 1 μg/mL puromycin, 50 μg/mL G418, and 200 μg/mL hygromycin.
To test the kinetics of late-gene reporter expression, iSLK-BAC16-K8.1pr-mIFP2 cells were reactivated using 1 mM sodium butyrate and 5 μg/mL doxycycline, and timepoints were PLOS PATHOGENS KSHV CRISPR tiling screen for late gene activators collected every 24 hours. Cells were then fixed in 4% PFA and quantified using flow cytometry (BD LSRFortessa). To test the sensitivity of the late gene reporter-infected lines, cells were reactivated as above and additionally treated with viral polymerase inhibitors 500 μM PAA or 100 μM CDV; late-gene reporter activity of fixed cells was quantified by flow cytometry 72 hours post reactivation (BD LSRFortessa). To test the production of virus in these cells, supernatant from reactivated samples was 0.45 um filtered and added to uninfected HEK293T cells. After 24 hours, HEK293T cells were fixed, and GFP was quantified by flow cytometry (BD Accuri C6 Plus). Replicates were independent reactivations on separate days.
To demonstrate the editing activity, three independent sgRNAs were designed targeting two noncoding regions of the BAC and four viral genes. sgRNAs were designed using IDT Custom Alt-R designer. As controls, three safe-targeting sgRNAs were used that target the host in predicted non-functional regions [47]. sgRNAs were cloned into a mU6-driven guide expression plasmid (Addgene #89359) and delivered lentivirally at high MOI. After 10 days, cells were reactivated by 5 μg/mL doxycycline and 1 mM sodium butyrate, and viral supernatant was collected and 0.45 um filtered after 72 hours. Supernatant was transferred to uninfected HEK293T cells, and viral infectivity was measured using GFP by flow cytometry (BD Accuri C6 Plus). Replicates were independent reactivations on separate days.

Design of KSHV tiling library
A KSHV tiling library was designed using custom python scripts. Each NGG PAM was identified in the reference genome [33] (S1 and S3 Data) along with the corresponding 19 bp guide. Only one copy of sgRNAs that targeted the KSHV genome more than once was included. The library was cloned using a modified protocol from Deans et al 2016 [48]. sgRNAs were synthesized along with primer binding sites and BstXI/BlpI restriction sites using Twist Biosciences. The library was amplified using corresponding primers for 10 cycles. Amplified library was PCR purified (Qiagen MinElute) and restricted using BstXI and BlpI. 34 bp fragment was purified in water using a native 20% PAGE gel and Spin-X centrifuge tube filters (Costar). Fragment was isoproponal precipitated and ligated into a sgRNA expression plasmid (Addgene #89359) using T4 ligase overnight at 16 degrees and electroporated into Lucigen Endura cells (1800V, 600 ohm, 10μF, 1mm) before plating on 4 assay plates (reduced 75% CARB). Colonies were grown at 37 degrees C overnight, pooled, and maxi prepped (Macherey-Nagel).

Screen and analysis of late gene screen
Cas9+ iSLK-BAC16-K8.1pr-mIFP2 cells were infected lentivirally with the KSHV tiling library. After 10 days, cells were reactivated using 1 mM sodium butyrate and 5 μg/mL doxycycline. Two days post reactivation, cells were trypsinized, washed 2x with dPBS, and fixed in 4% PFA for 10 minutes before being washed 2x in dPBS again. Cells were then sorted based on mIFP2 expression using a BD Aria, sorting for top 25% expression vs bottom 50% expression (BD Aria). Sorted cells were then unfixed overnight in 50 μg/mL proteinase K (Promega) plus 150 mM NaCl at 65˚C. Genomic DNA was extracted using the Qiagen blood mini kit. As described in Deans et al 2016 [48], the sgRNA locus was amplified using primers and Herculase II (Agilent) with 20 cycles. Nextera-index adapters were then ligated by PCR using 18 cycles.~285 bp product was gel extracted and quantified with Qubit and an Agilent Bioanalyzer before sequencing on a HiSeq 4000.
Average enrichment of sgRNAs from two replicates were calculated comparing the lategene high and late gene low fractions using a median normalized log ratio of fraction of counts as previously described [49]. sgRNAs that targeted the viral genome at multiple loci were removed from the analysis. Each region of the viral genome was split using previous annotations (S3 Data), and the median enrichment of all sgRNAs targeting each region was compared to all negative control sgRNAs using a Mann-Whitney U test.

Screen validation, early gene expression, and viral DNA replication
Three sgRNAs from the screen were selected, cloned as above, lentivirally delivered to late gene expression reporter cells, and assayed for virion production and late gene expression activity as above. sgRNAs were also delivered to Cas9+ iSLK cells infected with a modified BAC16 virus containing an N-terminal HaloTag fusion to the early viral gene ORF68. 24 hours post reactivation with 1 mM sodium butyrate and 5 μg/mL doxycycline, cells were treated with 20 nM HaloTag JF647 (Promega) and ORF68 levels were quantified by flow cytometry (Accuri C6 plus). To quantitate viral DNA replication, 20 μM EdU was added to the media 48 hours post reactivation for two hours. Cells were then trypsinized and fixed using 4% PFA. EdU was labeled with Cy5 using the Click-IT flow cytometry kit (Invitrogen) and quantified by flow cytometry (BD Accuri C6 Plus). Using unreactivated cells as a control, little host DNA replication was observed in reactivated cells. EdU levels below those observed in unreactivated cells but above background were quantified as viral DNA replication. To quantitate the amount of viral DNA, 48 hours after reactivation, DNA was extracted using DNA QuickExtract (Lucigen), and qPCR was performed using iTaq Universal SYBR Green (Bio-rad) amplifying the ORF57 promoter region of KSHV (QuantStudio 3). Viral quantities were calculated by standard curve. Replicates were independent reactivations on separate days.

Design and analysis of targeted viral sequencing
Cas9+ iSLK-BAC16-K8.1pr-mIFP2 cells containing the KSHV tiling library were reactivated using 1 mM sodium butyrate and 5 μg/mL doxycycline. Genomic DNA was recovered from latent cells prior to reactivation and 48 hours post reactivation using a QIAamp DNA blood Mini kit (Qiagen). To collect viral supernatant, media was replaced 48 hours post reactivation and collected 72 hours post reactivation. Viral supernatant was then mixed with appropriate volume of Qiagen AL buffer and Qiagen proteinase. Sample was then spun over a single QIAamp blood mini column, washed, and eluted as the manual indicates.
PCR was performed from each DNA sample for four regions (ORF21, ORF34, ORF46, and ORF18) using Herculase II (Agilent). Each region was then PCR purified (GeneJet; Thermo) and pooled in equal nanogram amounts. 500 ng of each pooled sample was then prepped using Illumina DNA prep and sequenced on a single lane of SP 150PE NovaSeq 6000.
Reads were trimmed using cutadapt [50], and aligned to the BAC16 genome using bwa mem [51]. Indel counts were then extracted from CIGAR codes, with identical CIGAR codes excluded as duplicates. Complex codes not corresponding to simple indels were excluded. Deletions and insertions at the same position and same size observed in a control sample amplified from purified BAC were excluded as artifacts of library prep (S5 Data and S1 Table). Insertions or deletions whose size were divisible by three were classified as in-frame mutations and all others were classified as out-of-frame mutations. Mutation counts and coverage were calculated as the sum of two replicates for each condition. Basepairs with less than 10^5 coverage were excluded, and a single read was added to each basepair to prevent division by zero. Mutation frequencies across each 100 bp were averaged and a log enrichment value was calculated between either the latent sample and the 48-hour sample, or the 48-hour sample and the supernatant sample. For each comparison, the difference in log enrichment between in-frame mutations and out-of-frame mutations was calculated and used as the relevant signal.

ORF46 BAC mutagenesis
ORF46 DNA binding (H210L) or catalytic (Q87L, D88N) mutants were introduced into BAC16 using red recombination [33]. Briefly, homology arms were used to insert mutations along with a KAN resistance cassette into the endogenous ORF46 locus. The KAN cassette was removed by recombination, and the scarless edit was confirmed by Sanger sequencing. Viral genomes were then purified using the Macherey-Nagel NucleoBond BAC 100 kit and transfected into HEK293T cells using PolyJet (SignaGen). Transfected cells were then cocultured with uninfected iSLK cells and reactivated using 0.33 mM sodium butyrate and 20 ng/mL PMA. Infected iSLK cells were then selected using 1 μg/mL puromycin, 50 μg/mL G418, and 125 μg/mL hygromycin.
iSLK cells infected with the mutant BAC16 viruses were then reactivated using 5 μg/mL doxycycline and 1 mM sodium butyrate. After 72 hours, supernatant was 0.45 um filtered and transferred to uninfected HEK293T cells. 24 hours later, infection was monitored by EGFP expression using flow cytometry (BD Accuri C6 plus). Replicates were independent reactivations from a single experiment.
Separately, DNA was extracted from cells 48 hours post reactivation using Lucigen DNA QuickExtract. Using iTaq Universal SYBR Green (Bio-rad), vDNA replication was then measured using qPCR of the viral ORF57 promoter. Unreactivated cells were used as a control. Replicates were independent reactivations from separate days.
RNA was extracted from cells 48 hours post reactivation using Lucigen RNA quickextract. RNA extraction was performed using Lucigen RNA QuickExtract. Turbo DNAse (Invitrogen) was used to remove DNA. AMV RT (Promega) with 9 bp random primers was used for reverse transcription. qPCR was performed using iTaq Universal SYBR Green (Bio-rad) amplifying the coding region of ORF6 along with primers targeting the host 18S RNA. Replicates are from multiple reactivations performed on the same day.
Supporting information S1 Fig. a) Design of a late gene reporter. A cassette expressing the far-red fluorescent protein driven by the 100 basepair promoter sequence of the KSHV late gene K8.1 was inserted upstream of the EGFP cassette of the BAC16 KSHV genome. A KanR cassette was included to allow selection in bacteria. b) Delivery and editing during latent infection of iSLK cells. Cas9-blast was lentivirally delivered to BAC16 infected iSLK lines. After selection, mU6-driven sgRNAs were delivered lentivirally. Cells were maintained in a latent state for 1-2 weeks to allow sufficient time for editing before reactivation by doxycycline and sodium butyrate. c) Schematic indicating targeting of vSAFE sgRNAs. d) Example flow data showing reactivation and expression of late gene reporter cells. Blue cells indicate late gene reporter negative cells, and red cells indicated late gene reporter positive cells. iSLK cells infected with the K8.1pr-mIFP2 BAC16 virus were reactivated with doxycycline and sodium butyrate, then fixed and analyzed 48 hours later by flow cytometry. EGFP is activated at 488 nm with a 595 LP, 525/50 filter. mIFP2 is activated at 640nm with a 750LP, 780/60 filter. (TIF) S2 Fig. Individual results from late gene tiling screen. a) Average enrichment score from two replicates across the 154 kb genome. b) Guide-level reproducibility of sgRNA enrichments from two independent replicates (R 2 = 0.5; p<10 −200 ). Positive value indicates the sgRNA is enriched in the late-gene expressing fraction and thus promotes late-gene expression; negative value indicates the sgRNA is depleted from the late-gene expressing fraction and thus suppresses late gene expression. c) Gene-level reproducibility of p-values from two replicates (R 2 = 0.82; p<10 −70 ). To calculate significance, sgRNAs targeting each annotated region of the genome were grouped and a signed log Mann-Whitney p-value was calculated comparing each viral region to the negative control sgRNAs. A positive log p-value indicates this region promotes late gene expression when disrupted, and a negative log p-value indicates this region is required for expression of late genes. d) Reactivation and early gene expression of cells containing ORF50-targeting guides was measured using a KSHV virus containing a HaloTag fusion to a viral early gene, ORF68. Error bars are standard error from four independent reactivations. e) Expression of late gene reporter in cells containing ORF50-targeting guides. Error bars are standard error from four independent reactivations. (TIF) S3 Fig. Additional results from targeted viral sequencing. a) Representative spectra of indel sizes in a single replicate of latent genome. b) Median enrichment across the coding region of the gene of out-of-frame indels in replicating cells relative to latent cells (Viral DNA replication), the supernatant relative to latent cells (Virion production), or supernatant relative to replicating cells (Viral DNA replication vs Virion production). Error bars are standard error from two replicates. One replicate of ORF34 supernatant sample was excluded due to uneven coverage. c) Virion production was measured for iSLK cells infected with KSHV encoding either wildtype ORF46, a catalytic domain mutant of ORF46 (Q87L, D88N), or a DNA-binding domain mutant of ORF46 (H210L). Cells were reactivated, and after 72 hours supernatant was filtered and transferred to uninfected HEK293T cells. Infection was monitored by expression of the BAC16-encoded, constitutive EGFP. Error bars are standard error from three technical replicates. d) RNA expression of a viral early gene ORF6. 24 hours after reactivation, RNA was extracted, and RT-qPCR was performed with primers targeting the coding region of the viral early gene ORF6. Primers targeting the host 18S RNA were used as a control. Error bars are standard error from four technical replicates. (TIF) S1 Table.