A compact Cas9 ortholog from Staphylococcus auricularis (SauriCas9) expands the DNA targeting scope

Compact CRISPR/Cas9 systems that can be packaged into an adeno-associated virus (AAV) hold great promise for gene therapy. Unfortunately, currently available small Cas9 nucleases either display low activity or require a long protospacer adjacent motif (PAM) sequence, limiting their broad application. Here, we screened a panel of Cas9 nucleases and identified a small Cas9 ortholog from Staphylococcus auricularis (SauriCas9), which recognizes a simple NNGG PAM, displays high activity for genome editing, and is compact enough to be packaged into an AAV. Moreover, the conversion of adenine and cytosine bases can be achieved by fusing SauriCas9 to an adenine or cytidine deaminase. Therefore, SauriCas9 holds great potential for both basic research and clinical applications.


Introduction
The RNA-guided CRISPR/Cas9 system is a powerful tool for genome editing in diverse organisms and cell types [1][2][3][4][5]. In this system, a Cas9 nuclease and a guide RNA (gRNA) form a Cas9-gRNA complex, which recognizes a DNA sequence complementary to the gRNA and generates a site-specific double-strand break (DSB) [1,2,6]. Target site recognition also requires a specific protospacer adjacent motif (PAM) [6], which limits the targeting scope of Cas9 for precise positioning. Cas9 derived from Streptococcus pyogenes (SpCas9) is the most extensively applied variant because of its high efficiency and simple PAM requirement [6], but the SpCas9 gene (4.1 kbp) and its gRNA sequence are too large to be packaged together into an adeno-associated virus (AAV) [7] for efficient delivery into cells in vivo. In a search for smaller Cas9 nucleases, a type II-A Staphylococcus aureus Cas9 (SaCas9) was identified for in vivo genome editing [8], but this Cas9 ortholog is used infrequently because it requires a longer PAM sequence (NNGRRT). Several small type II-C Cas9 orthologs have been discovered for genome editing [9][10][11][12][13], but they either require a long PAM sequence or display reduced activity [12,14,15]. Therefore, there remains a need for smaller Cas9 nucleases with high efficiency and shorter PAMs.
We have recently developed a highly sensitive green fluorescent protein (GFP) reporter-based method that allows PAM profiling in mammalian cells [16]. Here, we demonstrate that this method also enables the identification of Cas9 nucleases for genome editing in mammalian cells. Our method differs from previous screens, which typically rely on library cleavage in vitro or in bacteria and therefore often identify Cas9 orthologs that do not work in mammalian cells [8,17]. We identified a naturally occurring Cas9 nuclease from S. auricularis (SauriCas9) that recognizes an NNGG PAM, which occurs, on average, once in every 8 randomly chosen genomic loci. Importantly, SauriCas9 has a small gene size (3.3 kbp) and can be packaged into AAV together with its gRNA, holding great promise for gene therapy.
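The "once in every 8" estimate for NNGG can be reproduced with simple counting. The sketch below is illustrative only and assumes that each base is drawn uniformly and independently (real genomes deviate from this) and that a PAM on either strand counts; the function names are hypothetical and not part of the study.

```python
from fractions import Fraction

# IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG",
    "K": "GT", "M": "AC", "N": "ACGT",
}


def pam_probability(pam: str) -> Fraction:
    """Probability that a random k-mer matches the PAM on one strand.

    Assumption: bases are uniform and independent (not true of real genomes).
    """
    p = Fraction(1)
    for code in pam:
        p *= Fraction(len(IUPAC[code]), 4)
    return p


def mean_spacing_bp(pam: str) -> Fraction:
    """Expected distance in bp between PAM sites when both strands count."""
    return 1 / (2 * pam_probability(pam))


# Compare the PAMs named in the text.
for pam in ("NNGG", "NNGRRT"):
    print(pam, pam_probability(pam), mean_spacing_bp(pam))
```

Under these assumptions NNGG matches with probability 1/16 per strand position, which gives one site per 8 bp on average once both strands are counted, whereas NNGRRT gives one site per 32 bp.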

Identification of SauriCas9 as a novel genome editing tool
To identify novel Cas9 nucleases for genome editing, we employed a GFP reporter-based method that allows Cas9 screening in mammalian cells [16]. In this method, GFP is inactivated by insertion of a target sequence followed by 7 bp of randomized DNA sequence. If a Cas9 ortholog cleaves the target DNA, cotransfection of the ortholog with a gRNA will result in GFP expression (Fig 1A). We screened a panel of 30 Cas9 nucleases from different bacterial strains (Fig 1B). Each Cas9 gene was human codon optimized and cloned into a SaCas9-expressing vector in place of SaCas9 [8]. The trans-activating CRISPR RNA (tracrRNA) sequences used were predicted by a bioinformatic tool (S1 Table) [18]. We designed gRNA scaffolds for each ortholog by fusing the 3′ end of a direct repeat with the 5′ end of the corresponding tracrRNA, including the full-length tail, via a 4-nucleotide linker. However, we did not observe any GFP-positive cells in the first round of screening. It was possible that the gRNA scaffolds were not designed properly, leading to an unsuccessful screen. Because tracrRNA has been reported to be exchangeable between closely related CRISPR/Cas9 systems [19], and because we noticed that SauriCas9 is phylogenetically related to SaCas9, we combined SauriCas9 with the SaCas9 tracrRNA, which led to GFP expression (Fig 1C), demonstrating the potential of this Cas9 nuclease for genome editing.

PAM analysis
The CRISPR/SauriCas9 locus consists of 3 Cas genes (Cas9, Cas1, and Cas2), 12 repeat sequences, and a putative tracrRNA (Fig 2A and 2B). Protein sequence alignment revealed that SauriCas9 (1,061 amino acids [aas]) has 62.4% sequence identity with SaCas9 (S1 Fig). We noticed that the SauriCas9 PAM-interacting domain (PID) has multiple aa variations compared with SaCas9 (S1 Fig). Importantly, the variations include 2 aas (S991 and L996) that are crucial for PAM recognition (S1 Fig) [20], suggesting that SauriCas9 may recognize PAMs differently from SaCas9. To identify PAM sequences that are recognized by SauriCas9, we sorted out GFP-positive cells, and sequences containing 7-bp randomized DNA were PCR amplified for deep sequencing. Sequencing results revealed that insertions/deletions (indels) were mainly associated with NNGG PAM (Fig 2C). Consistently, both WebLogo (http://weblogo.threeplusone.com/) and PAM wheel revealed that SauriCas9 preferred NNGG PAM (Fig 2D and 2E). In addition, PAM wheel revealed that SauriCas9 also recognized NNNGG PAM (Fig 2E).

Funding: … Laboratory Opening Program SKLGE1809 and 111 project (B13016); SO was supported by grants from the National Institutes of Health R00HL130416. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: I have read the journal's policy and the authors of this manuscript have the following competing interests: The authors have applied for a patent related to this work.

Genome editing with SauriCas9
To test the genome editing capability of SauriCas9, we then constructed GFP-reporter plasmids containing either NNGG or NNNGG (Fig 3A) and established stable cell lines expressing these reporters. Transfection of SauriCas9 with gRNA induced GFP expression for both PAMs, but the efficacy with the NNNGG PAM was much lower than with the NNGG PAM (Fig 3B).
Next, we tested the genome editing capability of SauriCas9 with a panel of 12 endogenous gene targets in HEK293T cells. SauriCas9 generated indels at all 12 endogenous loci, but efficiencies varied depending on the targets (Fig 3C). These 12 target sites also contained an NNGRRT PAM that can be edited by SaCas9, allowing a side-by-side comparison. Quantitative reverse transcription polymerase chain reaction (qRT-PCR) revealed that the expression of SauriCas9 and SaCas9 was comparable (S2 Fig). Although the difference was not statistically significant, SauriCas9 showed higher activity at most of the tested loci (Fig 3C). We further tested the genome editing capacity of SauriCas9 in additional cell types, including A375, A549, HeLa, human foreskin fibroblast (HFF, primary cells), and N2a (mouse neuroblastoma cell line) cells. SauriCas9 generated indels in all these cell types with varying efficacies (S3A-S3E Fig). We also tested 3 endogenous targets with NNNGG PAM, but the indel frequencies were less than 2% (S4 Fig). Finally, we packaged SauriCas9 together with gRNA into AAV and infected HEK293T, HCT116, A375, or HFF cells. Indels could be detected in all cell types, but frequencies varied depending on the loci and cell types (S5A-S5E Fig). Taken together, SauriCas9 offers a novel platform for genome editing.

Base editing with SauriCas9
Base editing is a powerful technology that enables programmable conversion of single nucleotides in the mammalian genome. This technology relies on fusion of a catalytically disabled Cas9 nuclease to a nucleobase deaminase enzyme. Two types of programmable deaminases have been developed: cytosine base editors (CBEs), inducing C-to-T conversions [21], and adenine base editors (ABEs), inducing A-to-G conversions [22]. To increase base editing scope, several CRISPR nucleases have been employed, including SpCas9 [21,22], engineered SpCas9 variants [23,24], SaCas9 [24], and Cpf1 [25].
To test whether SauriCas9 can be employed for base editing, we generated a nickase form of SauriCas9 (SauriCas9n) by introducing a D15A mutation (S1 Fig). To confirm that SauriCas9n can induce nicks in genomic DNA, we inserted a pair of target sequences (E1 and S6) into a GFP-reporter plasmid and established a stable cell line (S6A Fig). If SauriCas9n is able to generate double nicking, indels will occur, and GFP-positive cells can be observed. The double-nicking strategy has been used to improve specificity of SpCas9 [26]. When we transfected wild-type SauriCas9 with a single gRNA, GFP expression could be easily observed (S6B Fig). When we transfected SauriCas9n with a single gRNA, GFP expression rarely occurred. However, when we transfected SauriCas9n with 2 gRNAs (E1 + S6) targeting each DNA strand, GFP expression could be easily observed, indicating that double nicking occurred. These data demonstrated that SauriCas9n is able to induce nicks.

Fig 2. (C) Sequencing revealed that targets with NNGG PAM can be efficiently edited in the GFP-reporter assay. GFP sequence is shown in green; insertion mutations are shown in red; NNGG PAM sequences are highlighted in yellow; GCG trinucleotide is used to fix 7-bp random sequence. (D) WebLogo is generated from deep-sequencing data. (E) PAM wheel is generated from deep-sequencing data. GFP, green fluorescent protein; PAM, protospacer adjacent motif; SauriCas9, Cas9 nuclease from S. auricularis. https://doi.org/10.1371/journal.pbio.3000686.g002

Fig 3. Quantification is shown on the right. (C) Genome editing with SauriCas9 and SaCas9 for 12 endogenous loci. PAMs are underlined (n ≥ 2). Underlying data for all summary statistics can be found in S1 Data. GFP, green fluorescent protein; gRNA, guide RNA; PAM, protospacer adjacent motif; SaCas9, S. aureus Cas9; SauriCas9, Cas9 nuclease from S. auricularis.

Specificity analysis of SauriCas9
Next, we evaluated the off-target activity of SauriCas9 by using the GFP-reporter cell line (Fig 5A). We generated a panel of gRNAs with dinucleotide mutations (Fig 5A). SauriCas9 showed higher activity for both on-target and off-target cleavage compared with SaCas9 (Fig 5A). One limitation of the GFP-reporter assay is that it cannot reveal the actual indel frequencies. To analyze whether off-target editing occurs at endogenous loci, we selected a target (G10) with 2 potential off-target sites (G10-OT1 and G10-OT2), which have 3 mismatches and contain PAMs that can be targeted by both SauriCas9 and SaCas9 (Fig 5B). Following transfection of Cas9 + gRNA plasmids, genomic DNA was extracted for targeted deep sequencing. The sequencing results revealed that SauriCas9 and SaCas9 induced indels with very low efficiencies at both off-target sites (Fig 5C and 5D).
To compare genome-wide off-target effects of SauriCas9 and SaCas9, a GUIDE-seq assay was performed [28]. Following transfection of Cas9 + gRNA (G1) plasmids and GUIDE-seq oligos, we prepared libraries for deep sequencing. Sequencing and analysis revealed that on-target cleavage occurred for both Cas9 orthologs, as reflected by GUIDE-seq read counts (Fig 5E). We identified 2 off-target sites for SaCas9 and 1 off-target site for SauriCas9. Interestingly, SauriCas9 and SaCas9 shared an off-target site because this site contained a PAM for both Cas9 orthologs. In summary, these data indicated that SauriCas9 can induce off-target cleavage at endogenous loci in cells.
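A panel of gRNAs carrying dinucleotide mismatches, as used in the GFP-reporter specificity assay, can be enumerated programmatically. The following is a minimal sketch under stated assumptions: the substitution rule (each base replaced by its complement), the step between mutated positions, and the example spacer are all hypothetical and are not taken from the study.

```python
# Assumed substitution rule (not from the source): swap each base for its complement,
# which guarantees that both mutated positions differ from the original.
SWAP = {"A": "T", "T": "A", "C": "G", "G": "C"}


def dinucleotide_mismatch_panel(spacer: str, step: int = 2) -> list[str]:
    """Return spacer variants, each carrying two adjacent mismatched bases.

    The mutated window starts at position 0 and advances by `step` bases.
    """
    spacer = spacer.upper()
    panel = []
    for i in range(0, len(spacer) - 1, step):
        mutated = SWAP[spacer[i]] + SWAP[spacer[i + 1]]
        panel.append(spacer[:i] + mutated + spacer[i + 2:])
    return panel


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical 20-nt spacer used purely for illustration.
EXAMPLE_SPACER = "GACCTTGAGCATCGTAGCTA"
for variant in dinucleotide_mismatch_panel(EXAMPLE_SPACER):
    print(variant, hamming(variant, EXAMPLE_SPACER))
```

Each variant differs from the on-target spacer at exactly two adjacent positions, so reporter activation by a given variant indicates tolerance of mismatches at that location.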

A chimeric Cas9 nuclease displays high fidelity and broad targeting scope
Slaymaker and colleagues have generated a high-fidelity SaCas9 variant (eSaCas9) by weakening interactions between Cas9 and the target DNA [29]. To generate a Cas9 with high fidelity and broad targeting scope, we replaced the PID of eSaCas9 with that of SauriCas9, resulting in a recombinant chimera, which we named eSa-SauriCas9 (Fig 6A). The GFP-reporter assay revealed that eSa-SauriCas9 recognized NNGG PAM (Fig 6B and 6C) while displaying improved specificity compared with SauriCas9 (Fig 6D). We tested the genome editing capability of eSa-SauriCas9 with a panel of 14 endogenous gene targets in HEK293T cells. eSa-SauriCas9 generated indels at all 14 endogenous loci with varied efficiencies depending on the targets (Fig 6E).

Expanding SauriCas9 targeting scope
A previous study has shown that triple mutations (E782K/N968K/R1015H) on SaCas9 (SaCas9-KKH) relieve the restriction on position 3 of the PAM, leading to NNNRRT recognition [30]. To broaden the targeting scope of SauriCas9, we introduced triple mutations (Q788K/Y973K/R1020H) at the positions equivalent to E782K/N968K/R1015H on SaCas9 (S1A Fig), resulting in SauriCas9-KKH (Fig 7A). The GFP-reporter assay revealed that SauriCas9-KKH preferred NNRG PAM (Fig 7B and 7C). We inserted a target with NNAG PAM into a GFP-reporter plasmid and compared the efficacy of SauriCas9 and SauriCas9-KKH on this PAM. The results revealed that SauriCas9-KKH was more effective with NNAG PAM (S7 Fig). The GFP-reporter assay also revealed that SauriCas9-KKH displayed improved specificity compared with SauriCas9 overall (Fig 7D). We tested the editing capacity of SauriCas9-KKH with a panel of 15 endogenous targets, and indels could be easily detected after genome editing (Fig 7E).

Discussion
Small Cas9 nucleases (<1,100 aas) that can be packaged into an AAV hold great promise for gene therapy. Although several small Cas9 nucleases have been used for genome editing, the range of targetable sequences remains limited because of the requirement for a PAM sequence flanking a given target site. SaCas9 is the first small Cas9 ortholog that has been delivered by AAV vector for in vivo genome editing [8], but this Cas9 ortholog is used infrequently because of the requirement for a longer PAM sequence (NNGRRT). Engineered SaCas9 variants can increase targeting scope [30,31], but this increase often comes at the cost of reduced on-target activity. Three type II-C Cas9 orthologs have been used for mammalian genome editing: Neisseria meningitidis Cas9 (NmeCas9) [10,13], Campylobacter jejuni Cas9 (CjeCas9) [11], and N. meningitidis Cas9 (Nme2Cas9) [9]. These 3 Cas9 nucleases recognize N4GAYW/N4GYTT/N4GTCT, N4RYAC, and N4CC PAMs, respectively. However, type II-C Cas9 nucleases generally display low editing efficiency [12,14]. More recently, CasX (<1,000 aas) recognizing TTCN PAM has been shown to enable genome editing in mammalian cells [32], expanding the targeting scope of small Cas nucleases.
By using a GFP-reporter strategy that we previously developed [16], we identified SauriCas9 as a novel nuclease for mammalian genome editing. SauriCas9 consists of 1,061 aas and is small enough to be packaged into AAV for genome editing. Importantly, SauriCas9 recognizes a simple NNGG PAM, expanding the targeting scope of small Cas9 nucleases. We also generated SauriCas9-KKH to expand the targeting scope further. In addition, we have demonstrated the versatility of SauriCas9, which can be adapted for base editing. Recently, a mini-SaCas9 that retains only DNA binding activity has been developed [33,34]. Mini-SaCas9 can be used for a variety of applications by fusing it with other effectors. It will be interesting to engineer a mini-SauriCas9 in the future. With further development, we anticipate that SauriCas9 can be an important genome editing tool for both basic research and clinical applications.
hU6-Sa_tracr plasmid. The pX601 vector containing hU6-Sa_tracr was PCR amplified using primers hU6-F/ORI-R, followed by phosphorylation with T4 polynucleotide kinase (NEB) and religation with T4 DNA ligase (NEB). gRNAs were inserted into the hU6-Sa_tracr plasmid between 2 BsaI restriction sites. All primer sequences are listed in S2 Table; all target sequences are listed in S3 Table.

PAM sequence analysis
Twenty-base-pair sequences (AAGCCTTGTTTGCCACCATG/GTGAGCAAGGGCGAGGAGCT) flanking the target sequence (GAACGGCTCGGAGATCATCATTGCG NNNNNNN) were used to fix the target sequence. GCG and GTGAGCAAGGGCGAGGAGCT were used to fix the 7-bp random sequence. Target sequences with in-frame mutations were used for PAM analysis. The 7-bp random sequence was extracted and visualized by WebLogo3 [35] and a PAM wheel chart to demonstrate PAMs [36].
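The read-filtering logic described for PAM sequence analysis can be sketched as follows. This is an illustrative reconstruction, not the authors' pipeline: the flanking sequences and the GCG anchor come from the text, but the function names are hypothetical, and the in-frame criterion (insert length between the two flanks divisible by 3, so that the ATG at the end of the upstream flank is in frame with the downstream GFP sequence) is an assumption.

```python
import re
from collections import Counter

UP = "AAGCCTTGTTTGCCACCATG"    # upstream flank (from the text)
DOWN = "GTGAGCAAGGGCGAGGAGCT"  # downstream flank (from the text)

# Capture whatever lies between the two flanks (target, indel, and random 7-mer).
INSERT_RE = re.compile(UP + r"([ACGT]*?)" + DOWN)
# The 7-bp random sequence sits between the GCG anchor and the downstream flank.
PAM_RE = re.compile(r"GCG([ACGT]{7})$")


def extract_pam(read: str):
    """Return the 7-bp random sequence of an in-frame edited read, else None."""
    m = INSERT_RE.search(read.upper())
    if m is None:
        return None
    insert = m.group(1)
    # Assumed in-frame criterion: insert length is a multiple of 3.
    if len(insert) % 3 != 0:
        return None
    pam = PAM_RE.search(insert)
    return pam.group(1) if pam else None


def tally_pams(reads) -> Counter:
    """Count the 7-bp sequences recovered from all in-frame reads."""
    return Counter(p for p in map(extract_pam, reads) if p is not None)
```

The resulting counts are the kind of input a sequence-logo or PAM-wheel tool would consume; for example, the fraction of recovered 7-mers with GG at positions 3 and 4 summarizes the preference for NNGG.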

Verification of PAM sequence with GFP-reporter constructs
Two plasmids containing NNGG (CTGG) and NNNGG (TCTGG) PAM sequences were isolated from the PAM library. Each plasmid was packaged into lentivirus to generate stable cell lines. To remove cells with background mutations that induce GFP expression, the GFP-negative cells were sorted on a MoFlo XDP machine. The sorted cells were seeded into 24-well plates and transfected with SauriCas9+gRNA plasmid (800 ng) by Lipofectamine 2000 (Life Technologies). Three days after editing, the GFP-positive cells were analyzed on the Calibur instrument (BD). Data were analyzed using FlowJo.

Base editing with SauriABEmax/SauriBE4max
HEK293T cells were seeded into 24-well plates and transfected with SauriABEmax/SauriBE4max and hU6-Sa_tracr-gRNA. Cells were collected, and the genomic DNA was isolated 3 days after transfection. The target sites were PCR amplified and extracted by a Gel Extraction Kit (QIAGEN). The efficiency of the base editing was measured by deep sequencing.

Test of SauriCas9 specificity
To test the specificity of SauriCas9, we generated a GFP-reporter cell line with NNGG (CTGG) PAM. The cells were seeded into 48-well plates and transfected with Cas9-expressing plasmids (300 ng) and hU6-Sa_gRNA plasmids (200 ng) by Lipofectamine 2000 (Life Technologies). Three days after editing, the GFP-positive cells were analyzed on a Calibur instrument (BD). Data were analyzed using FlowJo.

Fig 7. (D) Specificity of SauriCas9-KKH is measured by the GFP-reporter assay. A panel of gRNAs with dinucleotide mismatches (red) is shown below. (E) SauriCas9-KKH generates indels for a panel of 15 endogenous loci (n ≥ 2). Underlying data for all summary statistics can be found in S1 Data. GFP, green fluorescent protein; gRNA, guide RNA; Indel, insertion/deletion; PAM, protospacer adjacent motif; PID, PAM-interacting domain; SauriCas9, Cas9 nuclease from S. auricularis; SauriCas9-KKH, triple mutations (Q788K/Y973K/R1020H) on SauriCas9.

Quantification and statistical analysis
All the data are shown as mean ± SD. Statistical analyses were conducted using Microsoft Excel. Two-tailed, paired Student's t tests were used to determine statistical significance when comparing 2 groups. A value of p < 0.05 was considered to be statistically significant (*p < 0.05, **p < 0.01, ***p < 0.001).
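For readers who want to reproduce the summary statistics outside a spreadsheet, the calculation can be sketched as below. This is a minimal illustration rather than the authors' workflow: it assumes the sample (n - 1) standard deviation, the function names are hypothetical, and the numbers in the usage lines are invented purely to show the call pattern.

```python
from math import sqrt
from statistics import mean, stdev


def mean_sd(values):
    """Return (mean, sample SD); the n - 1 denominator is an assumption."""
    return mean(values), stdev(values)


def paired_t(a, b):
    """Paired Student's t statistic and degrees of freedom for two matched groups.

    Raises ZeroDivisionError if all paired differences are identical.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length groups with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    t = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical indel percentages at three matched loci (illustration only).
group_a = [3.0, 5.0, 7.0]
group_b = [1.0, 2.0, 6.0]
print(mean_sd(group_a), mean_sd(group_b), paired_t(group_a, group_b))
```

The two-tailed p value then follows from the t distribution with the returned degrees of freedom, which can be looked up in a table or computed with a statistics package.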