Robust activation of microhomology-mediated end joining for precision gene editing applications

One key problem in precision genome editing is the unpredictable plurality of sequence outcomes at the site of targeted DNA double-stranded breaks (DSBs). This is due to the typical activation of the versatile Non-homologous End Joining (NHEJ) pathway. Such unpredictability limits the utility of somatic gene editing for applications including gene therapy and functional genomics. For germline editing work, the accurate reproduction of identical alleles using NHEJ is a labor-intensive process. In this study, we propose Microhomology-mediated End Joining (MMEJ) as a viable solution for improving somatic sequence homogeneity in vivo, capable of generating a single predictable allele at high rates (56% to 86% of the entire mutant allele pool). Using a combined dataset from zebrafish (Danio rerio) in vivo and human HeLa cells in vitro, we identified specific contextual sequence determinants surrounding genomic DSBs for robust MMEJ pathway activation. We then applied our observations to prospectively design MMEJ-inducing sgRNAs against a variety of proof-of-principle genes and demonstrated high levels of mutant allele homogeneity. MMEJ-based DNA repair at these target loci successfully generated F0 mutant zebrafish embryos and larvae that faithfully recapitulated previously reported, recessive, loss-of-function phenotypes. We also tested the generalizability of our approach in cultured human cells. Finally, we provide a novel algorithm, MENTHU (http://genesculpt.org/menthu/), for improved and facile prediction of candidate MMEJ loci. We believe that this MMEJ-centric approach will have a broader impact on genome engineering and its applications. For example, whereas somatic mosaicism hinders efficient recreation of knockout mutant alleles at base pair resolution via the standard NHEJ-based approach, we demonstrate that F0 founders transmitted the identical MMEJ allele of interest at high rates. Most importantly, the ability to directly dictate the reading frame of an endogenous target will have important implications for gene therapy applications in human genetic diseases.

Introduction
Programmable nucleases such as TALEN (Transcription Activator-like Effector Nuclease) and CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) systems have enabled a new era of scientific research [1,2]. Instead of relying on knock-down models or expensively outsourced knock out lines, laboratories across the world now have tools with which to generate indels (insertions and deletions) of varying sizes on the gene(s) of interest. However, DNA Double-strand Break (DSB) repairs largely result in diverse sequence outcomes owing to the unpredictable nature of the most commonly used Non-homologous End Joining (NHEJ) pathway [3,4] (Fig 1). This significantly confounds experimental readouts as knock-out cell lines often harbor more than just one desired frameshift mutation. In the case of model organisms such as zebrafish (Danio rerio), the F0 founders are genetically mosaic, warranting a complex and time-consuming series of outcrossing to establish molecularly defined lines before any biological questions can be addressed [5,6]. In contrast to NHEJ, the MMEJ (Microhomology-mediated End Joining) DNA repair pathway utilizes a pair of locally available direct sequence repeats on both sides of a DSB that are apposed, annealed and extended [7][8][9][10]. As such, DSB repair outcomes are highly stereotyped (Fig 1), resulting in deletion of the intervening sequence as well as one of the repeats. Consequentially, there is an increasing interest in utilizing MMEJ for precision genome engineering applications [11][12][13][14]. To date, however, effective harnessing of this pathway remains challenging due to the paucity of genetic and mechanistic understanding [8].
Bae, et al. [14] developed a sequence-based scoring system to estimate the frequency of MMEJ-associated deletions induced by DSBs in human cells. While this improved the predictability of MMEJ activation, the DSB repair outcomes tended to consist of a heterogeneous population of multiple MMEJ alleles. In this study, we sought to improve upon the existing algorithm with the goal of developing tools to more reliably predict target loci that would be predisposed to generate a more homogeneous mutant allele population through MMEJ. We demonstrate the feasibility and utility of such reagent design on the molecular level (i.e., DNA repair outcomes) and on the physiological level (i.e., F0 phenotype). We further demonstrate that our approach can be applied to generating highly homogeneous MMEJ alleles in cultured human cells, suggesting our findings may be broadly translatable to multiple model systems. We believe our approach can inform and benefit applications such as rapid phenotype-genotype correlation in F0 animals, with an eye toward applications in human gene therapy and facilitation of resource sharing and recreation of various cell and animal lines on a global scale.
Fig 1. MMEJ is a unique DSB repair pathway that results in highly efficient and highly stereotyped mutagenesis. DSB repair induced by conventionally designed programmable nucleases typically proceeds through a versatile yet unpredictable classical non-homologous end joining (NHEJ) pathway. As a result, a rather diverse cohort of mutant alleles is generated, making the subsequent selection process to enrich for the allele of interest labor intensive. The resulting genetic composition of the specific loci is often complex, requiring careful molecular characterization of each allele. Efficient activation of the microhomology-mediated end joining (MMEJ) pathway, on the other hand, can greatly limit allelic diversity and enable the intentional generation of a particular deletion allele of interest at a high rate. Consequently, the downstream applications become more streamlined, with facile generation of homozygous frameshift alleles in diploid cells.
https://doi.org/10.1371/journal.pgen.1007652.g001

MMEJ is an active repair pathway in the genetically unaltered zebrafish embryo
Prior works examining MMEJ activation in vertebrate organisms primarily focused on in vitro models [8][9][10][14][15][16][17][18]. Initial analyses using a targeted knockin strategy suggested that MMEJ was operational in the zebrafish embryo, though the efficiency of these MMEJ outcomes was rather modest [13]. Importantly, while previous studies reported incidental identification of several zebrafish genomic loci that repaired preferentially through MMEJ when using programmable nucleases [19,20], no consortium, small or large, of genomic loci that repair primarily through NHEJ vs MMEJ has been compiled. To this end, we examined the repair outcomes of previously designed TALEN and CRISPR-Cas9 genomic reagents (S1 Table). The plurality of custom enzymes induced diverse sequence outcomes, consistent with the idea that NHEJ is being used as the primary DNA repair pathway at these loci. However, a few reagents induced sequence outcomes satisfying the following criteria, suggesting that MMEJ was the preferred pathway: 1) the most predominant mutant allele is the top predicted allele by the Bae, et al. algorithm [14], 2) the most predominant mutant allele comprises ≥ 50% of the total mutant allele population, and 3) mutagenic efficiency is > 20%. For the purpose of this study, a programmable nuclease satisfying all these criteria is referred to as a Predominant MMEJ Allele (PreMA) reagent. Three sticky-end generating TALEN (chrd, mitfa #4 & surf1) and two blunt-end generating CRISPR-Cas9 (surf1 & tyr #2) reagents fell into this category (S1 Table, Fig 2A, Fig 3A).
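The three PreMA criteria listed above can be expressed as a simple check. The following is a minimal sketch, not the authors' code: the `RepairOutcome` type, its field names, and the function name are hypothetical, and the mutagenic efficiency used in the usage lines is a placeholder value rather than a reported measurement.

```python
from dataclasses import dataclass


@dataclass
class RepairOutcome:
    """Summary of repair outcomes for one reagent (hypothetical structure)."""
    top_allele_is_predicted: bool  # predominant allele == top Bae et al. prediction
    top_allele_fraction: float     # share of the mutant allele pool (0 to 1)
    mutagenic_efficiency: float    # share of alleles that are mutant (0 to 1)


def is_prema(outcome: RepairOutcome,
             min_fraction: float = 0.50,
             min_efficiency: float = 0.20) -> bool:
    """Return True when all three PreMA criteria from the text are met."""
    return (
        outcome.top_allele_is_predicted
        and outcome.top_allele_fraction >= min_fraction   # criterion 2: at least 50%
        and outcome.mutagenic_efficiency > min_efficiency  # criterion 3: above 20%
    )


# Usage: 16 of 32 mutant reads carrying the predicted allele meets criterion 2.
# The efficiency of 0.9 is a placeholder assumption, not a value from the text.
print(is_prema(RepairOutcome(True, 16 / 32, 0.9)))
```

Keeping the thresholds as keyword arguments makes it easy to explore how sensitive the classification is to the 50% and 20% cut-offs.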
Injecting the chrd TALEN pair (37.5 pg/arm) resulted in characteristic chrd loss-of-function phenotypes: Intermediate-Cell-Mass expansion and a smaller head by 1 day post-fertilization (1 dpf) [21] (Fig 2B). Median penetrance for Moderate and Severe phenotypes was 15.8% and 20.0%, respectively (Fig 2B, S2 Table). Strong MMEJ activation by this TALEN pair was confirmed by subcloning analysis (Fig 2A): 16/32 recovered mutant reads corresponded to the top predicted 7 bp deletion allele. Similarly, perturbing the tyr gene with a CRISPR-Cas9 reagent recapitulated a previously reported loss of melanin production phenotype, observable by 2 dpf [22] (Fig 3B). Ribonucleoprotein (RNP) delivery at the dose of 300 pg tyr #2 sgRNA and 660 pg Cas9 resulted in Moderate and Severe loss of pigmentation phenotypes in 22.7% and 50.0% of embryos, respectively (Fig 3B, S2 Table). Subcloning analysis showed that 21/24 (88%; Fig 3A) of the resulting alleles contained a 4 bp deletion, consistent with strong MMEJ activation by this CRISPR-Cas9 reagent. Together with the chrd TALEN results, these data support that MMEJ can be an effective repair pathway in F0 embryos at some genomic loci, irrespective of the programmable nuclease used.

Many Bae, et al. predicted MMEJ loci are preferentially repaired by NHEJ
Fig 2. PreMA TALEN reagent can be used to recapitulate previously reported loss-of-chrd-function phenotype in 1 dpf F0, injected larvae. A. Top: wildtype chrd sequence with TALEN binding sites annotated in teal. The dotted red boxes are MH arms predicted to be used most frequently. Raw sequence alignment of the whole PCR amplicon demonstrates that the majority of reads are the expected 7 bp deletion allele. Bottom: summary data from subcloning analyses. 50% of the mutant alleles recovered were of the predicted MH allele. B. Previously reported chrd loss-of-function phenotype was successfully recapitulated using this TALEN pair. Phenotype severity was graded by the degree of Intermediate-Cell-Mass expansion in the tail and by the reduced head size by 1 dpf. Box plot demonstrating phenotypic penetrance is provided with each experiment denoted by a unique marker shape. N = 3 biological and technical replicates. At least 29 injected animals were scored in each experiment.
https://doi.org/10.1371/journal.pgen.1007652.g002
Fig 3. PreMA sgRNA against tyr can be used to recapitulate loss-of-melanophore phenotype in 2 dpf, injected F0 larvae. A. Top: wildtype tyr sequence with the #2 sgRNA target site annotated in green. The dotted red boxes are MH arms predicted to be used most frequently. Raw sequence alignment of the whole PCR amplicon demonstrates that the majority of reads are the expected 4 bp deletion allele. Bottom: summary data from subcloning analyses. 88% of the mutant alleles recovered were of the predicted MH allele. B. Previously reported tyr loss-of-function phenotype was successfully recapitulated using this CRISPR-Cas9. Phenotype severity was graded by the loss of retinal pigmentation. Partial loss of retinal pigmentation was considered a Weak phenotype, whereas complete loss of pigmentation in one or both eyes was considered a Moderate or Severe phenotype, respectively. Box plot demonstrating phenotypic penetrance is provided with each experiment denoted by a unique marker shape. N = 3 biological and technical replicates. At least 12 injected animals were scored in each experiment.
A subset of these zebrafish reagents described above was prospectively designed using the Bae, et al. algorithm (S1 Table). This algorithm calculates the strength of each pair of microhomology arms (i.e., Pattern Score) according to the length and GC content of each pair, as well as the length of the intervening sequence. The additive sum of all the possible Pattern Scores is then returned as the Microhomology Score. This latter score was found to have a positive correlation with the rate of MMEJ activation in HeLa cells [14]. All fourteen prospectively designed reagents had a Microhomology Score of at least 4000, a median score found on the human BRCA1 gene. However, only four of these reagents induced majority MMEJ outcomes as judged by the Microhomology Fraction (S1 Table, S1 Note). We therefore retrospectively analyzed the repair outcomes of these reagents to identify additional factor(s) that would enhance predictability of MMEJ induction.
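To make the relationship between Pattern Score and Microhomology Score described in this section concrete, the sketch below scores each microhomology pair from its arm length, GC content and deletion length, then sums the results. The specific weighting (GC bases counted double, an exponential penalty with a decay constant of 20 bp, a scale factor of 100) is an assumption made for illustration and should be checked against Bae, et al. [14]; the function names and the example pairs are hypothetical.

```python
import math


def pattern_score(arm: str, deletion_length: int, length_weight: float = 20.0) -> float:
    """Score one microhomology pair.

    Assumed form: longer and more GC-rich arms score higher, and the score
    decays exponentially as the resulting deletion gets longer.
    """
    arm = arm.upper()
    gc = arm.count("G") + arm.count("C")
    at = len(arm) - gc
    return 100.0 * (at + 2 * gc) * math.exp(-deletion_length / length_weight)


def microhomology_score(pairs: list[tuple[str, int]]) -> float:
    """Additive sum of the Pattern Scores of all microhomology pairs at a locus."""
    return sum(pattern_score(arm, deletion) for arm, deletion in pairs)


# Usage with made-up (arm sequence, deletion length) pairs for one cut site.
example_pairs = [("GCAG", 7), ("TGA", 12), ("AAT", 25)]
for arm, deletion in example_pairs:
    print(arm, deletion, round(pattern_score(arm, deletion), 1))
print("Microhomology Score:", round(microhomology_score(example_pairs), 1))
```

Because the Microhomology Score is a sum, a locus can reach a high total either through one dominant pair or through many moderate pairs, a distinction the total alone does not capture.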

Rate of Pattern Score change as a discrimination factor for MMEJ induction in vivo and in vitro
Intriguingly, when the Pattern Score values clustered closely to one another (i.e., a flatter Slope Value as calculated according to S2 Note), this was indicative of an unfavorable target for MMEJ activation in zebrafish embryos. Conversely, loci at which Pattern Scores dropped precipitously (i.e., a steeper Slope Value) were good candidates for MMEJ activation in vivo (p = 0.0048; S1 Fig). Based on these observations, we hypothesized that locally available microhomology pairs are in direct competition with one another such that an overabundance of these pairs is a negative predictor of MMEJ activation. In other words, MMEJ activation is more favorable at loci with one or two predominant microhomology pair(s) (Low Competition loci) rather than many strong microhomology pairs (High Competition loci).
To determine whether the zebrafish-based hypothesis was generalizable to human cells (HeLa), we re-analyzed the deep sequencing dataset used to generate the Bae, et al. algorithm [14]. Available results from 90 genomic loci were sorted alphabetically by the names of target genes then divided into two groups: first 50 and the remaining 40. The first group was then used for a retrospective, correlative analysis while the latter was used for an analysis compatible with a prospective study design. Outcomes from the first 50 targets showed a correlation similar to that observed in zebrafish; higher Microhomology Fractions generally correlated with low

Competition hypothesis predicts new PreMA reagents
Based on this Competition Hypothesis, we designed 20 Low Competition sgRNA targets across 9 genes and analyzed the DSB repair outcomes (S3 Table). A Slope Value smaller than -40 was used as the cut-off for Low Competition, as 3 out of 4 previously designed zebrafish targets produced majority MMEJ outcomes in this range (S1 Table and S1 Fig). For initial assessments, we used TIDE (Tracking Indels by DEcomposition) analysis, a chromatogram analysis tool that estimates the proportions of length-varying mutant alleles present in a pool of mixed alleles [23]. This revealed that 5 of these sgRNAs against 3 genes (mtg1, tdgf1, ttn.2 #1, ttn.2 #2, and ttn.2 N2B #1) were in the PreMA class. These results were subsequently confirmed by subcloning analyses (S3 Table).
Perturbation of tdgf1 (alternatively known as One-eyed Pinhead) causes aberrant "pinhead" morphology and cyclopia, as judged by reduced forebrain protrusion by 1 dpf [24] (Fig 4B). RNP injections of CRISPR-Cas9 at the dose of 300 pg sgRNA and 660 pg Cas9 resulted in highly homogeneous DSB repair outcomes, generating the top-predicted 4 bp deletion allele in 28 of 39 clones analyzed (Fig 4A). Aberrant head morphology alone was classified as Weak, whereas that in combination with varying degrees of forebrain protrusion was classified as a Moderate or Severe phenotype. Median penetrance for Moderate and Severe morphology was 21.8% and 11.4%, respectively (Fig 4B, S2 Table), consistent with the subcloning results.
We next explored whether these PreMA reagents are useful for recapitulating a more subtle phenotype beyond the aberrant gross morphologies observed in the tdgf1 mutants. Splice blockade at the N2B exon of the ttn.2 gene by a synthetic morpholino oligonucleotide was previously reported to reduce cardiac contractility by ~70% at 2 dpf [25], phenocopying the pickwick m171 mutation [26]. RNP delivery at the dose of 300 pg ttn.2 N2B #1 sgRNA + 660 pg Cas9 resulted in a reduction of the shortening fraction to a comparable degree (Fig 5B). Importantly, RNP delivery of the NHEJ-inducing ttn.2 N2B #2 sgRNA at the same dose resulted in a more attenuated phenotype, despite it targeting the same exon and having comparable activity (Fig 5; S4 Table). Due to the high editing efficiency, animals injected with these doses of ttn.2 N2B #1 RNP were not viable in post-larval phases. For this reason, animals injected at the lower dose of 75 pg sgRNA + 165 pg Cas9 protein were raised to adulthood. Two F0 founders were successfully outcrossed to wildtype zebrafish. Heterozygous offspring were identified using the dsDNA heteroduplex-cleaving Surveyor assay [27], and the transmission of the top predicted 5 bp deletion allele was confirmed from both founders by subcloning analyses (S3 Fig).
We also designed an sgRNA against exon 13 of ttn.2 (ttn.2 #2 sgRNA), expected to produce a 12 bp deletion allele as a proof-of-principle for in-frame gene correction (Fig 6A). RNP delivery at the dose of 300 pg sgRNA + 660 pg Cas9 resulted in the induction of this 12 bp deletion allele in 72.7% of the clones. While the injected animals presented with mild cardiac edema evident by 2 dpf (median rate: 50.0%; Fig 6B, S2 Table), unlike the N2B #1 sgRNA CRISPR-Cas9 injected animals, these were viable to adult age.
Fig 4. [...] The dotted red boxes are MH arms predicted to be used most frequently. Raw sequence alignment of the whole PCR amplicon demonstrates that the majority of reads are the expected 4 bp deletion allele. Bottom: summary data from subcloning analyses. 72% of the mutant alleles recovered were of the predicted MH allele. B. Previously reported tdgf1 loss-of-function phenotype was successfully recapitulated using this CRISPR-Cas9. Phenotype severity was graded by the "pinhead" morphology and cyclopia. Pinhead morphology alone was classified as Weak, whereas Moderate and Severe phenotypes also presented with varying degrees of cyclopia judged by the distance of forebrain protrusion. In the Severe class, the forebrain does not separate the eyes, and they are fused together. Box plot demonstrating phenotypic penetrance is provided with each experiment denoted by a unique marker shape. N = 4 with 3 biological and 4 technical replicates. At least 42 injected animals were scored in each experiment.
https://doi.org/10.1371/journal.pgen.1007652.g004

Low competition plus proximity of microhomology arms strongly predicts PreMA reagents: V2
Fig 5. [...] The dotted red boxes are MH arms predicted to be used most frequently. Raw sequence alignment of the whole PCR amplicon demonstrates that the majority of reads are the expected 5 bp deletion allele. Bottom: summary data from subcloning analyses. 86% of the mutant alleles recovered were of the predicted MH allele. B. Previously reported pickwick phenotype was successfully recapitulated using this CRISPR-Cas9. 2 dpf zebrafish were immobilized in 3% methylcellulose for live recording of cardiac functions. Whereas injections with Cas9 only (660 pg), N2B #1 sgRNA only (300 pg), or tyr #2 sgRNA RNP (300 pg sgRNA + 660 pg Cas9) did not result in changes in shortening fraction at this age, MMEJ-inducing RNP injection targeting N2B #1 (300 pg sgRNA + 660 pg Cas9) resulted in a specific reduction in shortening fraction by 78.4%. In contrast, NHEJ-inducing RNP injection targeting N2B #2 (300 pg sgRNA + 660 pg Cas9) resulted in attenuated effects on shortening fraction (53.3% reduction), despite similarly high edit efficiency. Each data point represents an individual animal scored, with the shape of the marker denoting a unique experiment. N ≥ 3 biological and technical replicates, except for N2B #2 where N = 2. At least 5 injected animals were scored in each experiment. P-values calculated by Wilcoxon's Each Pair Calculation (adjusted for multiple comparisons).
These data support the utility of PreMA reagents for various applications that require precision gene editing. However, sgRNA design based on the Competition Hypothesis only yielded 5 PreMA reagents out of 20 that were tested (S3 Table, S3 Note). While this represented an improvement over the initial approach solely relying on the Microhomology Score (1 out of 14; S1 Table), we sought to further fine-tune the predictability for the PreMA targets. To this end, we pooled the results from all the programmable nucleases described above (S1 and S3 Tables) and seven Medium to High Competition sgRNAs designed as controls based on the Competition Hypothesis (S4 Table). In so doing, we noted that PreMA outcomes were only observed if the two arms of the top predicted microhomology were separated by no more than 5 bp. We subsequently identified the second parameter: a high ratio (≥ 1.5) of the Pattern Scores between the top predicted and second predicted MMEJ alleles for a given locus (Fig 7). Seven out of eight reagents that satisfied both of these parameters were PreMA. Of the nine reagents that satisfied the first parameter but not the second, two were PreMA. All the other thirty reagents that failed to meet the first parameter failed to induce the top predicted MMEJ allele strongly. Most importantly, all the failed cases (i.e., incorrect predictions according to the original Competition Hypothesis) can be explained using our revised approach (Competition Hypothesis V2; Fig 7C). Version 2 also captured three PreMA reagents that would have been missed by the original Competition Hypothesis alone, and one PreMA reagent that would have been missed by the Microhomology Score alone. Similar trends were observed using independently collected, previously published deep sequencing datasets from zebrafish [28] and HeLa cells [14] (S4 Fig).
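The two parameters of Competition Hypothesis V2 described in this section (top microhomology arms separated by no more than 5 bp, and a Pattern Score ratio of at least 1.5 between the top and second predicted alleles) can be sketched as a filter. This is an illustrative sketch only, not the MENTHU implementation; the function name, argument names, and the handling of a locus with a single microhomology pair are assumptions.

```python
def predicts_prema_v2(pattern_scores: list[float],
                      top_arm_gap_bp: int,
                      max_gap_bp: int = 5,
                      min_ratio: float = 1.5) -> bool:
    """Apply the two V2 parameters to one candidate locus.

    pattern_scores: Pattern Scores of all predicted MMEJ alleles at the locus.
    top_arm_gap_bp: distance between the two arms of the top-scoring pair.
    """
    if not pattern_scores:
        return False  # no microhomology pair available at this locus
    if top_arm_gap_bp > max_gap_bp:
        return False  # first parameter: arms must lie within 5 bp of each other
    ranked = sorted(pattern_scores, reverse=True)
    if len(ranked) == 1 or ranked[1] <= 0:
        return True   # assumption: no competing pair counts as passing
    # second parameter: top allele must outscore the runner-up by the ratio
    return ranked[0] / ranked[1] >= min_ratio


# Usage with made-up Pattern Scores.
print(predicts_prema_v2([300.0, 150.0, 100.0], top_arm_gap_bp=4))  # ratio 2.0, gap 4
print(predicts_prema_v2([300.0, 250.0], top_arm_gap_bp=4))         # ratio 1.2
```

Checking arm proximity first mirrors the observation in the text that no reagent failing the first parameter strongly induced the top predicted allele.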

Mechanism of MMEJ-activation may be conserved in vertebrates
To test the generalizability of our findings, we prospectively designed 11 sgRNAs against the human genome (S5 Table) and delivered them as RNPs to HEK293T cells. Of the 5 active guides cutting above 20% efficiency, DSBs induced by the GJB2 #1 and #2 guides resulted in more homogeneous repair outcomes (Fig 8A and 8B) than any of the 92 guides tested by Bae, et al. (S4B Fig) [14]. DSBs at AAVS1 #2 and MYO7A #3, on the other hand, repaired primarily through 1 bp indels, consistent with the report by Bae, et al. using HeLa cells. Intriguingly, the second most prevalent class of repair at these loci was the top predicted MMEJ allele (Fig 8C and 8D), as identified by subcloning analyses. We thus conclude that the specific trigger for efficient MMEJ activation may be conserved in vertebrate organisms, albeit with nuances that are yet to be elucidated.

Accessing the PreMA algorithm through MENTHU (MMEJ kNockout target heuristic utility)
The broad potential utility of this updated PreMA Algorithm for MMEJ prediction led us to develop a web-based automated analysis tool called MENTHU (http://genesculpt.org/menthu/). The tool can also be downloaded and installed on a local computer (www.github.com/Dobbs-Lab/menthu/). MENTHU accepts a user-specified DNA sequence and targeting scheme as input, and outputs recommended CRISPR gRNA target sites that are predicted to result in PreMA type outcomes. We validated the accuracy and functionality of MENTHU against select gRNA sites used in this study using whole exonic sequences as inputs (S6 Table); importantly, the software identified novel PreMA candidate loci against surf1 and tdgf1 where only Group 3 gRNA loci had been found by previous methods. Finally, we conducted a preliminary assessment to examine the prevalence of PreMA loci and found roughly 10% prevalence of such loci among all possible NGG PAMs on the human CSF2 and zebrafish tp53 genes (S7 Table).

Discussion
To date, precision genome engineering is limited by the ability to predictably, efficiently, and reproducibly induce the identical sequence alterations in each and every cell. Here, we demonstrate the feasibility and utility of creating allelic consistency by an MMEJ-centric approach for designing programmable nucleases. While the precise cellular components of the molecular machinery involved in MMEJ remain incompletely understood [8], we provide evidence that we can enrich for MMEJ events by strictly sequence-based queries.
We also demonstrate that MMEJ-predominant repairs do not operate at the cost of overall mutagenic efficiency; median edit efficiency for PreMA reagents was 91.4% in zebrafish. As genetically unaltered wildtype zebrafish were used throughout the study, we have no reason to believe that NHEJ should have failed at any tested loci. This is in contrast to the proposal that MMEJ is a back-up pathway to NHEJ [7,8,16,17,29]. Our findings, on the other hand, are compatible with a previous report wherein MMEJ-specific factors such as PolQ are abundantly expressed in embryonic zebrafish [20]. Interestingly, maternal-zygotic PolQ mutant embryos failed to repair DSBs at two out of three MMEJ loci, leading to premature deaths [20]. The third locus, which preferentially used a 2 bp microhomology and exhibited more heterogeneous DSB repair outcomes, was able to be repaired at a measurable rate, though significantly less so than in WT embryos. Thus NHEJ and MMEJ may be non-competing, parallel processes with unique triggers.
Based on the data presented here, we speculate that there is a reaction-limiting factor for MMEJ that is involved in identifying compatible microhomology pairs on both sides of the DNA double-stranded break. In the case of abundantly available local microhomology pairs, this factor may fail to localize to a single suitable pair, thus rejecting MMEJ activation. As end resection is required for MMEJ and not for NHEJ [9,17,18], this as-yet-unidentified factor may be the deciding factor for committing DSB repair to one End Joining pathway over another. This view is similar to a recent report wherein CtIP/Artemis-dependent limited end resection was a key trigger for a slow-kinetic, Lig1/3-independent NHEJ event that frequently utilized microhomology to repair a reporter plasmid [30]. In our analysis, the primary driver of this decision-making process is the proximity of the 2 microhomology arms, further aided by the lack of competing microhomology arms.
Successful deployment of the PreMA reagents makes it possible to directly dictate the reading frame or to do in-frame gene manipulations on endogenous targets. Even assuming a somewhat modest outcome of 50% edit efficiency in which 50% of the mutant allele pool is of the desirable allele, more than 10% of the cell population will be homozygous for this desired allele. Conversely, many real-life gene editing applications would require only one of the diploid copies to be corrected. In these settings under the same assumptions, just 11 viable cells are needed to achieve 95% confidence for establishing the right clone, bringing the idea of precision molecular surgery closer to reality.
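The figure of 11 cells follows from a short calculation. The sketch below assumes that each viable cell independently carries the desired allele with probability 0.5 × 0.5 = 0.25 (edit efficiency times the share of the desired allele in the mutant pool) and asks how many cells must be screened for at least one to carry it with 95% confidence. The function name is hypothetical, and the independence assumption is a simplification.

```python
import math


def cells_needed(p_desired: float, confidence: float = 0.95) -> int:
    """Smallest n such that 1 - (1 - p_desired)**n >= confidence."""
    if not 0.0 < p_desired < 1.0:
        raise ValueError("p_desired must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_desired))


edit_efficiency = 0.5          # 50% of alleles are mutant
desired_allele_fraction = 0.5  # 50% of mutant alleles are the desired one
p = edit_efficiency * desired_allele_fraction  # 0.25 per cell under the assumption above

print(cells_needed(p))  # 11
```

With 10 cells the chance of missing the desired clone is 0.75^10 ≈ 0.056, just above 5%, whereas with 11 cells it falls to about 0.042.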
Our present study expands upon the current state-of-the-art understanding of MMEJ activation and demonstrates the ability to prospectively design robustly active PreMA reagents in vivo. We also provide evidence that this 2-component approach may be broadly applicable beyond zebrafish; testing of the true generalizability of our approach will be facilitated by our web-based application, MENTHU (http://genesculpt.org/menthu/). Importantly, MENTHU allows users to flexibly define a PAM sequence and the cut site (in nts from PAM) so as to accommodate potential future variants of the CRISPR system. Active investigations are underway to accommodate alternative or more lax PAM requirements, such as the case with xCas9, a recently described variant of Cas9 that may function efficiently on an NG PAM [31]. As MMEJ-based loci are inherently restricted to genomic locations that leverage endogenous sequence contexts, availability of more flexible programmable nucleases will become the key for broadening the utility of PreMA reagents.
We provide strong evidence to support the utility of the MMEJ-centric approach beyond phenotype-genotype correlations in F0 animals. We envision this approach to be useful for: 1) studying the effects of homozygous gene knock-out in cultured cells (as opposed to more common, compound heterozygous loss-of-function cell lines), 2) rapid small molecule screening in F0 animals as a complementary approach to studies in germline mutant animals, 3) globally sharing and reproducing gene knock-out cell and animal lines, 4) pathway dissection for MMEJ, and finally, 5) human gene therapy.

Ethics statement
The animal studies were conducted following guidelines and standard procedures established by the Mayo Clinic Institutional Animal Care and Use Committee (Mayo IACUC). The Mayo IACUC approved all protocols involving live vertebrate animals (A23107, A 21710 and A34513).

Microhomology arms
For the purpose of this study, microhomology is defined as any endogenous direct sequence repeats of ≥ 3 bp surrounding a DSB site. 2 bp direct sequence repeats were not considered sufficient substrates of MMEJ activation based on our initial analyses of the DSB repair outcomes by previously designed programmable nucleases. Correlation for Microhomology Fraction vs the Slope Value was tangentially stronger when only ≥ 3 bp arms were considered (r² =

Zebrafish husbandry
All zebrafish (Danio rerio) were maintained in accordance with protocols approved by the Institutional Animal Care and Use Committee at Mayo Clinic. Zebrafish pairwise breeding was set up one day before microinjections and dividers were removed the following morning. Following microinjections, the fertilized eggs were transferred to Petri dishes with E3 media [5 mM NaCl, 0.17 mM KCl, 0.33 mM CaCl2, and 0.33 mM MgSO4 at pH 7.4] and incubated at 28.5 °C. All subsequent assays were conducted on fish less than 3 dpf, with the exception of assessing for germline transmission. In this case, injected founders were raised to adulthood per the standard zebrafish husbandry protocol.

DNA oligonucleotide preparation
All of the oligonucleotides used for this study were purchased from IDT (San Jose, CA). Upon arrival, they were reconstituted into 100 μM suspensions in 1x TE and stored at -20 °C until use.

sgRNA expression vector synthesis
pT7-gRNA was a gift from Wenbiao Chen (Addgene plasmid # 46759). Given that the minimum requirement for the T7 promoter is a single 5' G, the GG start on this vector was mutagenized via site-directed mutagenesis (SDM) to accommodate GA, GC, and GT starts, using the Forward and Reverse primers given (S8 Table). Platinum Pfx DNA Polymerase (Invitrogen 11708013. Carlsbad, CA) was used for 20 cycles of PCR amplification with the Tm of 60 °C and an extension time of 3 minutes. DpnI (NEB R0176. Ipswich, MA) was subsequently added to the reaction prior to transforming DH5α cells. The target sequence was cloned in as previously described, with the exception of conducting oligo annealing and T4 ligation (NEB M0202. Ipswich, MA) in 2 separate steps. In each case, transformed cells were cultured with Carbenicillin, and plasmids were purified with Plasmid Mini Kit (Qiagen 12123. Hilden, Germany).

TALEN synthesis
TALEN constructs were generated using the FusX kit (Addgene # 1000000063) as previously described [32]. In short, RCIscript-GoldyTALEN was linearized with BsmBI (NEB R0580. Ipswich, MA) along with 6 triplet RVD (Repeat-Variable Diresidue) plasmids. Subsequently, they were ligated together in one reaction by a modified Golden-Gate Assembly. Blue-White colony screening with X-Gal/IPTG, colony PCR and finally pDNA sequencing were done to ascertain the correct assembly.

In-vitro transcription and RNA preparation
pT3TS-nCas9n (a gift from Wenbiao Chen: Addgene plasmid # 46757) was linearized with XbaI (NEB R0145. Ipswich, MA), whereas TALEN constructs were linearized with SacI-HF (NEB R3156. Ipswich, MA) and the sgRNA vector with BamHI-HF (NEB R3136. Ipswich, MA). Tyr sgRNA #2, a construct made in the Essner Lab, was linearized with HindIII (NEB R0104. Ipswich, MA). RNA was made using the T3 mMessage mMachine kit (Ambion AM1348. Foster City, CA) or the HiScribe T7 High Yield RNA synthesis kit (NEB E2040. Ipswich, MA) according to the manufacturers' protocols with the addition of RNA Secure to the reaction (Ambion AM7010. Foster City, CA). To purify RNA, phenol-chloroform extraction was performed using Acid Phenol, Chloroform, and MaXtract High Density Tubes (Qiagen 129046. Hilden, Germany). RNA was then precipitated with Isopropanol at -20˚C, pelleted, air dried and resuspended in nuclease-free water. The quality and quantity of RNA were ascertained by using a Nanodrop spectrophotometer and running an aliquot on an agarose gel. Each batch of RNA was aliquoted into small single-use tubes and stored at -80˚C until the morning of microinjections.

CRISPR-Cas9 RNP preparation for microinjections
sgRNA was thawed on ice in the morning of microinjections. This was then diluted to the concentration of 300 ng/μL in Duplex Buffer [100 mM KCH 3

Microinjections
Microinjections were carried out as previously described [33]. In short, 1-cell stage fertilized embryos were harvested and aligned on an agarose plate with E3 media. In the case of CRISPR-Cas9 reagents, either 1 or 2 nL was delivered to the cell. In the case of TALEN reagents, 1~3 nL was delivered to the yolk mass. They were then transferred to Petri dishes in E3 media for incubation at 28.5˚C. Dead and/or nonviable embryos were counted and removed each subsequent morning.

Phenotype scoring
Each experiment was conducted in at least a technical triplicate and a biological duplicate. Detailed outcomes are provided in S4 Table. Gross phenotypes were scored visually on either 1 dpf or 2 dpf using a standard dissecting microscope. Subsequently, representative pictures were taken with Lightsheet Z.1 (Zeiss 2583000135. Oberkochen, Germany). Shortening Fractions were scored as previously reported [34]. In short, live 2 dpf larvae were immobilized and positioned in 3% methylcellulose. An Amscope camera (MU1403. Irvine, CA) mounted on a Leica Microscope (M165. Wetzlar, Germany) was used to capture a 15 second clip of the beating heart at 66 fps. These clips were subsequently used to measure the distance of the long axis along the ventricle at maximum dilation and maximum contraction using ImageJ software [35]. Shortening Fraction was calculated as below:

\[ \text{Shortening Fraction} = 1 - \frac{\text{Distance at Maximum Shortening}}{\text{Distance at Maximum Dilation}} \]
Shortening Fractions from 5 cycles were averaged for each animal.
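
The per-animal calculation can be sketched as follows. This is a minimal illustration, assuming the conventional definition of shortening fraction (one minus the ratio of the ventricular long-axis length at maximum contraction to the length at maximum dilation); the function names and the example measurements are hypothetical and are not taken from this study.

```python
from statistics import mean


def shortening_fraction(dilation, contraction):
    """Return 1 - (length at maximum contraction / length at maximum dilation)."""
    if dilation <= 0:
        raise ValueError("dilation length must be positive")
    return 1.0 - contraction / dilation


def mean_shortening_fraction(cycles):
    """Average the shortening fraction over cardiac cycles.

    `cycles` is a list of (dilation, contraction) length pairs, one per cycle.
    """
    return mean(shortening_fraction(d, c) for d, c in cycles)


# Hypothetical long-axis measurements (arbitrary units) for 5 cycles of one larva.
example_cycles = [
    (100.0, 80.0),
    (102.0, 81.6),
    (98.0, 78.4),
    (100.0, 80.0),
    (101.0, 80.8),
]
print(round(mean_shortening_fraction(example_cycles), 3))
```

Because the value is a ratio of two lengths measured on the same clip, the pixel-to-distance calibration cancels out, so the measurements can be entered in whatever unit the image analysis software reports.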

Zebrafish DNA extraction and assessing mutagenic outcomes
Typically, 8 uninjected wildtype fish and 8 injected fish were randomly collected without prior screening for phenotype. Chorion was predigested with 1 mg/mL Pronase at room temperature as needed. 1~3 dpf animals were then sacrificed for individual DNA extractions in 100 mM NaOH for 15 minutes at 95˚C. Equal volumes of 8 fish DNA from the same condition were then mixed and used as templates for PCR with either MyTaq (Bioline BIO-21108. London, UK), Phusion (NEB M0530. Ipswich, MA), or KOD (EMD Millipore 71085. Burlington, MA) polymerases per manufacturer's protocols. The PCR amplicon was resolved on agarose gel, gel extracted with either Monarch DNA Gel Extraction Kit (NEB T1020. Ipswich, MA) or QiaEx II Gel Extraction Kit (Qiagen 20021. Hilden, Germany), and subsequently sent out for sequencing. The chromatograms from both uninjected and injected amplicons were used for TIDE analysis [23]. Alternatively, purified amplicons were used for subcloning analysis with either Topo-TA Cloning Kit (Thermo Fisher Scientific 451641. Waltham, MA) or StrataClone PCR Cloning Kit (Agilent 240205. Santa Clara, CA) per manufacturer's protocols. Resultant white to pale blue colonies by Blue-White screening were subjected to colony PCR with M13F and R primers, using MyTaq polymerase. Once successful amplification was confirmed on agarose gel, these amplicons were sent out for sequencing either with M13F, M13R or endogenous gene target primers.

Germline transmission for 5 bp deletion generated by N2B sgRNA #1
RNP containing N2B sgRNA #1 was prepared at a 4x diluted dose as described above. Following microinjections, viable fish were raised to sexual maturity. Both F0 founders that we attempted to outcross mated successfully and produced viable embryos. DNA was extracted from all viable embryos on 1 dpf, and individual DNA was used as template for PCR amplification using MyTaq Polymerase. Once the thermocycling ran to completion, the amplicons were melted by heating to 95˚C and re-annealed by gradual step-wise cooling. Surveyor assay [27] was conducted per the manufacturer's protocol (IDT 706025. San Jose, CA), and the results were analyzed by resolving the post-digest amplicons on agarose gel. Amplicons from 4 heterozygous offspring each were subcloned, and 5 colonies each were sent for Sanger Sequencing to confirm successful transmission of the 5 bp deletion allele.

Reanalyses of previously published deep sequencing dataset
For the zebrafish dataset, sgRNA screen SRA files were obtained from NCBI's Short Read Archive (Accession: PRJNA245510) [28]. These files were converted to the fastq format with the fastq-dump command using the --split-spot option of the SRA Toolkit (NCBI. Bethesda, MD). The fastq files were then uploaded onto Cas-Analyzer (http://www.rgenome.net/cas-analyzer/) and analyzed with a Comparison range of 25~40 and a Minimum frequency of 1 [36]. The following numbers of reads were recorded: total, total mutant, and total top predicted allele. A read was counted toward the top predicted allele so long as it contained no more than 2 polymorphisms in the analysis window AND the polymorphisms did not fall on the microhomology arms. Subsequently, the calculated mutagenic efficiency was plotted against the reported efficiency (r² = 0.306). Of the 122 targets designed by Gagnon, et al, the following were excluded to arrive at the 34 targets that were used for the analysis presented in S4 Fig Panel A: non-NGG targets (36 loci), targets that did not align to the WT consensus sequence (GRCz11; 8 loci), targets with total recovered read counts less than 1% of expected (7 loci), targets with a high rate of permutation outside of the target site (1 locus), targets that did not have good agreement between calculated and reported efficiencies (i.e., fell beyond the 99% Confidence Interval; 10 loci), and targets that had less than 5% calculated AND reported mutagenic efficiencies (26 loci).
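
The read-inclusion rule for the top predicted allele can be expressed as a short check. The sketch below is illustrative only: the function name, the coordinate convention (0-based, half-open intervals), and the assumption that mismatch positions have already been called against the predicted allele are ours, not part of Cas-Analyzer.

```python
def counts_as_top_allele(polymorphism_positions, window, mh_arms, max_polymorphisms=2):
    """Decide whether a read may be counted toward the top predicted allele.

    polymorphism_positions: positions where the read differs from the predicted allele
    window: (start, end) half-open interval defining the analysis window
    mh_arms: list of (start, end) half-open intervals covering the microhomology arms
    """
    start, end = window
    # Only polymorphisms inside the analysis window are considered.
    in_window = [p for p in polymorphism_positions if start <= p < end]
    # Rule 1: no more than `max_polymorphisms` polymorphisms in the window.
    if len(in_window) > max_polymorphisms:
        return False
    # Rule 2: none of those polymorphisms may fall on a microhomology arm.
    return not any(a <= p < b for p in in_window for a, b in mh_arms)


# Hypothetical example: a 50 bp window with arms at positions 20-22 and 28-30.
arms = [(20, 23), (28, 31)]
print(counts_as_top_allele([3, 40], (0, 50), arms))     # two mismatches, both off-arm
print(counts_as_top_allele([3, 29], (0, 50), arms))     # one mismatch on an arm
print(counts_as_top_allele([1, 2, 3], (0, 50), arms))   # three mismatches
```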
The HeLa cell dataset [14] was obtained from Dr. Kim in the form of an Excel spreadsheet with aligned sequence outputs +/-25 bp of the predicted cut site. The following numbers of reads were recorded: total, total mutant, total top predicted allele with 2 bp microhomology, and total top predicted allele with 3 bp or longer microhomology. As with the zebrafish dataset, a read was counted toward the top predicted allele so long as it contained no more than 2 polymorphisms in the analysis window AND the polymorphisms did not fall on the microhomology arms. Of the 92 targets, the following were removed to arrive at the 74 targets that were used for the analyses presented in S2 Fig and S4 Fig Panel B: targets with total recovered read counts less than 1% of expected (2 loci), and targets that had less than 20% mutagenic efficiency (16 loci). There were no targets with a non-NGG PAM, no alignment against the consensus sequence, or a high rate of permutation outside of the predicted cut site.
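
As a hedged illustration of how the recorded read counts feed into the target filters, the sketch below computes a mutagenic efficiency and a top-allele fraction per target and applies the two HeLa exclusion thresholds (read count below 1% of expected, efficiency below 20%). The class and function names are hypothetical, and defining the top-allele fraction as top predicted allele reads divided by total mutant reads is an assumption made for this example.

```python
from dataclasses import dataclass


@dataclass
class TargetCounts:
    total: int           # all reads recovered for the target
    mutant: int          # reads carrying any mutation
    top_allele: int      # reads matching the top predicted allele
    expected_reads: int  # read count expected for the target


def mutagenic_efficiency(t):
    """Fraction of all reads that are mutant."""
    return t.mutant / t.total if t.total else 0.0


def top_allele_fraction(t):
    """Fraction of mutant reads that match the top predicted allele (assumed definition)."""
    return t.top_allele / t.mutant if t.mutant else 0.0


def passes_filters(t, min_read_ratio=0.01, min_efficiency=0.20):
    """Keep targets with enough reads and enough editing activity."""
    if t.total < min_read_ratio * t.expected_reads:
        return False
    return mutagenic_efficiency(t) >= min_efficiency


# Hypothetical targets illustrating each outcome.
kept = TargetCounts(total=1000, mutant=500, top_allele=300, expected_reads=1000)
low_activity = TargetCounts(total=1000, mutant=100, top_allele=50, expected_reads=1000)
low_reads = TargetCounts(total=5, mutant=5, top_allele=5, expected_reads=1000)

for t in (kept, low_activity, low_reads):
    print(passes_filters(t), round(mutagenic_efficiency(t), 2), round(top_allele_fraction(t), 2))

# Tally reported in the text: 92 targets minus 2 and 16 exclusions leaves 74.
assert 92 - 2 - 16 == 74
```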

HEK293T cell DNA extraction and assessing mutagenic outcomes
HEK293T cells were harvested 24 hours post-transfection for gDNA extraction using DNeasy Blood & Tissue Kit (Qiagen 69506. Hilden, Germany). 20 ng of gDNA was used as a template for PCR with KOD polymerase per manufacturer's protocol. The PCR amplicon was resolved on agarose gel, gel extracted with Monarch DNA Gel Extraction Kit and subsequently sent out for sequencing. The chromatograms from both untransfected and transfected amplicons were used for TIDE analysis [23].
For sgRNAs that showed > 20% activity by TIDE, a single A overhang was added to the 3' end of purified amplicons by incubating them with MyTaq polymerase at 72˚C for 15 minutes. They were then used for subcloning analysis with StrataClone PCR Cloning Kit. 96 resultant white to pale blue colonies by Blue-White screening were subjected to colony PCR with endogenous primers using MyTaq polymerase. Once successful amplification was confirmed on agarose gel, these amplicons were subjected to T7E1 assay [27]. Briefly, 2.5 μL each of colony PCR amplicon and wildtype amplicon were heteroduplexed in 1x NEB 2.0 Buffer (25 μL). This was incubated for 15 minutes at 37˚C with 0.5 μL T7 Endonuclease I (NEB m3020. Ipswich, MA) and 4.5 μL dH2O. The digested amplicon was resolved on 2% agarose gel. The numbers of colony PCR-positive clones and digest-positive clones are reported in S5 Table. Some of the digest-positive clones were then sent for sequencing to ascertain the nature of the mutation.
A summary of outcomes for the Top MH Fraction calculation based on estimated allelic prevalence is given in S5 Table.

MENTHU
We developed a software tool, MENTHU (MMEJ kNockout Target Heuristic Utility), to automate the calculations required to implement the 2-component PreMA strategy: 1) identification of top predicted microhomology arms separated by 5 bp or less of intervening sequence, and 2) identification of "low competition" target sites (i.e., with a #1-ranked to #2-ranked Pattern Score ratio ≥ 1.5). We designed MENTHU to first compute two of the same sequence-based parameters (Pattern Score and Microhomology Score) used in the algorithm of Bae, et al. [14]. To preliminarily assess the abundance of PreMA loci, MENTHU was locally run to screen the sequences of two genes: human colony stimulating factor 2 (CSF2; Gene ID-1437) and zebrafish tumor protein p53 (tp53; Gene ID-30590). MENTHU was run twice on each gene: an exonic target screen and a whole gene target screen. A custom R script was used to mine the MENTHU results in a .csv format to determine both the total number of sites targetable by spCas9 (i.e., the total number of unique cut sites with an NGG PAM on either strand) and the subset of those predicted to be PreMA.
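
The two PreMA criteria lend themselves to a compact check once Pattern Scores are available. The following is a minimal sketch under stated assumptions and is not the MENTHU implementation: Pattern Scores are taken as precomputed inputs, the function name and the tuple layout are hypothetical, and the handling of sites with a single candidate deletion is a choice made only for this example.

```python
def is_prema(candidates, max_gap_bp=5, min_score_ratio=1.5):
    """Apply the two PreMA criteria to one cut site.

    candidates: list of (pattern_score, gap_bp) tuples, one per predicted
        microhomology-mediated deletion, where gap_bp is the length of the
        sequence between the two microhomology arms.
    """
    if not candidates:
        return False
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    top_score, top_gap = ranked[0]
    # Criterion 1: arms of the top predicted allele are separated by <= max_gap_bp.
    if top_gap > max_gap_bp:
        return False
    # Assumption: a site with no competing allele counts as low competition.
    if len(ranked) == 1 or ranked[1][0] == 0:
        return True
    # Criterion 2: ratio of #1-ranked to #2-ranked Pattern Score is >= min_score_ratio.
    return top_score / ranked[1][0] >= min_score_ratio


# Hypothetical cut sites with made-up Pattern Scores.
print(is_prema([(300.0, 3), (150.0, 8)]))  # close arms, ratio 2.0
print(is_prema([(300.0, 7), (100.0, 2)]))  # top arms too far apart
print(is_prema([(300.0, 3), (250.0, 4)]))  # ratio 1.2, too much competition
```

Checking the arm separation first mirrors the order in which the criteria are listed and allows the ratio calculation to be skipped for sites that fail Criterion 1.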

Statistical analyses
All of the statistical analyses were carried out using JMP software (SAS Institute. Cary, NC). In all instances, p-values were calculated assuming non-Gaussian distributions. Wilcoxon Each Pair calculation was used for multiple group comparisons with adjusted p-values.

Supporting information
All three Group 4 targets as well as some Group 3 targets yielded PreMA outcomes, validating our own training dataset. Importantly, none of the Group 1 and 2 targets were of this class. (B) Outlier plot summarizing repair outcomes from 90 genomic targets using CRISPR-Cas9. Similar to the findings in zebrafish, close proximity of the top predicted MH arms (Groups 3 and 4) appears to be the primary determinant for utilizing this MH pair efficiently. When the top predicted allele had at least 50% higher Pattern Score than the second predicted allele (Groups 2 and 4), median Top MH Fractions trended higher compared to Groups 1 and 3, respectively. P-values were calculated by Wilcoxon's Each Pair Calculation (adjusted for multiple comparisons). Targets with < 20% overall edit efficiency were excluded from analysis. Pattern Scores were derived using the RGEN online tool (http://www.rgenome.net). (TIF)

S1 Table. List and summary mutagenic outcomes of TALEN and CRISPR-Cas9 reagents that were designed primarily using the Bae, et al algorithm [14]. Underlined & italicized bases in sgRNA sequence denote mismatched bases due to the promoter requirement. Pattern Scores and Microhomology Scores were derived using the RGEN online tool (http://www.rgenome.net). MH: Microhomology, SC: Subcloning. * Reagents prospectively designed according to the Bae, et al algorithm [14]. † No raw sequencing data were available. However, the outcome had been compiled into a table prior to conception of this study. ‡ Injected with sgRNA and Cas9 mRNA (150 pg and 100 pg, respectively). ^ Gift from Wenbiao Chen (Addgene # 46761). (TIF)

S2 Table. Example MENTHU output from select CRISPR-Cas9 targets used in this study, focusing only on out-of-frame mutations. The output was obtained by using the entire target exon sequence with 40 bp of intronic sequence on each of the 5' and 3' ends. The MENTHU output provides a 3' NGG PAM sequence for each gRNA target (italicized and underlined). MENTHU gRNA outputs that matched the target sequences used in this study are bolded. Criteria 1 and 2 refer to 1) whether the top predicted microhomology arms are separated by 5 bp or less, and 2) whether the ratio of top to second predicted Pattern Scores is at least 1.5. MENTHU is programmed to terminate calculations if the target site is negative for Criterion 1. As a result, no gRNA sequence output is obtained for chrd #1 and mitfa #2. Importantly, in two instances (surf1 and tdgf1) where we only had Group 3 reagents, novel candidate PreMA sites were identified. * Result obtained by adjusting the value for Criterion 2 to 1.0, as these were Group 3 guides that, by definition, do not satisfy Criterion 2 of 1.5 or higher. ^ In-frame mutation by the experimental design. † 16 other candidate loci identified on this 3771 bp exon; only a partial list of alternate loci is given. ‡ 16 other candidate loci identified on this 822 bp exon; only a partial list of alternate loci is given. (TIF)

S7 Table. Preliminary analyses on the prevalence of PreMA loci reveal that about 10% of the CRISPR-Cas9 targetable loci on both human CSF2 and zebrafish tp53 genes fall in this category. This holds true at both the gene and exonic levels. As expected, roughly two thirds of the PreMA reagents are predicted to induce preferentially out-of-frame mutations. (TIF)

S8 Table. List of primers used in this study. All the primer sequences are provided in 5' -> 3' order. For the urod Reverse primer, the M13F primer sequence was added at the 5' end of the endogenous target sequence (bolded and italicized). For SDM primers, the intended point mutation is indicated by bold and italic. * No endogenous primer was used to sequence the genomic loci of interest. (DOCX)

S1 Data. Sanger sequencing files used for the study. Whole amplicon sequencing outcomes are deposited as .ab1 files for chrd TALEN and tyr #2, tdgf1, ttn.2 N2B #1, ttn.2 #2 sgRNA targets. Other sequencing outcomes, including those used for subcloning analyses, are provided in the .fastq format. (ZIP)