Optimization of T-DNA architecture for Cas9-mediated mutagenesis in Arabidopsis

Bacterial CRISPR systems have been widely adopted to create operator-specified site-specific nucleases. Such nuclease action commonly results in loss-of-function alleles, facilitating functional analysis of genes and gene families. We conducted a systematic comparison of components and T-DNA architectures for CRISPR-mediated gene editing in Arabidopsis, testing multiple promoters, terminators, sgRNA backbones and Cas9 alleles. We identified a T-DNA architecture that usually results in stable (i.e. homozygous) mutations in the first generation after transformation. Notably, the transcription of sgRNA and Cas9 in head-to-head divergent orientation usually resulted in highly active lines. Our Arabidopsis data may prove useful for optimization of CRISPR methods in other plants.


Introduction
CRISPR (clustered regularly interspaced short palindromic repeat)-Cas (CRISPR associated) site-specific nucleases evolved as components of prokaryotic immunity against viruses, and are widely deployed as tools to impose operator-specified nucleotide sequence changes in genomes of interest [1][2][3][4]. During infection by bacteriophages, Cas1 and Cas2 can integrate phage DNA sequences into 'spacer' regions of tandem CRISPR loci in the bacterial genome. The crRNA (CRISPR-RNA) transcription product of the spacer associates with nucleases from the Cas family to form ribonucleoproteins that can cleave nucleic acid sequences homologous to the spacer. This enables elimination of viral nucleic acid upon subsequent infection. CRISPR systems are divided into two classes [5,6]. Class 1 systems comprise multi-subunit complexes whereas Class 2 systems function with single ribonucleoproteins. Within Class 2, Type-II and Type-V cleave dsDNA (double-stranded DNA) via Cas9 and Cas12/Cpf1 respectively, while Type-VI cleaves ssRNA (single-stranded RNA) via Cas13/C2c2.

Golden Gate cloning enables facile assembly of diverse Cas9 T-DNA architectures
In Golden Gate modular cloning, the promoter, reading frame and 3' end modules at 'Level 0' are assembled using Type IIS restriction enzymes into 'Level 1' complete genes that can then be easily combined into T-DNAs carrying multiple genes at 'Level 2'. This enables facile assembly of diverse T-DNA conformations [22,23]. Level 0 acceptor vectors are designed to clone promoter, coding sequence (CDS) or terminator fragments (see Materials and methods). For our purpose, we used three Level 1 vectors: a glufosinate plant selectable marker in position 1 (pICSL11017, cloned into pICH47732), a Cas9 expression cassette in position 2 (cloned into pICH47742) and a sgRNA expression cassette in position 3 (cloned into pICH47751) (Fig 1). Some Cas9 expression cassettes were cloned into a Level 1 position 2 variant: pICH47811. This vector can be assembled in Level 2 in the same fashion as pICH47742, but it enables Cas9 transcription in the opposite direction to the other Level 1 modules. We assembled 25 different Level 1 Cas9 constructs and four sgRNA expression cassettes. The sequence targeted by the sgRNA was CGTATCTTCGGCCATGAAGC (NGG) (Protospacer Adjacent Motif in parentheses), which specifically targets ADH1 in Col-0, enabling pre-selection of CRISPR-induced adh1 mutants by selecting with allyl alcohol [13]. Assembly of these Level 1 modules resulted in 39 Level 2 T-DNA vectors (S1 Table). More details of the assembly protocols can be found in the 'Materials and Methods' section.
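As an illustration of how the target site described above can be located in a sequence, the following minimal Python sketch searches both strands for the 20-nt protospacer immediately followed by an NGG PAM. The function names and any sequence passed to it are assumptions made for this example; it is not the procedure used in this study to design the sgRNA.

```python
import re

# Protospacer quoted in the text; the PAM (NGG) lies directly 3' of it.
TARGET = "CGTATCTTCGGCCATGAAGC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_target_sites(sequence, protospacer=TARGET):
    """List (strand, start, pam) for each protospacer + NGG match.

    'start' is 0-based. For the '-' strand it is counted along the
    reverse-complemented sequence, not along the input sequence.
    """
    sequence = sequence.upper()
    pattern = re.compile(re.escape(protospacer) + "[ACGT]GG")
    hits = []
    for strand, seq in (("+", sequence), ("-", reverse_complement(sequence))):
        for match in pattern.finditer(seq):
            hits.append((strand, match.start(), match.group()[-3:]))
    return hits
```

Calling `find_target_sites` on a hypothetical fragment built as four A's, the protospacer, `TGG` and four C's reports a single site on the `+` strand at position 4. A real search would be run on the ADH1 genomic sequence, which is not reproduced here.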

CRISPR-induced Arabidopsis mutations can be selected using allyl-alcohol
The 39 Level 2 plasmids were transformed into A. tumefaciens strain GV3101 and used to generate Arabidopsis Col-0 transgenic lines. 'T1' refers to independent primary transformants selected from the seeds of the dipped plant; 'T2' refers to the T1 progeny. For each of the 39 constructs, about 100 T2 progeny from each of six independent T1 lines were screened for allyl-alcohol resistance (Fig 2). T2 seeds were treated with 30 mM allyl-alcohol for two hours. Six survivors (or all survivors if there were fewer than six) were screened by PCR amplification and capillary sequencing to confirm the mutation in ADH1 at the expected target site. This genotyping step enabled us to estimate the percentage of non-mutated plants that escape the allyl-alcohol selection. Indeed, we identified some lines surviving the allyl-alcohol screen that were heterozygous (ADH1/adh1). CRISPR activity is expressed as [(number of allyl-alcohol surviving plants) x (% of homozygous or biallelic mutants confirmed by sequencing among the surviving plants tested) / (number of seeds sown)]. It was measured for six independent T2 families for each of the 39 constructs. When more than 75% of the lines survived the allyl-alcohol treatment and all the lines genotyped carried knock-out (KO) alleles with the exact same mutation within one T2 family, we assumed that the T1 parent was a homozygous mutant. Such T2 families are indicated in red.
in most tissues [24]. We compared the 35S and Arabidopsis UBI10 promoters. More mutants were recovered using the UBI10 promoter, suggesting it is more active than 35S in the germline (Fig 3A). Following this observation, we tested other germline-expressed promoters.

UBI10, YAO and
In the combinations we tested, we detected low CRISPR activity using the meiosis I-specific promoter MGE1 [26] (Fig 3C), the homeotic gene promoter AG [27] (Fig 3D) and the DNA polymerase subunit-encoding gene promoter ICU2 [28] (Fig 3D). They were tested with constructs inducing an overall low activity and we do not exclude that they can perform efficiently in other conditions. In one context specifically, ICU2 promoter resulted in moderate activity in four of the six T2 families tested, while only one T2 family showed activity with the UBI10 promoter ( Fig 3E).
EC1.2 and an EC1.2::EC1.1 fusion (referred to as 'EC enhanced' or 'ECenh') are specifically expressed in the egg cell and were reported to trigger elevated mutation rates with CRISPR in Arabidopsis [17]. In our Golden Gate compatible system, only ECenh induced homozygous mutants in T1, and at low frequency (Fig 3B and 3G). In one comparison, EC1.2 and ECenh performed slightly better than pUBI10 (Fig 3D), but in another, they induced lower activity (Fig 3E).
A promoter from Cassava Vein Mosaic Virus (CsVMV) was reported to mediate CRISPR activity in Brassica oleracea [29]. We found that it induced more CRISPR activity than pUBI10 in two combinations tested (Fig 3D and 3E).
We also tested the YAO and RPS5a promoters. Both of them were reported to boost CRISPR activity in Arabidopsis [15,16]. Both triggered elevated mutation rates compared with the UBI10 promoter ( Fig 3F). In one comparison, pRPS5a performed slightly better ( Fig 3G), but in another, pYAO performed better ( Fig 3H).

Codon optimization of Cas9 and presence of an intron elevate mutation rates
The activity of different constructs with the same promoter can be very different. For instance, pRPS5a:Cas9 and pYAO:Cas9 lines were recovered that displayed either high or low activity (Fig 3F and 3H). The most active constructs carried Cas9_3 or Cas9_4 alleles. We thus compared four Cas9 alleles side-by-side (Fig 4). Cas9_1 is a human codon-optimized version with a single C-terminal Nuclear Localization Signal (NLS) [3]. Cas9_2 is an Arabidopsis codon-optimized version with a single C-terminal NLS [13]. Cas9_3 is a plant codon-optimized version with both N- and C-terminal NLSs, an N-terminal FLAG tag and a potato intron IV [25]. Cas9_4 is a human codon-optimized version with both N- and C-terminal NLSs and an N-terminal FLAG tag [10].
We found that in comparable constructs, Cas9_2 performs better than Cas9_1 (Fig 4E to 4H), consistent with the fact that Cas9_2 was designed for Arabidopsis codon usage. However, human codon-optimized Cas9_4 induced more mutants than Arabidopsis-optimized Cas9_2 in one experiment (Fig 4B). Cas9_4 has an extra N-terminal NLS compared to Cas9_2, which may explain this difference. In this comparison specifically, Cas9_3 was less efficient than Cas9_4. However, by comparing Cas9_3 and Cas9_4 in combination with YAO or RPS5a promoters, we found that Cas9_3 resulted in high mutation rates (Fig 4C and 4D). Cas9_3 efficiency can be explained by the plant codon optimization, the presence of two NLSs and the inclusion of a plant intron. This intron was originally added to avoid expression in bacteria during cloning and, as a side effect, can also increase expression in planta [30]. We recommend the use of Cas9_3 for gene editing in Arabidopsis.

A modified sgRNA triggers CRISPR-induced mutations more efficiently
In the endogenous CRISPR immune system, Cas9 binds a CRISPR RNA (crRNA) and a trans-activating CRISPR RNA (tracrRNA) [31]. A fusion of both, called single guide RNA (sgRNA), is sufficient for CRISPR-mediated genome editing [32]. sgRNA stability was suggested to be a limiting factor in the CRISPR system [33]. Chen et al. proposed an improved sgRNA to tackle this issue [8]. It carries an A-T transversion to remove a TTTT potential termination signal, and an extended Cas9-binding hairpin structure (Fig 5A). We compared side-by-side the 'Extended' and 'Flipped' sgRNA (sgRNA EF) with the classic sgRNA (Fig 5B and 5C). In two independent comparisons, the efficiency was higher with sgRNA EF. The improvement was not dramatic but sufficient to lead us to recommend use of 'EF'-modified guide RNAs for genome editing in Arabidopsis.
Figure legend (partial): c. and d. Five lines were tested for Cas9_3 instead of six. The sgRNA targets ADH1. CRISPR activity measured in % of homozygous or biallelic stable mutants in the second generation after transformation (T2). Each dot represents an independent T2 family. Red dot: All the T2 lines from this family carry the same mutation, indicating a mutation more likely inherited from the T1 parent rather than being de novo from the T2 line. Bold and underlined: Most active construct(s) for each panel. OcsT: 714 bp of the Agrobacterium tumefaciens octopine synthase terminator.
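To illustrate the 'Flip' modification discussed in this section, which removes a TTTT run that could act as a Pol III termination signal, the sketch below scans a DNA sequence for runs of four or more thymidines. The function name is an assumption, and the short fragments mentioned afterwards are hypothetical toy sequences, not the sgRNA backbones used in this study.

```python
import re


def find_t_runs(sequence, min_length=4):
    """Return (start, run) for every run of at least min_length T's.

    'start' is the 0-based position of the first T of the run.
    """
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    pattern = re.compile("T{%d,}" % min_length)
    return [(m.start(), m.group()) for m in pattern.finditer(sequence.upper())]
```

For the toy fragment `GTTTTAGAGC` the function reports one run starting at position 1, whereas for `GTTTAAGAGC`, in which the fourth T is replaced by A, it reports none.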

The 3' regulatory sequences of Cas9 and the sgRNA influence the overall activity
To avoid post-transcriptional modifications such as capping and polyadenylation, sgRNA must be transcribed by RNA polymerase III (Pol III). Several approaches involving ribozymes, Csy4 ribonuclease or tRNA-processing systems have been proposed but were not tested here [34][35][36]. U6-26 is a Pol III-transcribed gene in Arabidopsis [37]. We used 205 bp of the 5' upstream region of U6-26 as promoter and we compared a synthetic polyT sequence (seven thymidines) and 192 bp of the 3' downstream region as terminator. A T-rich stretch has been reported to function as a termination signal for Pol III [38].
In seven out of nine side-by-side comparisons, the authentic 192 bp of U6-26 terminator directed a higher efficiency of the construct, as compared to a synthetic polyT termination sequence (Fig 6 and S2 Fig). We speculate that a stronger terminator increases the stability of the sgRNA. For multiplex genome editing, the use of 192 bp per sgRNA will result in longer T-DNAs and increase the risk of recombination and instability. We therefore generated constructs with only 67 bp of the U6-26 3' downstream sequence. Such constructs were not compared side-by-side with the '192 bp terminator', although they enabled modest to high mutation rates (e.g. Fig 3F and 3G). With these results in mind, we recommend using 67 bp of the 3' downstream sequence of U6-26 as terminator for the sgRNA.
Since 3' regulatory sequences can influence sgRNA stability, we tested if the same was true for Cas9. We compared the Pisum sativum rbcS E9 with two A. tumefaciens terminators commonly used in Arabidopsis: Ocs and Ags (Fig 7). We did not observe consistent differences between E9 and Ocs (Fig 7A and 7B). However, in one comparison, E9 outperformed Ags ( Fig  7C). This is consistent with previous observations that RNA Polymerase II (Pol II) terminators quantitatively control gene expression and influence CRISPR efficiency in Arabidopsis [17,39]. We propose that a weak terminator after Cas9 enables Pol II readthrough that could interfere with Pol III transcription of sgRNAs in some T-DNA construct architectures. This limiting factor can be corrected by divergent transcription of Cas9 and sgRNAs.

Divergent transcription of Cas9 and sgRNA expression can elevate mutation rates
The Golden Gate Level 1 acceptor vector collection contains seven 'forward' expression cassettes and seven 'reverse' expression cassettes, which are interchangeable [23]. We assembled 'RPS5a:Cas9_4:E9' and 'YAO:Cas9_3:E9' in both the Level 1 vector position 2 forward (pICH47742) and reverse (pICH47811) (Figs 1 and 6). In one case, CRISPR activity was moderate when Cas9 and sgRNA were expressed in the same direction and high when they were expressed in opposite directions (Fig 8A). In another case, CRISPR activity was very high in both orientations (Fig 8B).
We thus recommend both using a strong terminator after Cas9 (e.g. E9 or Ocs) and expressing Cas9 and sgRNA in opposite directions.

Most of the stable double events are homozygous rather than biallelic
From the mutant screen, 315 allyl-alcohol resistant lines were confirmed by capillary sequencing (S5 Table). We classified them into four categories: (i) 59% were homozygous (single sequencing signal, different from ADH1 WT), (ii) 11% were heterozygous (dual sequencing signal, one matching ADH1 WT), (iii) 10% were biallelic (dual sequencing signal, none matching ADH1 WT) and (iv) 20% were difficult to assign (unclear sequencing signals, either biallelic or due to somatic mutations, but clearly different from WT, heterozygous or homozygous genotypes) (Fig 9). The recovery of heterozygous (ADH1/adh1) lines indicates that the loss of a single copy of ADH1 can sometimes enable plants to survive the allyl-alcohol selection.

Discussion
CRISPR emerged in 2012 as a useful tool for targeted mutagenesis in many organisms including plants [11,32]. In Arabidopsis, the transgenic expression of CRISPR components can be straightforward, avoiding tedious tissue culture steps. Many strategies to enhance the overall CRISPR-induced mutation rate have been proposed [8,13,[15][16][17]40]. Here we report a systematic comparison of putative limiting factors including promoters, terminators, codon optimization, sgRNA improvement and T-DNA architecture. We found that the best promoters to control Cas9 expression are UBI10, YAO and RPS5a. The best terminators in our hands were Ocs from A. tumefaciens and rbcS E9 from P. sativum. A plant codon-optimized, intron-containing Cas9 allele outperformed the other alleles tested. A modified sgRNA with a hairpin Extension and a nucleotide Flip, called sgRNA EF, triggers slightly elevated mutation rates. Regulation of sgRNA transcription by the authentic 3' regulatory sequence of AtU6-26 results in better CRISPR activity. We obtained high mutation rates with either 67 bp or 192 bp of terminator and recommend using the shorter one (67 bp). We hypothesise that a weak terminator after Cas9 enables RNA polymerase II readthrough into the sgRNA expression cassette, preventing optimal expression of the sgRNA. Indeed, we noted an elevated CRISPR-Cas9 efficiency when expressing Cas9 and sgRNA in opposite directions.
Considering the combinations of Cas9 and sgRNA genes tested in this study, we recommend using a 'YAO:Cas9_3:E9' cassette and a 'pU6-26:sgRNA EF:U6-26T 67' cassette in head-to-head orientation. This combination is included in the constructs tested here (Fig 8B) and enabled us to recover one homozygous mutant among five T1 plants tested. We also obtained useful rates with other constructs (e.g. Fig 3F), indicating that the CRISPR components do not entirely explain the final CRISPR activity. It was recently reported that heat stress increases the efficiency of CRISPR in Arabidopsis [41]. Environmental conditions may explain fluctuation of the CRISPR activity, independently of the T-DNA architecture.
Figure legend (partial): Five lines were tested for H2T instead of six.
We were surprised to recover more homozygous than biallelic events. Stable double mutations are the result of two CRISPR events, on the male and female inherited chromosome respectively. In this scenario, lines can be recovered with two different mutations, resulting in a biallelic (e.g. adh1-2/adh1-3) genotype, rather than having the same mutation on both chromosomes (e.g. adh1-1/adh1-1). Double-strand break-induced homologous recombination occurs between allelic sequences [42]. It has been reported that double-strand breaks caused by CRISPR-Cas9 can increase this phenomenon [43]. Allelic recombination can explain our observation of the same mutation on both copies of ADH1. The prevalence of homozygous over biallelic genotypes facilitates genotyping and is an advantage for targeted mutagenesis using CRISPR-Cas9.
Fig 9. Genotype at ADH1 locus confirmed by capillary sequencing. For each T2 family tested, up to six allyl-alcohol resistant plants were genotyped by capillary sequencing of an sgRNA target (ADH1) PCR amplicon. We retrieved a total of 315 sequences with a mutation. 59% (187) showed a single sequencing signal, different from ADH1 WT, and were classified as "Homozygous". 11% (33) showed an overlap of two sequencing signals, one matching ADH1 WT and one different, and were classified as "Heterozygous". 10% (31) showed an overlap of two sequencing signals, none matching ADH1 WT, and were classified as "Biallelic". 20% (64) showed an overlap of signals different from WT but not clear enough to distinguish, and were classified as "Unknown". The "Unknown" sequences can be biallelic or due to somatic mutations but are different from WT, heterozygous or homozygous genotypes. https://doi.org/10.1371/journal.pone.0204778.g009

Optimization of CRISPR for Arabidopsis
We used a glufosinate resistance selectable marker which enables easy selection of transgenic lines. It can be important to segregate away the T-DNA in the CRISPR mutant line for multiple reasons. For instance, a loss-of-function phenotype must be confirmed by complementation of the CRISPR-induced mutation. A CRISPR construct still present in the mutant can target the complementation transgene and interfere with the resulting phenotypes. Selection of non-transgenic lines is possible but complicated with classic selectable markers such as kanamycin or glufosinate resistance, since a selective treatment kills the non-transgenic plants.
FAST-Green and FAST-Red provide a rapid non-destructive selectable marker and involve expression of a GFP- or RFP-tagged protein in the seed [44]. Transgenic and non-transgenic seeds can be distinguished under fluorescence microscopy [16,45,46]. This facilitates recovery of mutant seeds lacking the T-DNA (Fig 10). Homozygous mutants can be identified among the independent T1 lines. Non-fluorescent seeds can be selected from the T1 seeds. The resulting T2 plants are homozygous mutant and non-transgenic.
We report a CRISPR- and Golden Gate-based method to generate stable Arabidopsis mutant lines in one generation. In our efforts to elevate mutation rates in Arabidopsis, we found several limiting factors mostly related to Cas9 and sgRNA transcription. Some of these findings can be tested for other plant species and for knock-in breeding. The generation of null alleles via CRISPR is today quick and simple, facilitating the investigation of gene function. Improvement of rates of gene 'knock-ins' provides the next challenge. In vivo gene tagging or knock-in breeding are theoretically possible and have been reported [47][48][49][50]. Improvements in CRISPR-based genome editing techniques will facilitate the study of genes and proteins and be beneficial for both basic and applied plant science.
Fig 10 (legend, partial). At the sgRNA target site, they can be WT, or display somatic, heterozygous, biallelic or homozygous mutations. All the possibilities are represented here. "Somatic" describes events happening in somatic cells, not inherited in the next generation. As somatic events can happen independently in each cell, they often result in a mosaic pattern of mutations across the leaf. One line has a homozygous mutation (mut1/mut1). It produces seeds segregating for the T-DNA, visible under the microscope if using FAST-Red. The seeds will segregate 3:1 (Red: Non-red) if there is a one-locus insertion, 15:1 (Red: Non-red) if there is a two-locus insertion, etc. The T2 progeny of (mut1/mut1) is 100% homozygous for the mutation. The non-red seeds are also T-DNA free. https://doi.org/10.1371/journal.pone.0204778.g010
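The 3:1 and 15:1 (Red: Non-red) ratios quoted in the Fig 10 legend follow from simple Mendelian arithmetic, sketched below in Python. The sketch assumes that each T-DNA insertion is hemizygous in the selfed T1 parent and that insertion loci are unlinked; these assumptions and the function names are ours and are not stated in that form in the text.

```python
from fractions import Fraction


def non_transgenic_fraction(n_loci):
    """Expected fraction of T-DNA-free (non-red) seeds after selfing.

    Each hemizygous locus is absent from 1/4 of the progeny, and
    unlinked loci segregate independently, hence (1/4) ** n_loci.
    """
    if n_loci < 1:
        raise ValueError("n_loci must be at least 1")
    return Fraction(1, 4) ** n_loci


def segregation_ratio(n_loci):
    """Expected (red, non_red) ratio, e.g. (3, 1) for a single locus."""
    if n_loci < 1:
        raise ValueError("n_loci must be at least 1")
    return (4 ** n_loci - 1, 1)
```

Under these assumptions a single locus gives 3:1 and two loci give 15:1, matching the legend, so about one seed in 16 is expected to be T-DNA free in a two-locus line.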
Combinations of three Level 0 vectors containing respectively a promoter, a Cas9 coding sequence and a terminator were assembled in Level 1 vector pICH47742 (Position 2) or pICH47811 (Position 2, reverse) by the same 'Golden Gate' protocol but using 0.5 μl of BpiI enzyme (10 U/μl, ThermoFisher) instead of 0.5 μl of BsaI-HF.
To generate the sgRNA expression cassettes, DNA fragments containing the classic or the 'EF' backbone with 7, 67 or 192 bp of the U6-26 terminator were amplified using primers flanked with BsaI restriction sites associated with Golden Gate compatible overhangs (S3 Table). The amplicons were assembled with the U6-26 promoter (pICSL90002) in Level 1 vector pICH47751 (Position 3) by the 'Golden Gate' protocol using the BsaI-HF enzyme. Combinations of three Level 1 vectors containing a glufosinate resistance selectable marker (pICSL11017), a Cas9 expression cassette and a sgRNA expression cassette were assembled in Level 2 pAGM4723 (no overdrive) or pICSL4723 (+ overdrive) by the 'Golden Gate' protocol using the BpiI enzyme. All the plasmids were prepared using a QIAprep Spin Miniprep Kit from Escherichia coli DH10B electrocompetent cells selected with appropriate antibiotics and X-gal.

Plant transformation, growth and selection
Agrobacterium tumefaciens strain GV3101 was transformed with plasmids by electroporation and used for stable transformation of Arabidopsis accession Col-0. Arabidopsis plants were grown in 'short day' conditions (10 hr light/14 hr dark, 21˚C). Transformants were selected by spraying 1- to 3-week-old seedlings three times with phosphinothricin at a concentration of 0.375 g/l. Four-week-old resistant plants were transferred to 'long day' conditions (16 hr light/8 hr dark, 21˚C) for flowering. For each genotype, six independent T1 plants were self-pollinated to obtain six independent T2 families per construct.

Characterisation of CRISPR events
T2 families were tested for resistance to allyl-alcohol. About 100 seeds were sterilized, immersed in water (4˚C, dark, overnight), treated with allyl-alcohol (30 mM, room temperature, 2 hours, shaken at 750 rpm), rinsed three times with water and sown on 1/2 MS medium. After two weeks, the number of germinated and non-germinated seeds was recorded. DNA was extracted from up to six allyl-alcohol resistant plants (or all the resistant plants if there were fewer than six) for genotyping. About 0.5 cm² of leaf tissue was pressed onto FTA filter paper (Whatman Bioscience). 1-mm discs were punched out of the FTA filter paper and placed in 200 μl PCR tubes, one disc per tube. Samples were incubated in 50 μl of FTA buffer (1.25 ml Tris 1 M, 500 μl EDTA 0.5 M, 12.5 μl Tween 20 and water up to a total volume of 125 ml) for 2 hours and rinsed with water. PCR was performed on this template using primers flanking the sgRNA target in ADH1 (S3 Table) and Q5 High-Fidelity DNA Polymerase (NEB, following the manufacturer's recommendations). After amplification, the PCR products were resolved by electrophoresis on a 1.5% agarose gel and purified using the QIAquick Gel Extraction Kit (QIAGEN). The purified PCR product was sequenced by capillary sequencing (GATC Biotech) using the same primers as for amplification. Sequencing results were compared to the Col-0 sequence of ADH1 using CLC Main Workbench 7.7.1. ADH1 genotypes were reported as WT (identical to Col-0), heterozygous (both Col-0 and a single mutation detected), biallelic (two different mutations detected), homozygous (single mutation detected) or somatic (more than two signals detected). The proportion of confirmed mutants among the genotyped allyl-alcohol resistant lines was used to estimate the total number of real mutants among allyl-alcohol survivors from each plate.
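The genotype categories listed above can be written as a small decision rule, sketched here in Python. The sketch assumes that the distinct allele sequences underlying a capillary trace have already been resolved, which is a simplification of real trace analysis; the function name and input format are hypothetical, and this is not the CLC Main Workbench workflow used in the study.

```python
def classify_genotype(alleles, wild_type):
    """Classify a plant from the allele sequences detected at the target.

    alleles   : iterable of sequences resolved from the sequencing trace
    wild_type : the Col-0 reference sequence over the same region
    """
    distinct = set(alleles)
    if not distinct:
        raise ValueError("at least one allele sequence is required")
    if len(distinct) > 2:
        return "somatic"        # more than two signals detected
    if distinct == {wild_type}:
        return "WT"             # identical to Col-0
    if len(distinct) == 1:
        return "homozygous"     # single mutation detected
    if wild_type in distinct:
        return "heterozygous"   # Col-0 plus a single mutation
    return "biallelic"          # two different mutations
```

For example, a trace resolved into the reference sequence and one mutated sequence is reported as heterozygous, while two different mutated sequences are reported as biallelic.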
For each T2 family, the CRISPR efficiency was defined as the ratio of homozygous and biallelic mutants compared to the total number of seeds sown. Plots presented in this article were made using ggplot2 in R version 3.3.2.
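As a worked illustration of this definition, the Python sketch below scales the number of allyl-alcohol survivors by the fraction of genotyped survivors confirmed as homozygous or biallelic, then divides by the number of seeds sown. The function and parameter names are assumptions introduced for this example, and the numbers in the usage note are invented.

```python
def crispr_activity(seeds_sown, survivors, genotyped, confirmed):
    """Return CRISPR activity for one T2 family, in percent.

    seeds_sown : seeds treated with allyl-alcohol and sown
    survivors  : allyl-alcohol resistant plants recovered
    genotyped  : survivors that were sequenced (up to six in the study)
    confirmed  : genotyped survivors that were homozygous or biallelic
    """
    if seeds_sown <= 0:
        raise ValueError("seeds_sown must be positive")
    if not 0 <= survivors <= seeds_sown:
        raise ValueError("survivors must lie between 0 and seeds_sown")
    if survivors == 0:
        return 0.0  # no resistant plants, so no confirmed mutants
    if genotyped <= 0 or not 0 <= confirmed <= genotyped:
        raise ValueError("need 0 <= confirmed <= genotyped, genotyped > 0")
    estimated_mutants = survivors * (confirmed / genotyped)
    return 100.0 * estimated_mutants / seeds_sown
```

With invented counts of 100 seeds sown, 40 survivors, and 5 confirmed mutants among 6 genotyped plants, the estimate is 40 x 5/6 = 33.3 mutants, i.e. an activity of about 33%.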
Supporting information
S1 Table). Vector "pAGM4723" lacks an overdrive; vector "pICH4723" has an overdrive. "Same_mutation" indicates whether all the lines carry the same mutation. It is applied only if more than 75% of the seeds germinated. If so, it indicates that the parent was likely a homozygous mutant and the mutation was inherited by all progeny.
"extension-flip" sgRNA. U6-26T: 7 bp of the At3g13855 terminator. RB: Right Border. The sgRNA targets ADH1. CRISPR activity measured in % of homozygous or biallelic stable mutants in the second generation after transformation (T2). Each dot represents an independent T2 family. Bold and underlined: Most active construct(s) for each panel. The overdrive sequence can increase the integration efficiency [21]. In one comparison the presence of the overdrive resulted in slightly better activity (C), but in another one it did not (B). We concluded that the presence of an overdrive does not influence the CRISPR efficiency. Thus, we could compare constructs independently of the presence of an overdrive.