Table 1.
Comparison of different sgRNA design software programs.
Fig 1.
Workflows of guide RNA target search in CRISPR MultiTargeter.
A. Simple CRISPR guide RNA search. A user enters a number of sequences or sequence identifiers and specifications for a target search. The program then processes these data, performs a regular expression match, stores the resulting coordinates and generates visual and table views of targets in each sequence. B. Common guide RNA target search in multiple sequences. Input sequences are used to generate a multiple sequence alignment. As in (A), a regular expression with target specifications is run on the alignment consensus in both forward and reverse orientations. If the user allows mismatches, a successful match is one with at most one mismatch in the consensus sequence. These matches are then highlighted in the multiple sequence alignment. In addition, another algorithm is run on the input sequences to find unique target sites in each sequence (not shown). C. Common and unique guide RNA target search in similar genes or transcripts. In this workflow, gene or transcript sequences are retrieved from the database. Common targets are detected based on the multiple sequence alignment, and unique target sites are found using an exhaustive string comparison algorithm (not shown). All target sites are also checked to lie within a single exon to ensure successful targeting of the genomic sequence. In the illustration, locations of different target sites in genes A and B are shown together with the expected output of the program run. D. Common and unique guide RNA target search in transcripts of a single gene. The search for target sites is performed as described in (C). In the illustration, the input sequences are transcript isoforms A1, A2 and A3 of gene A. The different types of target sites are shown together with the expected program output. In (C) and (D), common targets are indicated in pink and unique targets in orange.
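The simple search in (A) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program's own code: the function names, the 0-based half-open coordinates and the reduced IUPAC table are hypothetical choices, and a target is assumed to be a 20-nt site whose first two bases match the 5’-dinucleotide, immediately followed by a 3’ PAM (NGG by default). A regex lookahead lets overlapping sites be reported, and the reverse orientation is handled by scanning the reverse complement and mapping coordinates back to the forward strand.

```python
import re

# Subset of IUPAC nucleotide codes mapped to regex character classes (assumed sufficient here)
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]",
         "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
         "K": "[GT]", "M": "[AC]"}


def iupac_to_regex(seq):
    """Translate an IUPAC string (e.g. 'NGG') into a regex fragment."""
    return "".join(IUPAC[b] for b in seq.upper())


def build_pattern(length=20, dinucleotide="NN", pam="NGG"):
    """Target = 5' dinucleotide + (length - 2) bases, followed by a 3' PAM."""
    body = iupac_to_regex(dinucleotide) + "[ACGT]{%d}" % (length - 2)
    # Zero-width lookahead so that overlapping sites are all found
    return re.compile("(?=(%s%s))" % (body, iupac_to_regex(pam)))


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_targets(seq, length=20, dinucleotide="NN", pam="NGG"):
    """Return (strand, start, end, site) tuples; site includes the PAM."""
    seq = seq.upper()
    pat = build_pattern(length, dinucleotide, pam)
    n = len(seq)
    hits = []
    for m in pat.finditer(seq):              # forward orientation
        site = m.group(1)
        hits.append(("+", m.start(), m.start() + len(site), site))
    for m in pat.finditer(revcomp(seq)):     # reverse orientation
        site = m.group(1)
        start = n - (m.start() + len(site))  # map back to forward coordinates
        hits.append(("-", start, start + len(site), site))
    return hits
```

For example, `find_targets("AAAAA" + "ACGT" * 5 + "TGG" + "AAAAA")` reports a single forward-strand site starting at position 5.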
Fig 2.
Search algorithm for sgRNA target sites in individual and multiple similar sequences.
Input data for this algorithm consist of an sgRNA target site specification and sequence data. The dashed lines to the sequence boxes represent the two possible branches of the algorithm: a simple CRISPR sgRNA search and a search for common and unique target sites in multiple similar sequences. The target site specification is common to both branches and consists of the target site length, the PAM sequence and its location, the sequence of the 5’-dinucleotide and the region where a single mismatch is allowed. The simple sgRNA search runs a regular expression (search pattern) built from the target site specification on all input sequences in both orientations. The program outputs the sequences and locations of identified target sites in visual and table formats. In the second branch, multiple similar sequences are first aligned using the ClustalW2 program. The program then reads the resulting multiple sequence alignment and computes its consensus sequence. Running the target site specification expression on this consensus sequence identifies candidate common target sites. If exon sequences are available for a particular sequence (indicated by “?” and dashed lines), each candidate target site in both the common and unique sets is checked to ensure that it lies completely within an exon. The final common target sites are then displayed in visual and table formats. The search for unique target sites begins by computing all possible target sites in both orientations in all sequences. Each target site is then compared to all identified target sites in these sequences. A mismatch count variable (MM count) speeds up this comparison by stopping each pairwise comparison (“End”) as soon as more than 2 mismatches have been found (identities are indicated by “*”).
Target sites that pass this comparison test and the subsequent test for location within exon sequences are confirmed as unique target sites. These unique target sites are then output in the same visual and table formats.
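The unique-site comparison in Fig 2, with its early exit once the mismatch count exceeds 2, might look like the minimal Python sketch below. It rests on an interpretation not spelled out in the caption: a site is treated as unique to its sequence when every site collected from the other input sequences differs from it at more than 2 positions. The function names and the dictionary input layout are hypothetical, the sites are assumed to be equal-length strings already collected in both orientations, and the exon-location test is omitted.

```python
MAX_MM = 2  # threshold from the caption: a comparison ends once mismatches exceed 2


def differs_enough(a, b, max_mm=MAX_MM):
    """True as soon as a and b have more than max_mm mismatches (early exit)."""
    mm = 0  # the "MM count" variable
    for x, y in zip(a, b):
        if x != y:
            mm += 1
            if mm > max_mm:
                return True  # stop comparing: the two sites are clearly distinct
    return False  # max_mm or fewer mismatches: too similar to count as unique


def unique_sites(sites_by_seq, max_mm=MAX_MM):
    """sites_by_seq: hypothetical dict mapping sequence name -> list of site strings."""
    result = {}
    for name, sites in sites_by_seq.items():
        # All candidate sites found in the other input sequences
        others = [s for other, lst in sites_by_seq.items() if other != name for s in lst]
        result[name] = [s for s in sites
                        if all(differs_enough(s, o, max_mm) for o in others)]
    return result
```

Because most random site pairs diverge within the first few positions, the early exit avoids scanning full-length sites in the exhaustive all-against-all comparison.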
Fig 3.
Examples of input and output pages of CRISPR MultiTargeter.
A. The input form for the multiple genes/transcripts workflow consists of the parameters for the sgRNA target site specification and the identifier input area. B. The output page begins with a header indicating the type of design performed, followed by the list of user-provided input identifiers with links to Ensembl gene pages where available. This example is from the multiple genes/transcripts workflow performed on the zebrafish genes sox9a and sox9b. The main part of the output focuses on common sgRNA target sites and is organized into Visual and Table Views. The user can show or hide the details of these views by clicking the “expand or hide” links. The Visual View consists of links to the alignment with the target sites highlighted and markers for their start sites. The Table View contains HTML tables with relevant information on sgRNA target sites, such as their ID numbers, sequences, and start and end positions, as well as computed sequence features such as GC % and the predicted annealing temperature (Tm) of the sgRNA:DNA interaction, exon numbers and predicted scores. A “Unique sgRNA targets” section of the page is organized similarly.
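GC % and a Tm value for each row of the Table View could be computed along the lines below. The caption does not state which Tm model the program uses, so this sketch substitutes the simple Wallace rule (2 °C per A/T plus 4 °C per G/C), which is only a rough estimate for short duplexes and is not claimed to be the program's method. The function names and the row layout (ID, sequence, start, end, GC %, Tm) are hypothetical.

```python
def gc_percent(site):
    """Percentage of G and C bases in a target site."""
    site = site.upper()
    return 100.0 * (site.count("G") + site.count("C")) / len(site)


def wallace_tm(site):
    """Wallace-rule Tm estimate in degrees C; a placeholder for the program's own model."""
    site = site.upper()
    at = site.count("A") + site.count("T") + site.count("U")
    gc = site.count("G") + site.count("C")
    return 2 * at + 4 * gc


def table_rows(hits):
    """hits: list of (start, end, site); returns hypothetical Table View rows."""
    rows = []
    for i, (start, end, site) in enumerate(hits, 1):
        rows.append((i, site, start, end, round(gc_percent(site), 1), wallace_tm(site)))
    return rows
```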
Fig 4.
Unique transcript isoform-specific sgRNA target sites for Type II CRISPR sgRNAs in zebrafish genes.
A. Proportions of genes with identified transcript isoform-specific sgRNA sites, of transcripts with isoform-specific sgRNA sites, and of these sites in the sense and anti-sense orientations. sgRNA sites are 20 bp long with the NN 5’-dinucleotide and the NGG PAM sequence. B. Distribution of total target site numbers for transcript isoforms. The mean number of sgRNA target sites (48.7) is indicated by a dashed line over the histogram. Both graph axes are scaled using the square root function. The histogram bars are colored according to the frequency scale shown.
Fig 5.
Common target sites for Type II CRISPR sgRNAs in zebrafish ohnologs.
A. Proportions of gene pairs with identified common sgRNA target sites, of target sites with and without a single mismatch, and of target sites on the sense and anti-sense strands. sgRNA sites are 20 bp long with the NN 5’-dinucleotide and the NGG PAM sequence. B. Distribution of the total number of common target sites per gene pair. The mean number of target sites (6.48) is indicated by a dashed line over the histogram. Both graph axes are scaled using the square root function. The histogram bars are colored according to the frequency scale shown.