The recently developed CRISPR screen technology, based on the CRISPR/Cas9 genome editing system, enables genome-wide interrogation of gene functions in an efficient and cost-effective manner. Although many computational algorithms and web servers have been developed to design single-guide RNAs (sgRNAs) with high specificity and efficiency, algorithms specifically designed for conducting CRISPR screens are still lacking. Here we present CRISPR-FOCUS, a web-based platform to search and prioritize sgRNAs for CRISPR screen experiments. With official gene symbols or RefSeq IDs as the only mandatory input, CRISPR-FOCUS filters and prioritizes sgRNAs based on multiple criteria, including efficiency, specificity, sequence conservation, isoform structure, as well as genomic variations including Single Nucleotide Polymorphisms and cancer somatic mutations. CRISPR-FOCUS also provides pre-defined positive and negative control sgRNAs, as well as other necessary sequences in the construct (e.g., U6 promoters to drive sgRNA transcription and RNA scaffolds of the CRISPR/Cas9). These features allow users to synthesize oligonucleotides directly based on the output of CRISPR-FOCUS. Overall, CRISPR-FOCUS provides a rational and high-throughput approach for sgRNA library design that enables users to efficiently conduct a focused screen experiment targeting up to thousands of genes.
(CRISPR-FOCUS is freely available at http://cistrome.org/crispr-focus/)
Citation: Cao Q, Ma J, Chen C-H, Xu H, Chen Z, Li W, et al. (2017) CRISPR-FOCUS: A web server for designing focused CRISPR screening experiments. PLoS ONE 12(9): e0184281. https://doi.org/10.1371/journal.pone.0184281
Editor: Jianwei Zhang, University of Arizona, UNITED STATES
Received: April 28, 2017; Accepted: August 21, 2017; Published: September 5, 2017
Copyright: © 2017 Cao et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: This study was supported by National Natural Science Foundation of China, No. 31401104 (QC), National Natural Science Foundation of China Grant 31329003 and NIH R01 HG008927 of US (to XSL). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
The Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)–CRISPR-associated protein 9 (Cas9) system has proven to be a prominent genome-editing technique [1–2]. Based on the CRISPR/Cas9 system, CRISPR screening is a high-throughput technology that enables researchers to examine the effect of perturbing tens of thousands of genes in parallel [3–5]. In a CRISPR-based screening experiment, pools of single-guide RNAs (sgRNAs) designed to target different genomic loci are delivered into cells by a lentiviral system, and the function of a gene is inferred by comparing, across conditions, the abundance of the cell populations bearing sgRNAs that target that gene. CRISPR screening has been applied to interrogate gene functions in different contexts, including immune response [6–7], cancer progression [8–10] and metastasis, and more recently it has been used to identify the functions of non-coding elements as well [12–18].
Many CRISPR screening experiments are conducted as unbiased, genome-scale approaches, for which several genome-wide screening libraries are available [3,8–9,19]. On the other hand, focused screens are also conducted in many studies, where researchers use a small-scale library to target gene sets of specific interest (e.g., oncogenes/tumor suppressors for oncologists or cytokines for immunologists), to validate hits of genome-wide screens, or to reduce the cost of screens (e.g., in in vivo settings).
To design libraries for CRISPR screens (especially focused screens), several computational tools can be applied [19,21–30]. However, most of these algorithms provide optimized sgRNAs for only one or several genes/sequences [22–23,29]. A few web-based tools with nominal batch design capacity require users to provide a target sequence for each individual gene, impose strict size limits on the uploaded sequence file, accept only a limited number (mostly 10–20) of gene IDs as input, or rely on mining public-domain libraries [19,25–26,30]. Other tools with substantial batch-design capacity are not web-based, and require users to download the whole database, compile the source code and fine-tune up to dozens of parameters [21,24,27–28]. Therefore, a user-friendly automatic tool is needed to facilitate the design of CRISPR screen experiments.
Another issue in library design is the rational evaluation and selection of sgRNAs based on multiple criteria. Preferably, sgRNAs should have few off-target effects (assessed by aligning the spacer sequence against the whole genome [23,26–28]) and high on-target knockout efficiency (determined mainly by the sgRNA sequence context [19,31]), and it has proven necessary to consider both [9,32]. Other factors, such as sequence conservation and isoform structures of target genes [25,32], also have a marked impact on the results of screen experiments. Once multiple scores are calculated for all candidate sgRNAs, a method is needed to prioritize and filter them. Common practices include computing a weighted average of all scores with a fixed (or empirical) weight for each criterion [19,24], or applying the filters one by one and then ranking the candidates lexicographically. These approaches might be too loose or too rigid in sgRNA selection, because the distribution of these scores might vary among genes. To reach optimal sgRNA ranking results, an ideal method should consider all criteria and summarize them appropriately in a context-dependent way.
In light of the requirements of CRISPR screen experiments, we developed CRISPR-FOCUS, a web-based method for library design of CRISPR screens. With minimal user input, CRISPR-FOCUS selects a user-specified number of sgRNAs for each of up to one thousand genes in the human or mouse genome. SgRNAs in the output are ranked by their summary score, which is a comprehensive evaluation of efficiency, specificity, target sequence conservation and the coverage of multiple isoforms. To our knowledge, CRISPR-FOCUS is the only web-based tool that is specifically optimized for CRISPR screening experiments.
Methods and implementation
The scheme of CRISPR-FOCUS is presented in Fig 1. All possible sgRNA candidates that have a canonical Protospacer Adjacent Motif (PAM) in the human and mouse genomes are identified and stored in the backend database. For each candidate sequence, all attributes (described in detail below) are pre-computed and stored. When a user performs a query through the web interface, CRISPR-FOCUS retrieves all possible candidates, prioritizes them and returns the top ones with the highest scores.
Criteria for sgRNA performance evaluation
To reach the best CRISPR-based knockout effect, the selection of sgRNAs should be optimized to (1) maximize their on-target cleavage effects (i.e., maximize efficiency), (2) minimize potential off-target effects (i.e., maximize specificity), (3) ensure the fidelity of their sequence with corresponding target loci (and to avoid regions with possible genomic variations), and (4) consider the importance of target region (evaluated by sequence conservation and isoform structure). CRISPR-FOCUS evaluates every sgRNA with the following indices.
The cleavage efficiency of a sgRNA is a major factor that determines the sensitivity of a screen experiment. We used SSC, a computational algorithm that we previously developed, to predict the cleavage efficiency of candidate sgRNAs. SSC takes the spacer sequence and its flanking sequences as input, and uses a Least Absolute Shrinkage and Selection Operator (LASSO) model to calculate an efficiency score for each sgRNA. CRISPR-FOCUS filters out sgRNAs with an efficiency score below zero.
For each candidate sgRNA, CRISPR-FOCUS first calculated its specificity score to evaluate the overall similarity with putative off-target genomic loci. We further divided the sgRNAs that have perfect-match off-targets into three categories according to their off-target positions: (1) non-exon hits that do not overlap with exons of any coding or non-coding genes, (2) exon (but non-coding) hits that overlap with exons of non-coding genes, and (3) coding region hits that overlap with exons of coding genes. These sgRNAs may be considered in a rescue step (described later).
The effect of possible variations.
SgRNAs are usually designed based on the reference genome sequence. The knockout efficiencies of these sgRNAs may be affected when the genomic sequence in the cells differs from the reference, especially through mutation. CRISPR-FOCUS retrieved SNP information from dbSNP, and annotated each sgRNA with all Single Nucleotide Polymorphisms (SNPs) whose minor allele frequency (MAF) is higher than 0.05. sgRNAs that cover no or fewer variations, including SNPs and somatic mutations (especially in cancer), are preferentially chosen in the selection procedure. If screen experiments are conducted in cancer cells, users can also choose whether to avoid recurrent somatic mutations from different cancer types (using the COSMIC database).
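As a minimal sketch of the SNP annotation described above, the following Python function lists the common SNPs that fall inside a sgRNA target site. The function name, the (position, MAF) tuple layout and the 1-based inclusive coordinates are assumptions made for illustration and are not taken from the CRISPR-FOCUS source code; only the MAF cutoff of 0.05 comes from the text.

```python
def common_snps_in_site(site_start, site_end, snps, maf_cutoff=0.05):
    """Return the SNPs that lie inside a target site and exceed the MAF cutoff.

    site_start, site_end: 1-based inclusive coordinates of the sgRNA target
        site on one chromosome (hypothetical convention).
    snps: iterable of (position, minor_allele_frequency) tuples on the same
        chromosome (hypothetical layout).
    """
    return [
        (pos, maf)
        for pos, maf in snps
        if site_start <= pos <= site_end and maf > maf_cutoff
    ]


# Illustrative, made-up SNP records (not real dbSNP entries).
example_snps = [(105, 0.20), (110, 0.01), (120, 0.05), (130, 0.30)]
hits = common_snps_in_site(100, 122, example_snps)
print(len(hits))  # number of common SNPs covered by the site
```

A sgRNA whose list is empty covers no common SNP; a longer list marks a candidate that would be less preferred under the criterion above.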
Regions in a gene with higher conservation rates across species are more likely to be important, as they usually encode conserved functional domains (like the catalytic center of an enzyme or the DNA-binding domain of a transcription factor) whose knockout is more likely to disrupt gene function. CRISPR-FOCUS annotated each sgRNA with the average phastCons conservation score of the corresponding target position.
Some genes have multiple isoforms (or transcripts) with different structures. To completely knock out a gene, a sgRNA should ideally target as many isoforms as possible. For each exon region, CRISPR-FOCUS calculates an “isoform commonality score”, which is defined as the percentage of isoforms that use this exon. SgRNAs targeting exon regions with higher scores are preferred.
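The isoform commonality score can be illustrated with a short Python sketch. It assumes that an isoform "uses" an exon when the exon's exact (start, end) coordinates appear in that isoform's exon list; the source does not state whether exact matching or overlap is used, and the function name and transcript identifiers below are hypothetical.

```python
def isoform_commonality(exon, isoforms):
    """Percentage of isoforms whose exon list contains `exon`.

    exon: (start, end) tuple.
    isoforms: dict mapping a transcript ID to its list of (start, end) exons.
    Exact coordinate matching is an assumption of this sketch.
    """
    if not isoforms:
        return 0.0
    using = sum(1 for exons in isoforms.values() if exon in exons)
    return 100.0 * using / len(isoforms)


# Made-up gene model with four isoforms (hypothetical transcript IDs).
example_isoforms = {
    "tx_A": [(100, 200), (300, 400)],
    "tx_B": [(100, 200), (500, 600)],
    "tx_C": [(100, 200), (300, 400), (500, 600)],
    "tx_D": [(100, 200)],
}
shared = isoform_commonality((100, 200), example_isoforms)   # used by all four
partial = isoform_commonality((300, 400), example_isoforms)  # used by two of four
print(shared, partial)
```

Under this definition, a sgRNA in the first exon of the example would be preferred over one in the second, because it disrupts every annotated isoform.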
SgRNA selection and ranking
For each gene in the query, CRISPR-FOCUS first retrieves the genomic coordinates of all exons, and collects all sgRNA candidates that overlap with these regions. It next performs a “filter and rescue” procedure (described in detail in S1 File) to rank all candidates and pick the top ones. In the filtering step, CRISPR-FOCUS filters out sgRNAs that are empirically regarded as “bad” candidates, including sgRNAs that: (1) overlap with a SNP or mutation locus, (2) contain >40% guanines (‘G’s), which is observed to be associated with higher off-target effects, or (3) perfectly match putative off-target loci within the genome. The remaining ones are ranked by a summary score, which is a weighted summary of the efficiency, specificity, phastCons conservation and isoform commonality scores, with all weights dynamically defined by the Criteria Importance Through Intercriteria Correlation (CRITIC) method. The purpose of this method is to determine an objective weight for each criterion in multiple-criteria decision problems. Briefly, in CRITIC, a value Cj is calculated to quantify the amount of information transmitted by criterion j, which is determined by both the contrast intensity and the conflict of the decision criteria. The contrast intensity is represented by the standard deviation of j, while the conflict is measured by aggregating one minus the correlation coefficient between j and each of the remaining criteria; the two are combined multiplicatively. Finally, the objective weight wj is generated by dividing Cj by the sum of all C values.
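The filtering and weighting steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the CRISPR-FOCUS implementation (which is described in S1 File): it follows the standard CRITIC formulation of Diakoulaki et al., Cj = σj · Σk (1 − rjk), applies min-max scaling to each criterion before weighting, and treats the correlation with a constant criterion as zero. All function names and example values are hypothetical.

```python
from math import sqrt


def passes_hard_filters(spacer, n_variants, n_perfect_offtargets):
    """Filtering step: reject variant overlap, >40% G, or perfect off-targets."""
    g_fraction = spacer.upper().count("G") / len(spacer)
    return n_variants == 0 and g_fraction <= 0.40 and n_perfect_offtargets == 0


def _minmax(column):
    """Scale one criterion to [0, 1]; a constant criterion becomes all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]


def _std(column):
    """Population standard deviation (the contrast intensity)."""
    mean = sum(column) / len(column)
    return sqrt(sum((v - mean) ** 2 for v in column) / len(column))


def _pearson(x, y):
    """Pearson correlation; defined as 0 here if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def critic_weights(scores):
    """scores: one row per sgRNA, one column per criterion. Returns weights."""
    columns = [_minmax(list(col)) for col in zip(*scores)]
    m = len(columns)
    info = []
    for j in range(m):
        conflict = sum(
            1.0 - _pearson(columns[j], columns[k]) for k in range(m) if k != j
        )
        info.append(_std(columns[j]) * conflict)  # C_j
    total = sum(info)
    if total == 0:
        return [1.0 / m] * m  # fallback when no criterion carries information
    return [c / total for c in info]


def summary_scores(scores, weights):
    """Weighted sum of the min-max scaled criteria for each sgRNA."""
    columns = [_minmax(list(col)) for col in zip(*scores)]
    return [
        sum(w * col[i] for w, col in zip(weights, columns))
        for i in range(len(scores))
    ]


# Made-up scores for four sgRNAs of one gene:
# columns = efficiency, specificity, conservation, isoform commonality.
example_scores = [
    [0.1, 0.9, 0.5, 1.0],
    [0.4, 0.2, 0.5, 0.5],
    [0.8, 0.6, 0.5, 1.0],
    [0.3, 0.4, 0.5, 0.0],
]
weights = critic_weights(example_scores)
ranked = summary_scores(example_scores, weights)
print(weights, ranked)
```

Because the weights are recomputed from the candidates of each gene, a criterion that does not vary among those candidates (the conservation column in the example) receives zero weight for that gene, which is the context-dependent behavior the text motivates.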
If the number of remaining sgRNAs does not reach the desired number, CRISPR-FOCUS will execute a “rescue” step to retrieve more possible sgRNAs. At this stage, sgRNAs with potential off-target hits will be rescued in the following order: (1) sgRNAs with non-exon off-target hits only, (2) sgRNAs with off-target hits located on non-coding elements but not coding regions, (3) sgRNAs with off-target hits located on coding regions. sgRNAs within the same category will be prioritized based on their number of off-target hits, or by the summary score if two candidates have the same number of hits within the same category. A detailed flowchart of the whole procedure is depicted in Fig 2.
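The rescue ordering described above can be expressed as a single sort key, sketched below in Python. The `Candidate` fields and function names are hypothetical. The sketch also assumes that fewer off-target hits rank first, that the hit count is the total across all three hit types, and that a higher summary score breaks ties; the text implies this ordering but does not spell it out.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str                    # hypothetical sgRNA identifier
    nonexon_hits: int            # perfect-match off-targets outside exons
    noncoding_exon_hits: int     # off-targets in exons of non-coding genes
    coding_hits: int             # off-targets in exons of coding genes
    summary_score: float


def rescue_category(c):
    """1 = non-exon hits only, 2 = non-coding exon hits, 3 = coding hits."""
    if c.coding_hits > 0:
        return 3
    if c.noncoding_exon_hits > 0:
        return 2
    return 1


def rescue_order(candidates):
    """Sort by category, then total off-target hits, then summary score."""
    return sorted(
        candidates,
        key=lambda c: (
            rescue_category(c),
            c.nonexon_hits + c.noncoding_exon_hits + c.coding_hits,
            -c.summary_score,
        ),
    )


# Made-up candidates that were removed by the perfect-match filter.
pool = [
    Candidate("g1", 0, 0, 1, 0.9),
    Candidate("g2", 2, 0, 0, 0.5),
    Candidate("g3", 1, 0, 0, 0.4),
    Candidate("g4", 0, 1, 0, 0.8),
    Candidate("g5", 1, 0, 0, 0.7),
]
order = [c.name for c in rescue_order(pool)]
print(order)
```

Rescued candidates would then be taken from the front of this list until the requested number of sgRNAs per gene is reached.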
The web portal
The web portal of CRISPR-FOCUS (Fig 3) accepts a list of gene IDs (either official gene symbols or RefSeq IDs) as input, and returns the designated number of sgRNA candidates per gene. Users can input up to 1000 genes, and retrieve up to 30 sgRNAs per gene. Users can also select sgRNAs from either Homo sapiens or Mus musculus. The web portal uses the Common Gateway Interface (CGI) to fetch input, and all backend scripts are written in Python.
A screenshot of the CRISPR-FOCUS website (http://cistrome.org/crispr-focus/) is shown.
CRISPR-FOCUS also provides other options to accommodate different requirements, including the selection of different sgRNA lengths (19 or 20 nt) [5,39]. As commonly used constituents of the CRISPR/Cas9 delivery system, the human U6 promoter and the spCas9 scaffold can be appended to the output, allowing users to synthesize the library directly from it. Furthermore, CRISPR-FOCUS includes a set of negative control sgRNAs (targeting several known “safe-harbor” loci within the human or mouse genome) [40–41] and positive control sgRNAs (targeting 58 previously identified essential ribosomal genes). The input and output formats are described in Table A in S2 File. CRISPR-FOCUS is based on genome assemblies hg38 (for human) and mm10 (for mouse), and the full versions of the public-domain databases used to annotate sgRNAs can be found in Table B in S2 File.
Results and discussion
CRISPR-FOCUS provides a high-throughput platform for rational sgRNA library design for CRISPR screen experiments. It can complete a full-scale design (up to 1000 target genes with 30 sgRNAs each) within about twenty seconds. To our knowledge, CRISPR-FOCUS is currently the only web-based sgRNA design tool that provides a batch processing mode for custom CRISPR library design, as well as the most comprehensive tool for sgRNA performance evaluation. By shortening the distance from “in silico to bench”, CRISPR-FOCUS facilitates the design of screening experiments and promotes high-throughput functional studies in various fields.
S1 File. The schema for sgRNA ranking and selection.
The authors thank Hanfei Sun, Chenfei Wang, Binbin Wang and Jinzeng Wang for their help on web server deployment and maintenance, and Wenyan Cui for help on plotting and decorating some of the figures.
- 1. Carroll D. Genome engineering with targetable nucleases. Annual review of biochemistry. 2014;83:409–39. pmid:24606144.
- 2. Hsu PD, Lander ES, Zhang F. Development and applications of CRISPR-Cas9 for genome engineering. Cell. 2014;157(6):1262–78. pmid:24906146; PubMed Central PMCID: PMC4343198.
- 3. Shalem O, Sanjana NE, Hartenian E, Shi X, Scott DA, Mikkelsen TS, et al. Genome-scale CRISPR-Cas9 knockout screening in human cells. Science. 2014;343(6166):84–7. pmid:24336571; PubMed Central PMCID: PMC4089965.
- 4. Wang T, Wei JJ, Sabatini DM, Lander ES. Genetic screens in human cells using the CRISPR-Cas9 system. Science. 2014;343(6166):80–4. pmid:24336569; PubMed Central PMCID: PMC3972032.
- 5. Koike-Yusa H, Li Y, Tan EP, Velasco-Herrera Mdel C, Yusa K. Genome-wide recessive genetic screening in mammalian cells with a lentiviral CRISPR-guide RNA library. Nature biotechnology. 2014;32(3):267–73. pmid:24535568.
- 6. Zhou Y, Zhu S, Cai C, Yuan P, Li C, Huang Y, et al. High-throughput screening of a CRISPR/Cas9 library for functional genomics in human cells. Nature. 2014;509(7501):487–91. pmid:24717434.
- 7. Parnas O, Jovanovic M, Eisenhaure TM, Herbst RH, Dixit A, Ye CJ, et al. A Genome-wide CRISPR Screen in Primary Immune Cells to Dissect Regulatory Networks. Cell. 2015;162(3):675–86. pmid:26189680; PubMed Central PMCID: PMC4522370.
- 8. Wang T, Birsoy K, Hughes NW, Krupczak KM, Post Y, Wei JJ, et al. Identification and characterization of essential genes in the human genome. Science. 2015;350(6264):1096–101. pmid:26472758; PubMed Central PMCID: PMC4662922.
- 9. Hart T, Chandrashekhar M, Aregger M, Steinhart Z, Brown KR, MacLeod G, et al. High-Resolution CRISPR Screens Reveal Fitness Genes and Genotype-Specific Cancer Liabilities. Cell. 2015;163(6):1515–26. pmid:26627737.
- 10. Tzelepis K, Koike-Yusa H, De Braekeleer E, Li Y, Metzakopian E, Dovey OM, et al. A CRISPR Dropout Screen Identifies Genetic Vulnerabilities and Therapeutic Targets in Acute Myeloid Leukemia. Cell reports. 2016;17(4):1193–205. pmid:27760321; PubMed Central PMCID: PMC5081405.
- 11. Chen S, Sanjana NE, Zheng K, Shalem O, Lee K, Shi X, et al. Genome-wide CRISPR screen in a mouse model of tumor growth and metastasis. Cell. 2015;160(6):1246–60. pmid:25748654; PubMed Central PMCID: PMC4380877.
- 12. Fulco CP, Munschauer M, Anyoha R, Munson G, Grossman SR, Perez EM, et al. Systematic mapping of functional enhancer-promoter connections with CRISPR interference. Science. 2016;354(6313):769–73. pmid:27708057.
- 13. Sanjana NE, Wright J, Zheng K, Shalem O, Fontanillas P, Joung J, et al. High-resolution interrogation of functional elements in the noncoding genome. Science. 2016;353(6307):1545–9. pmid:27708104; PubMed Central PMCID: PMC5144102.
- 14. Zhu S, Li W, Liu J, Chen CH, Liao Q, Xu P, et al. Genome-scale deletion screening of human long non-coding RNAs using a paired-guide RNA CRISPR-Cas9 library. Nature biotechnology. 2016;34(12):1279–86. pmid:27798563.
- 15. Diao Y, Li B, Meng Z, Jung I, Lee AY, Dixon J, et al. A new class of temporarily phenotypic enhancers identified by CRISPR/Cas9-mediated genetic screening. Genome research. 2016;26(3):397–405. pmid:26813977; PubMed Central PMCID: PMC4772021.
- 16. Rajagopal N, Srinivasan S, Kooshesh K, Guo Y, Edwards MD, Banerjee B, et al. High-throughput mapping of regulatory DNA. Nature biotechnology. 2016;34(2):167–74. pmid:26807528; PubMed Central PMCID: PMC5108523.
- 17. Korkmaz G, Lopes R, Ugalde AP, Nevedomskaya E, Han R, Myacheva K, et al. Functional genetic screens for enhancer elements in the human genome using CRISPR-Cas9. Nature biotechnology. 2016;34(2):192–8. pmid:26751173.
- 18. Canver MC, Smith EC, Sher F, Pinello L, Sanjana NE, Shalem O, et al. BCL11A enhancer dissection by Cas9-mediated in situ saturating mutagenesis. Nature. 2015;527(7577):192–7. pmid:26375006; PubMed Central PMCID: PMC4644101.
- 19. Doench JG, Hartenian E, Graham DB, Tothova Z, Hegde M, Smith I, et al. Rational design of highly active sgRNAs for CRISPR-Cas9-mediated gene inactivation. Nature biotechnology. 2014;32(12):1262–7. pmid:25184501; PubMed Central PMCID: PMC4262738.
- 20. Shi J, Wang E, Milazzo JP, Wang Z, Kinney JB, Vakoc CR. Discovery of cancer drug targets by CRISPR-Cas9 screening of protein domains. Nature biotechnology. 2015;33(6):661–7. pmid:25961408; PubMed Central PMCID: PMC4529991.
- 21. Heigwer F, Zhan T, Breinig M, Winter J, Brugemann D, Leible S, et al. CRISPR library designer (CLD): software for multispecies design of single guide RNA libraries. Genome biology. 2016;17(1):55. pmid:27013184; PubMed Central PMCID: PMC4807595.
- 22. Heigwer F, Kerr G, Boutros M. E-CRISP: fast CRISPR target site identification. Nature methods. 2014;11(2):122–3. pmid:24481216.
- 23. Naito Y, Hino K, Bono H, Ui-Tei K. CRISPRdirect: software for designing CRISPR/Cas guide RNA with reduced off-target sites. Bioinformatics. 2015;31(7):1120–3. pmid:25414360; PubMed Central PMCID: PMC4382898.
- 24. Liu H, Wei Z, Dominguez A, Li Y, Wang X, Qi LS. CRISPR-ERA: a comprehensive design tool for CRISPR-mediated gene editing, repression and activation. Bioinformatics. 2015;31(22):3676–8. pmid:26209430; PubMed Central PMCID: PMC4757951.
- 25. Prykhozhij SV, Rajan V, Gaston D, Berman JN. CRISPR multitargeter: a web tool to find common and unique CRISPR single guide RNA targets in a set of similar sequences. PloS one. 2015;10(3):e0119372. pmid:25742428; PubMed Central PMCID: PMC4351176.
- 26. Zhang F. Optimized CRISPR Design 2015. Available from: http://crispr.mit.edu/.
- 27. Xie S, Shen B, Zhang C, Huang X, Zhang Y. sgRNAcas9: a software package for designing CRISPR sgRNA and evaluating potential off-target cleavage sites. PloS one. 2014;9(6):e100448. pmid:24956386; PubMed Central PMCID: PMC4067335.
- 28. Zhu LJ, Holmes BR, Aronin N, Brodsky MH. CRISPRseek: a bioconductor package to identify target-specific guide RNAs for CRISPR-Cas9 genome-editing systems. PloS one. 2014;9(9):e108424. pmid:25247697; PubMed Central PMCID: PMC4172692.
- 29. Ma J, Koster J, Qin Q, Hu S, Li W, Chen C, et al. CRISPR-DO for genome-wide CRISPR design and optimization. Bioinformatics. 2016;32(21):3336–8. pmid:27402906.
- 30. Panda SK, Boddul SV, Jimenez-Andrade GY, Jiang L, Kasza Z, Fernandez-Ricaud L, et al. Green listed-a CRISPR screen tool. Bioinformatics. 2017;33(7):1099–100. pmid:28414855.
- 31. Xu H, Xiao T, Chen CH, Li W, Meyer CA, Wu Q, et al. Sequence determinants of improved CRISPR sgRNA design. Genome research. 2015;25(8):1147–57. pmid:26063738; PubMed Central PMCID: PMC4509999.
- 32. Chu VT, Graf R, Wirtz T, Weber T, Favret J, Li X, et al. Efficient CRISPR-mediated mutagenesis in primary immune cells using CrispRGold and a C57BL/6 Cas9 transgenic mouse line. Proceedings of the National Academy of Sciences of the United States of America. 2016;113(44):12514–9. pmid:27729526; PubMed Central PMCID: PMC5098665.
- 33. Hsu PD, Scott DA, Weinstein JA, Ran FA, Konermann S, Agarwala V, et al. DNA targeting specificity of RNA-guided Cas9 nucleases. Nature biotechnology. 2013;31(9):827–32. pmid:23873081; PubMed Central PMCID: PMC3969858.
- 34. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, et al. dbSNP: the NCBI database of genetic variation. Nucleic acids research. 2001;29(1):308–11. pmid:11125122; PubMed Central PMCID: PMC29783.
- 35. Forbes SA, Beare D, Gunasekaran P, Leung K, Bindal N, Boutselakis H, et al. COSMIC: exploring the world's knowledge of somatic mutations in human cancer. Nucleic acids research. 2015;43(Database issue):D805–11. pmid:25355519; PubMed Central PMCID: PMC4383913.
- 36. Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome research. 2005;15(8):1034–50. pmid:16024819; PubMed Central PMCID: PMC1182216.
- 37. Chen C, Li W, Xiao T, Xu H, Jiang P, Meyer CA, et al. Integrative analysis and refined design of CRISPR knockout screens. Genome research. 2017.
- 38. Diakoulaki D, Mavrotas G, Papayannakis L. Determining objective weights in multiple criteria problems: The critic method. Computers & Operations Research. 1995;22(7):763–70. http://dx.doi.org/10.1016/0305-0548(94)00059-H.
- 39. Fu Y, Sander JD, Reyon D, Cascio VM, Joung JK. Improving CRISPR-Cas nuclease specificity using truncated guide RNAs. Nature biotechnology. 2014;32(3):279–84. pmid:24463574; PubMed Central PMCID: PMC3988262.
- 40. Sadelain M, Papapetrou EP, Bushman FD. Safe harbours for the integration of new DNA in the human genome. Nature reviews Cancer. 2011;12(1):51–8. pmid:22129804.
- 41. Carey BW, Markoulaki S, Beard C, Hanna J, Jaenisch R. Single-gene transgenic mouse strains for reprogramming adult somatic cells. Nature methods. 2010;7(1):56–9. pmid:20010831; PubMed Central PMCID: PMC3048025.