Discovering functional sequences with RELICS, an analysis method for CRISPR regulatory screens

CRISPR regulatory screens are a powerful technology for discovering sequences that control gene expression; however, a lack of analysis methods limits their effectiveness. In addition, method performance is difficult to assess due to an absence of datasets where the identities of true regulatory sequences are known. To address these problems, we developed an analysis method, RELICS, and a simulation framework, CRSsim, for CRISPR regulatory screens. RELICS detects regulatory elements by modeling guide counts across multiple pools representing different conditions or expression levels, and CRSsim generates realistic datasets where the ground truth is known. We used CRSsim to generate 4320 datasets representing 144 scenarios with different characteristics. We compared analysis methods on these datasets and found that RELICS has the best performance under most conditions. We applied RELICS to 8 published datasets for 15 genes and identified previously-validated elements as well as multiple putative elements that were missed by other methods.


Introduction
CRISPR regulatory screens identify gene-specific regulatory elements by targeting thousands of sequences for mutation, repression or activation (Fig. 1a). To target sequences for mutation, single guide RNAs (sgRNAs) are introduced to cells alongside Cas9. Cas9 creates double-strand breaks at the targeted sites, and mutations are introduced by the error-prone non-homologous end-joining process 1 . In a CRISPR interference (CRISPRi) experiment, targeted sites are silenced by a deactivated Cas9 (dCas9) enzyme fused to a repressive domain such as the Krüppel-associated box (dCas9:KRAB) 2,3 . Similarly, in a CRISPR activation (CRISPRa) experiment, dCas9 is fused to an activation domain such as VP64 or p300 4 . Finally, in a tiling deletion screen, small overlapping deletions are used to interrogate large stretches of the genome. The deletions are programmed by pairs of sgRNAs that direct Cas9 to introduce two nearby double-strand breaks resulting in the loss of the intervening sequence 5,6 . The cells containing guides are subjected to selection (e.g. treatment with a drug), allowed to proliferate, or sorted into different pools using fluorescence activated cell sorting (FACS). Counts of guides are then obtained from pools of cells that are collected from different timepoints or FACS bins. CRISPR regulatory screens have important advantages over other approaches for regulatory sequence discovery, which typically rely upon genomic signatures such as histone modifications (e.g. H3K4me1 and H3K27ac) 7,8 , open chromatin 9,10 , and enhancer RNAs (eRNAs) [11][12][13] . First, unlike massively parallel reporter assays (MPRAs) [14][15][16] , CRISPR regulatory screens measure the effects of targeted regulatory sequences in their native sequence context. Second, they connect regulatory sequences to the genes that they control. Finally, CRISPR regulatory screens can discover regulatory sequences that lack canonical enhancer marks.

Figure 1 | Experimental workflow of a CRISPR regulatory screen and the RELICS analysis method. (a) In a CRISPR regulatory screen, guides are designed to target both coding and non-coding regions. Guides are transduced into cells and the cells expressing guides are (i) selected for survival, (ii) selected for proliferation, or (iii) sorted into pools based on gene expression. The integrated guides in each pool are then sequenced and counted. The counts are used to score individual guides and the scores of multiple nearby guides are aggregated to give scores for genomic regions. (b) RELICS estimates model parameters from a set of positive control guides and background guides. (c) An intercept parameter is estimated for the first pool and mean shifts (βs) are estimated for subsequent pools. A per-guide random effect is the deviance of a guide from the pool means (this accommodates variability in guide frequencies in the vector library). (d) The RELICS score is the log likelihood ratio of the regulatory model to the background model, given the observed counts for a guide.
A variety of methods have been used to analyze data generated by CRISPR regulatory screens including log-ratios of normalized guide counts, and methods designed to detect differential gene expression from RNA-seq (edgeR 17 and DESeq2 18 ). Unfortunately, only a small subset of the elements identified using these approaches have been experimentally validated and for most genomic regions the ground truth (i.e. whether a sequence truly regulates a gene or not) is unknown. Thus, there is currently no way to compare the performance of different analysis methods. To address this problem, we have developed CRSsim (CRISPR Regulatory Screen simulator), an extensive simulation framework that generates datasets where the ground truth is known.
In addition, CRISPR screens often generate counts from multiple pools (e.g. input, high-expression, medium-expression, low-expression), yet existing tools are not capable of analyzing all of the pools at once and typically resort to pairwise comparisons between pools. To address this problem, we have developed RELICS (Regulatory Element Location Identification in CRISPR Screens), a new statistical method for the analysis of CRISPR regulatory screens which can jointly analyze guide counts across multiple pools or time points. We simulate datasets under a wide variety of conditions and systematically compare the performance of different analysis approaches for CRISPR regulatory screens. Under most simulation conditions RELICS is the best-performing analysis method, and when we apply RELICS to published datasets we not only identify most previously-validated elements but also several putative regulatory elements that were missed in the initial studies.

RELICS for CRISPR regulatory screen analysis
We have developed RELICS, a flexible statistical framework that can analyze different types of screens including those based on cell dropout, cell proliferation, and cell sorting into expression pools (Fig. 1b-d). RELICS uses a Generalized Linear Mixed Model (GLMM) 19 to describe the observed counts of each guide across pools, where each pool is a timepoint or condition. For example, in a dropout screen there will typically be two pools, one before selection and one following selection. In an expression screen, the pools are typically the input pool and expression bins that the cells are sorted into.
To identify guides targeting regulatory sequences, RELICS computes the likelihood of the observed counts for a guide under two models: a 'regulatory' model (H1: the guide targets a regulatory sequence) and a 'background' model (H0: the guide does not target a regulatory sequence). The RELICS score is the log ratio of these two likelihoods, which can be interpreted as a log Bayes Factor. The regulatory and background models are both GLMMs, with parameters that are estimated empirically from subsets of guides in the dataset. The parameters for the regulatory model are estimated from guides that are designated as positive controls; typically, these guides target the exons or promoter of the gene of interest. The parameters for the background model are estimated either (1) from all guides that are not designated as positive controls, or (2) from a set of guides that are designated as negative controls (e.g. non-targeting guides). A major advantage of RELICS is that it jointly models the data from multiple pools, and the behavior of each pool is learned empirically from the dataset. This approach differs substantially from previous analysis methods which can only perform pairwise comparisons of pools (e.g. high expression to low expression).
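The scoring idea can be illustrated with a deliberately simplified sketch. It computes a log likelihood ratio for one guide from negative binomial likelihoods with fixed per-pool means. It omits the per-guide random effect and the GLMM fitting entirely, and the pool means, dispersion value, and function names are hypothetical choices for illustration rather than RELICS parameters or API.

```python
import math

def nb_logpmf(k, mean, dispersion):
    """Log pmf of a negative binomial with variance mean + mean**2 / dispersion."""
    r = dispersion
    p = r / (r + mean)
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log(1.0 - p))

def guide_score(counts, regulatory_means, background_means, dispersion):
    """Log likelihood ratio (regulatory vs. background) for one guide's counts across pools."""
    ll_regulatory = sum(nb_logpmf(k, m, dispersion)
                        for k, m in zip(counts, regulatory_means))
    ll_background = sum(nb_logpmf(k, m, dispersion)
                        for k, m in zip(counts, background_means))
    return ll_regulatory - ll_background

# Hypothetical pools: input, high expression, low expression.
background_means = [100.0, 100.0, 100.0]
regulatory_means = [100.0, 60.0, 140.0]   # assumed shift toward the low-expression pool
dispersion = 10.0                          # assumed value

score_shifted = guide_score([100, 55, 150], regulatory_means, background_means, dispersion)
score_flat = guide_score([100, 98, 103], regulatory_means, background_means, dispersion)
print(score_shifted, score_flat)
```

In this toy setting the guide whose counts shift toward the low-expression pool receives a positive score, while the guide with flat counts receives a negative one.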
After computing a score for each guide, RELICS assigns scores to base positions in the genome by summing the log Bayes Factors of the guides that overlap that position ( Supplementary Fig. 1). A base position is considered to be overlapped by a guide if it is within the 'area of effect' for the CRISPR system used. For a single guide Cas9 screen, the area of effect is assumed to be 20bp 20 , for CRISPRi and CRISPRa the area of effect is assumed to be 1kb 3 , and for a paired guide deletion screen, the area of effect is specified by each deletion.
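A minimal sketch of this aggregation step is shown below. It assumes that the area of effect is a window centered on the guide's target position; that centering, the function name, and the example coordinates are illustrative assumptions and not taken from the RELICS implementation.

```python
from collections import defaultdict

def position_scores(guides, area_of_effect):
    """Sum guide scores over every base position inside each guide's area of effect.

    guides: list of (target_position, guide_score) tuples.
    area_of_effect: window width in bp, assumed to be centered on the target position.
    """
    half = area_of_effect // 2
    scores = defaultdict(float)
    for target, score in guides:
        for position in range(target - half, target + half):
            scores[position] += score
    return dict(scores)

# Two overlapping guides and one distant guide, with a 20 bp area of effect.
example_guides = [(100, 6.0), (110, 4.0), (500, -1.0)]
per_base = position_scores(example_guides, area_of_effect=20)
print(per_base[95], per_base[105], per_base[115])
```

Positions covered by both of the first two guides receive the sum of their scores, while positions covered by only one guide keep that guide's score.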

Simulation framework
A major limitation in assessing the performance of analysis methods for CRISPR regulatory screens is that there is no gold standard dataset where all regulatory elements are known. To address this limitation, we developed a simulation framework, CRSsim, to systematically compare the performance of RELICS and other methods under different conditions. The simulations capture the effects of both experimental and biological variables (Fig. 2a). Experimental variables include the complexity of the vector library, the efficiency of the guides, the spacing of sequences targeted by guides, the sequencing depth of the different pools, and the strength and type of selection that is performed (e.g. drop-out or flow-sorting by expression).
Biological variables include the number of regulatory elements, the genomic size of regulatory elements, and the effect sizes of the regulatory elements (i.e. how they affect gene expression when perturbed). CRSsim incorporates these variables by drawing cells containing guides from an input distribution; sorting cells into different pools with probabilities defined by a Dirichlet-Multinomial distribution; and "sequencing" the guides in the pools by sampling either with or without replacement ( Fig. 2b).
We evaluated the similarity of data simulated by CRSsim to experimental data by comparing the distributions of guide counts, and rank changes in guide counts after selection/sorting (Fig. 2a, Fig. 2-4). In most cases, we found the experimental and simulated guide datasets to be similar.

Performance assessment of RELICS and other analysis methods with simulated data
To evaluate the performance of RELICS and other analysis methods under a wide variety of conditions, we generated over 4000 datasets with CRSsim (Fig. 2c). We performed 3 major types of simulation, each of which was designed to resemble the experimental design of a published screen. For all simulation types, we set the input guide distribution to match the distribution observed in the original experiment. Type 1 is based on the CRISPRi proliferation screen performed by Fulco et al. 21 . Type 2 is based on the gene expression screen by Simeonov et al. 22 and sorts cells into 4 different pools (high expression, medium expression, low expression, and no expression). Type 3 is based on the expression screen of Diao et al. 5 and sorts cells into high, medium, and low expression pools. We additionally performed a fourth type of simulation, Type 4, that is also based on Diao et al., with the modification that the medium pool is assumed to have no contribution to the overall signal (i.e. positive and negative guides are equally likely to be sorted into the medium pool). The purpose of the latter simulation is to confirm that RELICS still performs well when one of the pools provides little extra information.
Within each of the four simulation types we varied the enhancer strength (low, medium, high), the guide efficiency (low, medium, high), the selection strength (low, high), and the sequencing depth (medium, high). The enhancer strength describes how strongly the enhancers affect the expression of the gene, while the selection strength describes the difference in sorting (or dropout) probabilities between positive and negative controls. We set the medium and high sequencing depths to an average of 15 or 100 reads per guide respectively. In each simulation we generated data for 8,700 guides targeting a 150kb region containing 25 enhancers, where each enhancer spans 50bp. The combinations of the 4 simulation types and the 4 variables described above result in a total of 144 different simulation scenarios. We performed 30 simulations for each scenario to generate a total of 4320 simulation datasets. For this analysis, we simulated CRISPRi screens; however, CRSsim can also simulate CRISPRa screens, tiling deletion screens, and Cas9-based screens.
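The scenario count follows directly from the factor levels listed above. The short enumeration below reproduces the arithmetic; the variable names are illustrative only.

```python
from itertools import product

simulation_types = ["Type 1", "Type 2", "Type 3", "Type 4"]
enhancer_strengths = ["low", "medium", "high"]
guide_efficiencies = ["low", "medium", "high"]
selection_strengths = ["low", "high"]
sequencing_depths = ["medium", "high"]

# Every combination of simulation type and the four varied settings is one scenario.
scenarios = list(product(simulation_types, enhancer_strengths, guide_efficiencies,
                         selection_strengths, sequencing_depths))

replicates_per_scenario = 30
n_datasets = len(scenarios) * replicates_per_scenario
print(len(scenarios), n_datasets)  # 144 4320
```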
We ran RELICS and four other analysis methods (edgeR 17 , DESeq2 18 , fold change, and CRISPRsurf 23 ) on the simulated datasets. Both edgeR and DESeq2 test whether the rate of guide counts differs between two pools (e.g. high expression vs. low expression), taking into account sequencing depth and overdispersion. We used edgeR and DESeq2 to compute a p-value for each guide and combined p-values for overlapping guides using Fisher's method. For the fold change analysis method, we computed the mean fold change across overlapping guides. CRISPRsurf takes scores as input (by default the scores are fold change) and performs a LASSO-based deconvolution to estimate effect sizes and p-values for each genomic position. We tested CRISPRsurf using both fold change and RELICS guide scores as input.
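For readers unfamiliar with Fisher's method, the sketch below combines the p-values of k overlapping guides using the statistic -2 Σ ln(p), which follows a chi-square distribution with 2k degrees of freedom under the null. It uses the closed-form survival function available for even degrees of freedom so that only the standard library is needed. The function name is illustrative, and this is not the code used in the study.

```python
import math

def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    Uses the closed-form chi-square survival function for 2k degrees of freedom:
    sf(x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!
    """
    if not pvalues or any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    half_statistic = -sum(math.log(p) for p in pvalues)  # equals statistic / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_statistic / i
        total += term
    return math.exp(-half_statistic) * total

combined = fisher_combined_pvalue([0.05, 0.05])
print(combined)
```

A single p-value is returned unchanged, and two moderately small p-values combine into a smaller one.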
To assess the performance of the analysis methods, we divided the simulated regions into positive (regulatory) and negative (non-regulatory) windows. Since each window spans multiple sites with potentially different scores, we assigned each window the highest score from all of the sites within the window. We then computed precision-recall curves for the windows and assessed the overall performance of each analysis method on each dataset by computing the area under the precision-recall curve (prAUC).
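The evaluation procedure can be sketched as follows. Each window takes the maximum of its site scores, and the area under the precision-recall curve is then estimated as average precision, which is one common step-wise estimator. The text does not specify which estimator was used, so that choice, the function names, and the toy windows are assumptions.

```python
def window_score(site_scores):
    """A window is assigned the highest score among the sites it contains."""
    return max(site_scores)

def pr_auc(scores, labels):
    """Average precision: mean of the precision values at each true positive in rank order."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_positive = sum(labels)
    true_positives = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            true_positives += 1
            area += true_positives / rank  # precision at this recall step
    return area / n_positive

# Toy windows: (site scores within the window, 1 = regulatory / 0 = non-regulatory).
windows = [([1.0, 7.5, 2.0], 1), ([0.2, 0.1], 0), ([3.0, 0.5], 1), ([4.0], 0)]
scores = [window_score(sites) for sites, _ in windows]
labels = [label for _, label in windows]
print(pr_auc(scores, labels))
```

Here one negative window outranks one positive window, so the estimate falls below 1; a perfect ranking would give exactly 1.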
RELICS had the best performance out of all of the methods on most of the simulated datasets and had the highest median prAUC for 86/144 simulation scenarios (Fig. 3, Supplementary Fig. 5). Even in the 58 scenarios where RELICS was not the best method, RELICS' performance was very close to the best with a mean difference in median prAUC of -0.03 ( Supplementary Fig. 6). Fold-change also performed reasonably well across most datasets, whereas DESeq2, edgeR, and CRISPRsurf performed inconsistently. In particular, these latter three methods worked well on simulations with strong enhancers but performed poorly on data from simulations with weak enhancers or weak selection strength. We also tested different methods for combining guide scores, including sliding windows and MAGeCK 24 (DESeq2 paired with α-RRA); however, these modifications diminished performance ( Supplementary Fig. 7). Note that our performance comparison of analysis methods should not be biased to favor RELICS as the simulation framework was designed to mimic experimental procedures and is in no way tailored to the RELICS analysis model. In summary, RELICS consistently has the best, or close to the best, performance under a wide variety of simulation scenarios.
To determine an appropriate RELICS score cutoff for identifying regulatory elements, we computed the average false discovery rate (FDR) for all 144 simulation scenarios under different RELICS score thresholds ( Supplementary Fig. 8). A score cutoff of 5 results in an FDR below 10% under most simulation scenarios with strong selection strength, and we applied this threshold for the analysis of public datasets described below. Simulations with low selection strengths, low guide efficiencies, or low sequencing depths had higher FDRs under this threshold. A more stringent cutoff of 10 generally results in FDRs beneath 10% even for simulations with low guide efficiencies, as long as there is strong selection and medium to strong enhancers ( Supplementary Fig. 8c).

Application of RELICS to published datasets
We applied RELICS to 15 genes from 8 published studies 5,6,21,22,[25][26][27][28][29] . For most datasets, RELICS detects all of the regulatory elements that have been experimentally validated and identifies additional putative regulatory elements in several cases (Supplementary Table 1). We detail our findings from a subset of these studies here.
We applied RELICS to data from Simeonov et al. 2017, in which a CRISPRa screen 22 was performed to search for regulatory elements for CD69 (Fig. 4a) and IL2RA (Fig. 4b) in Jurkat T cells. For both genes, cells were sorted into 4 pools based on expression (negative, low, medium, high), and in their published analysis, the authors performed pairwise comparisons of pools using fold change. For CD69, the authors identified 3 regulatory sites, one of which was a previously-reported enhancer for CD69, known as CNS2 30 . We analyzed all pools jointly with RELICS and found that the RELICS results are visibly less noisy than those from the fold-change analysis, with cleanly-delineated predicted regulatory regions. RELICS detected all 3 of the previously identified elements for CD69, as well as an additional region. This region is located in an evolutionarily-conserved region 7kb upstream of the transcription start site (TSS) with a weaker, but highly significant, RELICS score of 15 ( Supplementary Fig. 9). For IL2RA, 6 putative regulatory elements were identified. Only 1 of the 6 exhibited enhancer activity in luciferase reporter assays, but 2 of the elements were confirmed to affect IL2RA expression by transduction of single guides and flow cytometry. RELICS detected both of the confirmed regulatory elements but did not detect any of the unconfirmed elements, which suggests that some of the predictions from the fold-change analysis method may have been false positives.
We next applied RELICS to FACS Cas9 screens for human BCL11A and its mouse homolog, Bcl11a 26 . weaker signal in DHS+55 (score of 15). Notably, one of the two significant regions in h+62 is immediately adjacent to a SNP (rs1427407) that is associated with BCL11A expression and β-hemoglobin disorders (Fig. 4c) 31 . Using the data for mouse Bcl11a, RELICS discovered the regulatory element in the previously-reported m+62 region. In addition, we detected two putative regulatory elements in the m+58 region, which were not identified in the original study (Supplementary Fig. 10).
Finally, we applied RELICS to data from a CRISPRi proliferation screen by Fulco et al. 2016, which targeted non-coding regions around GATA1 and MYC in K562 cells 21 (Supplementary Fig. 11). The authors found 3 regulatory regions around GATA1 and validated two of them. RELICS detected both validated regions but did not detect the non-validated region near GLOD5 (Supplementary Fig. 11a).
This suggests that the non-validated region near GLOD5 may be a false positive detected by fold change. Fulco et al. also found 7 genomic regions around MYC (e1-e7) that affect cellular proliferation and that were interpreted as likely enhancers for MYC. RELICS detected these same 7 regions with a stringent score cutoff of 10. At a less-stringent cutoff of 5, RELICS detected an additional 18 regions which may serve as weaker regulatory regions. Notably, RELICS scores are substantially less noisy than those obtained using log fold change (Supplementary Fig. 11b).

Discussion
Using data generated by CRSsim, we performed the first systematic comparison of analysis methods for CRISPR regulatory screens. CRSsim is open-source and we envision that it will be a useful tool for future performance comparisons, power analyses, and making informed decisions about experimental designs for CRISPR regulatory screens (e.g. spacing of sites targeted by sgRNAs, the sequencing depth, and the complexity of the vector library).
We also introduced RELICS, which identifies regulatory elements by comparing the likelihood of the observed guide counts under a regulatory model to the likelihood under a background (null) model. RELICS jointly learns the parameters of each model, using a set of guides that are designated as positive and negative controls. This is both a strength and limitation of RELICS. A strength of this approach is that the behavior of guides across different pools is learned from the data itself. A limitation of this approach is that it is difficult to apply RELICS to datasets that do not have many positive controls and the positive controls may not adequately represent all types of regulatory elements, especially repressive or weak regulatory elements. These limitations could potentially be addressed with an unsupervised learning approach where training labels are not provided and instead categories of similarly-behaving guides are identified by clustering the data. Ideally the categories identified by this approach would represent different types of elements (e.g. strong regulatory elements, weak regulatory elements, silencers, non-regulatory elements).
RELICS does not currently model variation in guide efficiency, variation in the strength of regulatory elements, or off-target effects. These limitations are somewhat mitigated by combining information across multiple guides when computing scores for genomic positions, and our simulations demonstrate that RELICS performs reasonably well, even in the presence of weak enhancers or weak guide efficiency. Nonetheless, future versions of RELICS that explicitly model these sources of variation will likely achieve even better results.
In summary, we developed a flexible simulation framework, CRSsim, that allowed us to systematically evaluate the performance of analysis methods for CRISPR regulatory screens under different scenarios. We also developed RELICS, a powerful and flexible tool for the analysis of CRISPR regulatory screens that incorporates information from an arbitrary number of guide pools. RELICS performs better than previous analysis methods under most simulation conditions and, when applied to real data, identifies both previously-validated regulatory elements and new putative regulatory elements. Thus, RELICS is an extremely useful tool for the discovery of regulatory elements from CRISPR screens.

Simulation framework
CRSsim is open source and available on GitHub (https://github.com/patfiaux/CRSsim). To run it, the user first specifies the screen type, which can either be a selection screen (e.g. drop out or proliferation) or an expression screen, with an arbitrary number of expression pools. (We describe the simulation framework for an expression screen, but the same principles hold for selection screens.) Next, the user specifies (1) the screening method (Cas9, CRISPRi, CRISPRa, dual-guide CRISPR), (2) the number and size of true regulatory elements, (3) the spacing/density of guide target sites, and (4) the distribution of guide counts in the initial set of cells (i.e. the frequency of each guide).
The probability of a cell being sorted into each of the pools depends on whether the guide in that cell targets a regulatory sequence. We refer to the difference in sorting probabilities between guides that do/do not target regulatory sequences as the "strength of selection". The sorting of guides into the different pools is simulated by sampling from a Dirichlet-Multinomial distribution. We also model both The sorting vector for the remaining − cells with 'ineffective' guides is simply .
Each simulation starts with a guide library with a distribution of guide counts. We assume the guide counts follow a zero-inflated negative binomial distribution (ZINB), $X \sim \mathrm{ZINB}(m, d, \pi)$, where $m$ is the mean, $d$ the dispersion and $\pi$ the proportion of the distribution that comes from the zero point mass.
The parameters of these distributions can be specified or estimated by maximum likelihood from a provided input pool.
The pool used for sorting contains millions of cells, which we sample from to form the input pool using a multinomial sampling step. Each guide is selected for the input pool with a probability $p_i = c_i / \sum_j c_j$, where $c_i$ is the number of cells containing guide $i$. We simulate the sorting of cells into different pools by sampling counts for each guide from a Dirichlet-Multinomial (DM) distribution which allows the sorting probabilities and dispersion (variability in sorting) to be specified. If a guide targets a regulatory element, the Dirichlet parameters are shifted by a vector such that a greater proportion of cells containing the guide are sorted into the low expression and the medium expression pools (for details see above). Lastly, we simulate the sequencing step by drawing from a multivariate hypergeometric distribution (sampling without replacement) or a multinomial distribution (sampling with replacement) that reflects the counts of guides in the sorted pools. Sampling without replacement is used to simulate the use of unique molecular identifiers that allow duplicate reads to be filtered.
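The sorting and sequencing steps can be illustrated with a small standard-library sketch. It is not the CRSsim implementation: the Dirichlet parameters, the shift vector, the pool order, and all function names are hypothetical, and the example handles only two guides. Sampling without replacement relies on the `counts` argument of `random.sample`, which requires Python 3.9 or later.

```python
import random

def dirichlet(alphas, rng):
    """Draw one probability vector from a Dirichlet distribution via gamma variates."""
    draws = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
    total = sum(draws)
    return [draw / total for draw in draws]

def sort_cells(n_cells, alphas, rng):
    """Dirichlet-Multinomial draw: sample pool probabilities, then assign each cell to a pool."""
    probabilities = dirichlet(alphas, rng)
    counts = [0] * len(alphas)
    for pool in rng.choices(range(len(alphas)), weights=probabilities, k=n_cells):
        counts[pool] += 1
    return counts

def sequence_pool(guide_counts, n_reads, rng, with_replacement):
    """Sample reads from a sorted pool, given a dict of guide -> number of cells."""
    guides = list(guide_counts)
    cells = [guide_counts[guide] for guide in guides]
    if with_replacement:   # multinomial sampling
        reads = rng.choices(guides, weights=cells, k=n_reads)
    else:                  # multivariate hypergeometric sampling (UMI-like)
        reads = rng.sample(guides, k=n_reads, counts=cells)
    read_counts = {guide: 0 for guide in guides}
    for guide in reads:
        read_counts[guide] += 1
    return read_counts

rng = random.Random(0)
background_alphas = [10.0, 10.0, 10.0]   # assumed pools: high, medium, low expression
regulatory_shift = [-6.0, 3.0, 3.0]      # assumed shift toward the medium and low pools
regulatory_alphas = [a + s for a, s in zip(background_alphas, regulatory_shift)]

sorted_background = sort_cells(1000, background_alphas, rng)
sorted_regulatory = sort_cells(1000, regulatory_alphas, rng)

low_pool = {"background_guide": sorted_background[2],
            "regulatory_guide": sorted_regulatory[2]}
n_reads = min(200, sum(low_pool.values()))
reads_without_replacement = sequence_pool(low_pool, n_reads, rng, with_replacement=False)
reads_with_replacement = sequence_pool(low_pool, n_reads, rng, with_replacement=True)
print(sorted_background, sorted_regulatory, reads_without_replacement)
```

Smaller Dirichlet parameters with the same proportions would give more variable sorting, which corresponds to the dispersion mentioned above.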

Performance assessment in simulations
We compared the performance of several analysis methods for CRISPR regulatory screens.
Additionally, we compared different ways to combine per-guide scores to compute scores for genome regions. Both sliding windows 21,22 as well as a modified robust ranking algorithm (α-RRA) 5,24 have previously been used to aggregate guide scores within a specific region. Briefly, a sliding window approach combines the scores of all guides that fall within a window. The window is then shifted by one guide and the scores are computed for the next window. This approach is simple; however, the appropriate window size is typically not clear, especially given the variable spacing of target sites. The α-RRA approach ranks all guide scores, places guides into bins and tests bins for an enrichment of low ranks. This approach is powerful and p-values can be computed using permutations. However, it is not clear how large the bins should be and we have found this approach to be sensitive to outliers.
We implemented both methods in addition to the genome score calculation described above. Briefly, in the genome score method, we assign base pair positions scores by combining scores from guides with overlapping areas of effect ( Supplementary Fig. 1). The advantage of combining scores across regions of overlapping effect is that there is no need to set arbitrary bin sizes; instead, the range is defined by the CRISPR system used. An additional improvement could be to describe the area of effect with a Gaussian distribution instead of assuming a uniform effect across a region, as has been implemented in CRISPRsurf 23 . We compared all three methods for combining guide scores and found that the best-performing method was RELICS when the per-guide scores are combined across their region of effect ( Supplementary Fig. 7). Note that the combination of α-RRA and DESeq2 is equivalent to the MAGeCK method which is commonly used to analyze screens of protein coding genes 24 .
We assessed the performance of different analysis methods in simulations by calculating the area under the precision-recall curve (prAUC).

RELICS analysis model
RELICS is open source and available on GitHub (https://github.com/patfiaux/RELICS). RELICS represents the observed guide counts as a column vector $Y$ of length $N = GP$, where $G$ is the number of sgRNAs (or sgRNA pairs) and $P$ is the number of pools. The observations are assumed to be from a Negative Binomial random variable that can be described by a generalized linear mixed model (GLMM) 19 with the following form:

$$Y \sim \mathrm{NB}(\mu, \phi), \qquad \log(\mu) = X\beta + Zu$$

Here $\mu$ is the expected number of guide counts and $\phi$ is a dispersion parameter. We use a Negative Binomial distribution rather than a Poisson distribution to describe the guide counts, because, like most next-generation sequencing data, we have found that the guide counts are over-dispersed 17,18,32 .
We number the pools so that the first pool (pool 1) is the 'reference' pool. The reference pool will typically be the input pool, which contains cells that have not been subjected to sorting or selection. If an experiment does not provide data from an input pool, one of the pools is arbitrarily chosen to be the reference. The random effect term, $Zu$, describes the per-guide deviation from the mean rate defined by the fixed effects, $X\beta$. This deviation is necessary because the frequency of each guide in the input library is highly variable. We use a random effect rather than fixed effects to describe the per-guide deviation to keep the number of parameters in the model small and avoid overfitting. The random effects matrix, $Z$, has dimension $N \times G$ with elements that indicate the guide the observations are from:

$$Z_{ng} = \begin{cases} 1 & \text{if observation } n \text{ is from guide } g \\ 0 & \text{otherwise} \end{cases}$$

The random effect for guide $g$ is assumed to be normally distributed with variance $\sigma^2$:

$$u_g \sim \mathcal{N}(0, \sigma^2)$$

We jointly estimate the parameters of the RELICS model numerically by maximum likelihood, using as data observed counts for guides which have been labeled as regulatory (i.e. positive control guides) or non-regulatory. The set of non-regulatory guides can be the complete set of non-positive control guides or a set of negative control guides. In total there are $2P + 1$ free parameters in the model, $\theta = (\beta, \sigma, \phi)$.
We perform bootstrap parameter estimation, estimating the parameters from a random sample of 1/3 of the guides. We perform 10 bootstrap iterations and use the median parameter estimates from these iterations as our final parameter estimates. We fit RELICS parameters using the glmmTMB R package 33 , which is fast due to its use of a Laplace approximation for evaluating integrals and template metaprogramming. Fitting of parameters for a screen with 2 replicates containing 4 pools each and ~8700 guides takes roughly 1000 seconds on a laptop with an Intel Core i7 processor.
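The resampling scheme described above can be sketched generically as follows. The estimator here is a stand-in (the sample mean of toy counts) rather than the GLMM fit performed with glmmTMB, the subsets are drawn without replacement (the text does not say which scheme is used), and the function name is hypothetical.

```python
import random
import statistics

def bootstrap_median_estimate(guides, estimator, n_iterations=10, fraction=1 / 3, seed=0):
    """Estimate a parameter on random subsets of guides and return the median estimate."""
    rng = random.Random(seed)
    subset_size = max(1, round(len(guides) * fraction))
    estimates = []
    for _ in range(n_iterations):
        subset = rng.sample(guides, subset_size)  # assumed: subsample without replacement
        estimates.append(estimator(subset))
    return statistics.median(estimates)

# Toy data: per-guide counts; the sample mean stands in for a model parameter.
toy_counts = [12, 15, 9, 30, 22, 18, 7, 25, 14, 16, 11, 20]
estimate = bootstrap_median_estimate(toy_counts, statistics.mean)
print(estimate)
```

Taking the median over iterations limits the influence of any single unrepresentative subset on the final estimate.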
Once RELICS' parameters have been estimated, we can use them to calculate the likelihood of the observed counts for each guide under the regulatory and background models; the RELICS score, $S$, is the log ratio of these two likelihoods. This score can be considered a log Bayes Factor, where the per-guide random effect, $u$, has a normal prior distribution with an empirically-estimated hyperparameter, $\sigma^2$. Note that we do not integrate over the fixed-effect ($\beta$) or dispersion ($\phi$) parameters, which is equivalent to assuming that they have point mass prior probabilities with no uncertainty. This assumption is reasonable since these parameters are typically estimated from a large number of guides, and it greatly reduces the required computation time. As defined above, the RELICS scores assume that the prior probabilities of a guide targeting a regulatory or non-regulatory sequence are equal ($\Pr(R = 1) = \Pr(R = 0) = 0.5$, where $R$ indicates whether the guide targets a regulatory sequence). For most screens, this is unlikely to be true, but the scores can be adjusted according to one's prior belief by adding a log prior ratio. For example, if 5% of guides are believed to target a regulatory sequence the scores can be adjusted as follows: $S' = S + \log(0.05 / 0.95)$. In practice, we assume that scores > 5 provide reasonably strong evidence that a guide targets a regulatory sequence and that scores > 10 provide very strong evidence. This is further corroborated by our FDR calculations described above (Supplementary Fig. 8).
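The prior adjustment amounts to adding a constant to every score. A minimal sketch follows, assuming natural logarithms and using a hypothetical function name.

```python
import math

def adjust_score(score, prior_regulatory):
    """Add the log prior odds of a guide targeting a regulatory sequence to a score."""
    return score + math.log(prior_regulatory / (1.0 - prior_regulatory))

# With a 5% prior, every score is lowered by about 2.94 (natural log units, assumed).
print(adjust_score(5.0, 0.05))
print(adjust_score(5.0, 0.5))   # equal priors leave the score unchanged
```

Under this assumption, a guide needs an unadjusted score of roughly 8 to retain an adjusted score above 5 when only 5% of guides are expected to target regulatory sequence.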

H3K27ac data
Jurkat H3K27ac data from Mansour et al. 34 was downloaded from GEO (GSM1296384). The reads were aligned with bwa-mem 35 using default parameters and filtered for duplicates and low mapping quality (MAPQ < 30) using samtools 36 . The reads were then converted to reads per kilobase per million in 200bp bins using deepTools2 37 .
ENCODE H3K27ac ChIP-seq data for the K562 cell line was downloaded in bedgraph format from the UCSC genome browser 38 .

Data availability
All data were obtained from published papers or simulated as described above. The formatted data that we used for analyses can be downloaded from: https://figshare.com/projects/RELICS_data/65303

Supplementary Figures
Supplementary Figure 1 | RELICS genome region scores. RELICS computes scores for individual guides, and combines scores from overlapping guides to compute a genome region score. Guides are considered to overlap when the 'areas of effect' (blue dashed lines) surrounding their target sites overlap. In this example, the score for genomic region 2 is the sum of the scores from the first and second guides, which both overlap this region.