Small interfering RNAs (siRNAs) have become a ubiquitous experimental tool for down-regulating mRNAs. Unfortunately, off-target effects are a significant source of false positives in siRNA experiments, and an effective control for them has not previously been identified. We introduce two methods of mismatched siRNA design for negative controls based on changing bases in the middle of the siRNA to their complement bases. To test these controls, a test set of 20 highly active siRNAs (10 true positives and 10 false positives) was identified from a genome-wide screen performed in a cell line expressing a simple, constitutively expressed luciferase reporter. Three controls were then synthesized for each of these 20 siRNAs, the first two using the proposed mismatch design methods and the third being a simple random permutation of the sequence (scrambled siRNA). When tested in the original assay, the scrambled siRNAs showed significantly reduced activity in comparison to the original siRNAs, regardless of whether they had been identified as true or false positives, indicating that they have little utility as experimental controls. In contrast, one of the proposed mismatch design methods, dubbed C911 because bases 9 through 11 of the siRNA are replaced with their complement, was able to completely distinguish between the two groups. False positives due to off-target effects maintained most of their activity when the C911 mismatch control was tested, whereas true positives whose phenotype was due to on-target effects lost most or all of their activity when the C911 mismatch was tested. The ability of control siRNAs to distinguish between true and false positives, if widely adopted, could reduce the number of erroneous results reported in the literature and save research dollars spent on expensive follow-up experiments.
Citation: Buehler E, Chen Y-C, Martin S (2012) C911: A Bench-Level Control for Sequence Specific siRNA Off-Target Effects. PLoS ONE 7(12): e51942. https://doi.org/10.1371/journal.pone.0051942
Editor: Szabolcs Semsey, Niels Bohr Institute, Denmark
Received: September 28, 2012; Accepted: November 9, 2012; Published: December 14, 2012
This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
Funding: This research was supported by the Intramural Research Program of the National Institutes of Health, National Center for Advancing Translational Sciences. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Initially a bench-level technique for targeting single genes for down-regulation, siRNAs have grown into a major source of high-throughput data, with functional screens that attempt to assess the involvement of the entire transcriptome in a particular biological process using tens of thousands of siRNAs [1]. Low validation rates and the lack of overlap between genes identified in different screens targeting the same pathway [2] have led to an increased understanding of the prevalence and mechanisms of siRNA off-target effects [3]. Recent research has leveraged analysis of seed sequences in siRNA screens to identify likely false positives due to off-target effects [4] and to infer transcripts responsible for off-target phenotypes [5], [6], but these methods rely on the statistical analysis of large sets of data and are not applicable to smaller screens and bench-level experiments using a small number of siRNAs.
From the beginning of siRNA use as an experimental method, concern has existed about false positives due to lack of specificity [7], [8]. Although it has been previously noted that scrambled siRNAs are probably a sub-optimal control, a validated alternative has not been available. Standard non-silencing controls can be used to control for general effects common to transfection with any siRNA, but they cannot control for off-target effects specific to a given siRNA, which are determined by the seed sequence (bases 2–8 at the 5′ end of the siRNA strand loaded into RISC) [9] and will thus vary from siRNA to siRNA.
To find a suitable control for individual siRNAs, a modification is required that will eliminate on-target effects while retaining the same off-target effects. We propose that this can be accomplished by maintaining the guide and passenger strand seed sequences of the siRNA (bases 2–8 and bases 12–17 respectively) and the efficiency with which each strand is loaded into RISC, which is probably determined in part by the GC-asymmetry between the terminal bases on either end of the siRNA (bases 1–3 and 16–19) [10]. We test two mismatch designs that meet these requirements: C10, which is the same siRNA except that base 10 is the complement of the original siRNA, and C911, which is the same siRNA except that bases 9 through 11 are the complement of the original siRNA (Figure 1). Previous work has determined that mismatches at base 10 of an siRNA could effectively differentiate between mRNAs that differ by a single base [11].
Figure 1. An siRNA (A, left panel) consisting of two complementary 19-mers of RNA (with two-base overhangs) is divided here conceptually into the 5′ end of the anti-sense strand (teal), the middle of the siRNA (black), and the 3′ end of the anti-sense strand (red). siRNAs are designed to be the reverse complement of the mRNA sequence they are targeted to down-regulate (A, middle panel), but matches of the seed sequence of an siRNA to the 3′UTR of other mRNAs can result in their off-target down-regulation as well (A, right panel). A scrambled siRNA (B) eliminates the match to the target mRNA and thus will not down-regulate it, but also eliminates the off-target effects due to matches to the seed sequence (while, perhaps, creating new off-target effects against the new seed sequence). The C911 mismatch control (C) reduces or eliminates the down-regulation of the targeted mRNA by taking the complement of the middle three bases (green), but maintains the off-target effects of the original siRNA by keeping anti-sense and sense strand seed sequences intact. In this manner, comparison of effects elicited by the original siRNA and the C911 mismatch control should allow us to distinguish phenotypes that are due to down-regulation of the intended target rather than off-target effects.
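The two mismatch designs can be written down in a few lines of Python. This is a minimal sketch rather than a tool used in this study: it assumes the siRNA is supplied as a 19-nt RNA string (overhangs removed, read 5′ to 3′), the function names are our own, and the example sequence is invented. Note that the selected bases are complemented in place (not reverse-complemented), so every base outside the window, including the seed, is unchanged.

```python
# RNA base-pairing complements (assumes an RNA alphabet: A, C, G, U).
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complement_span(seq, start, end):
    """Return seq with 1-based positions start..end (inclusive) replaced
    by their complement bases; all other positions are left untouched."""
    seq = seq.upper()
    if len(seq) != 19:
        raise ValueError("expected a 19-nt siRNA core (overhangs removed)")
    if not set(seq) <= set(RNA_COMPLEMENT):
        raise ValueError("sequence must contain only A, C, G, U")
    middle = "".join(RNA_COMPLEMENT[base] for base in seq[start - 1:end])
    return seq[:start - 1] + middle + seq[end:]


def c911(seq):
    """C911 control: bases 9 through 11 complemented."""
    return complement_span(seq, 9, 11)


def c10(seq):
    """C10 control: base 10 complemented."""
    return complement_span(seq, 10, 10)


original = "GGAUCCGAUAGCUUAGCAA"  # invented example sequence
print(c911(original))  # GGAUCCGAAUCCUUAGCAA
print(c10(original))   # GGAUCCGAUUGCUUAGCAA
```

Because positions 9 through 11 are the exact center of a 19-mer, the same window is selected whether positions are counted from the 5′ end of one strand or of the other.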
It is worth noting that the idea of using a centrally mismatched siRNA as an experimental control is not a new one. Bryan Cullen proposed in 2006 [12] that “one could also test an siRNA or an shRNA mutant bearing a central ≥3-nt mismatch to the target mRNA that leaves the seed region unchanged”. However, our review of the literature does not reveal any systematic test of this idea that would allow researchers to employ mismatch siRNAs as controls with confidence that they will distinguish between true and false positives.
Materials and Methods
Testing the efficacy of the C10 and C911 mismatch control designs required us to identify a gold standard set of true and false positive siRNAs. To find siRNAs which have a significant inhibitory effect on a constitutively expressed reporter luciferase, a previously performed whole genome screen, briefly described below, was analyzed.
HEK293 cells harboring CMV-driven firefly luciferase were obtained from Promega and cultured in DMEM, 10% FBS. Cells were passaged every 3 days. Screening was conducted using the Ambion Silencer Select Human Genome siRNA Library Version 4. This collection targets ≈21,500 genes, with the vast majority of genes targeted by 3 independent, non-pooled siRNAs. For screening, siRNA reagent (0.8 pmol) was spotted into white solid bottom 384-well plates (Corning 3570) using a VPrep liquid handler (Velocity11, Agilent Technologies) integrated into a BioCel robotic platform (Agilent Technologies). All screening plates had a full column (16 wells) each of negative controls (Ambion SilencerSelect Negative Control #2) and positive controls (PLK1 Ambion SilencerSelect siRNA, cat# s448). The positive control served to assess transfection efficiency and assay performance, whereas the median value of each plate’s negative control column was used to normalize the corresponding sample wells. Lipofectamine RNAiMax (Invitrogen, 0.07 µL) was added to plate wells in 20 µL of serum-free media (DMEM) using a WellMate dispenser (Thermo Scientific). Transfection reagent and siRNA were complexed for 45 minutes at ambient temperature before adding cells (1,000 per well) in 20 µL of media containing 20% serum (WellMate, Thermo Scientific). This yielded final transfection mixtures comprising 20 nM siRNA in media containing 10% serum (standard for the growth of HEK293 cells). The cells were then cultured for 72 hours at 37°C in 5% CO2 prior to addition of OneGlo luciferase assay reagent (Promega). Luminescence was measured using an EnVision Multilabel Plate Reader (PerkinElmer).
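The final concentrations quoted above follow directly from the dispensed amounts, as the short Python check below illustrates. It is a sketch under one stated assumption: the well volume is taken to be the two 20 µL additions only, so the small volumes of spotted siRNA and transfection reagent are ignored. The function names are ours.

```python
def final_concentration_nM(sirna_pmol, total_volume_uL):
    """pmol per microliter equals micromolar; multiply by 1000 for nanomolar."""
    return sirna_pmol / total_volume_uL * 1000.0


def final_serum_percent(volumes_uL, serum_percents):
    """Volume-weighted average serum percentage of the combined additions."""
    total = sum(volumes_uL)
    return sum(v * s for v, s in zip(volumes_uL, serum_percents)) / total


# Assumption: well volume = 20 uL serum-free media + 20 uL cells in 20% serum.
total_volume = 20.0 + 20.0

print(final_concentration_nM(0.8, total_volume))          # approximately 20 nM
print(final_serum_percent([20.0, 20.0], [0.0, 20.0]))     # approximately 10 %
```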
It should be noted that the median assay response (81.8%) in the screen was well below the negative control (100%, by definition) used to normalize the screen results and subsequent assay results. This may be due to off-target effects of the negative control. Alternatively, the median assay response could be lower than the negative control because there are many more mRNAs that when down-regulated would interfere with transcription as compared to mRNAs that when down-regulated would promote or increase transcription of the reporter. To aid in interpretation of the results, the median assay response from the screen is indicated on the graphs of experimental results.
From the siRNAs in this screen that significantly down-regulated the luciferase reporter (more than 2-fold inhibition), a gold standard set of ten true positives and ten false positives was selected (Figure 2, Table 1, and Data S1). True positives were selected based on their having a reasonable biological connection to transcription, multiple siRNAs designed against the same mRNA having approximately the same phenotype, and little or no evidence that the phenotype was due to seed-based off-target effects (based on the Common Seed Analysis [4] plot). Conversely, false positive siRNAs were selected based on the targeted gene having no known function in transcription, little or no activity in the assay from other siRNAs designed against the same mRNA, and clear evidence of seed-based off-target effects. False positives were further restricted to have unique seed sequences to ensure diversity of the gold standard set.
Figure 2. Common seed analysis plots for two genes that had an siRNA showing significant knockdown (greater than 2-fold decrease in luciferase signal in comparison to non-silencing control). The dashed line represents the median response for the whole genome library. (A) POLR2A was selected as a true positive because multiple siRNAs generated the same phenotype, the role of RNA polymerase in transcription is already established, and siRNAs containing the same seed sequences (grey circles) did not show a trend towards inhibition of the reporter. (B) In contrast, PLXDC1 was selected as a false positive because it has no known role in transcription, other siRNAs against the same gene fail to elicit the same phenotype, and there is an obvious trend of down-regulating the reporter for all other siRNAs containing the same hexamer (GAGTAG) or heptamer seed sequence.
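The grouping that underlies a common seed comparison can be sketched as follows. This Python fragment is an illustration only and is not the published Common Seed Analysis implementation: the record layout, function names, gene labels, sequences, and activity values are all invented, and guide strands are assumed to be given 5′ to 3′. For a chosen siRNA it collects the activities of siRNAs against other genes that share its heptamer (bases 2–8) or hexamer (bases 2–7) seed.

```python
from statistics import median


def seed(guide, length=7):
    """Seed of a guide strand read 5'->3': bases 2..(length + 1).
    length=7 gives the heptamer (bases 2-8), length=6 the hexamer (bases 2-7)."""
    return guide[1:1 + length]


def seed_mates(screen, query_gene, query_guide, length=7):
    """Activities of siRNAs that target a different gene but share the seed."""
    target_seed = seed(query_guide, length)
    return [activity for gene, guide, activity in screen
            if gene != query_gene and seed(guide, length) == target_seed]


# Invented toy records: (gene, guide strand, % luciferase activity vs. control).
screen = [
    ("GENE_A", "UACGGAUCCAUUGCAAGCU", 35.0),
    ("GENE_A", "GCAUUCGGAUAACGUCAUU", 92.0),
    ("GENE_B", "AACGGAUCGGUUACGAUCA", 40.0),
    ("GENE_C", "GACGGAUCUUAGCCAUGCA", 38.0),
    ("GENE_D", "CUUAGGCAAUCGGAUACGU", 95.0),
]

mates = seed_mates(screen, "GENE_A", "UACGGAUCCAUUGCAAGCU")
print(seed("UACGGAUCCAUUGCAAGCU"), mates, median(mates))
```

In this toy data the siRNAs sharing the seed with the first GENE_A siRNA are about as active as it is, while the second GENE_A siRNA is inactive, which is the pattern described above for a seed-driven false positive.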
To test the ability of different control siRNAs to distinguish between the selected true positives and false positives, four siRNAs were synthesized for each of the original 20 siRNAs: an unmodified siRNA with the same sequence as the originally identified duplex and with no chemical modifications, a scrambled siRNA with the same base frequency as the original siRNA in a different order (generated by an online tool for scrambled siRNAs, http://www.sirnawizard.com/scrambled.php), the siRNA with base 10 complemented (C10), and the siRNA with bases 9 through 11 complemented (C911). Each of these eighty siRNAs was then tested in the assay for luciferase activity used in the original genome-wide screen, with three replicates per siRNA. Results were normalized to percent of activity observed with a commercially available non-targeting control (Ambion Silencer Select Negative Control 2, Life Technologies™).
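The normalization step can be illustrated with a short Python sketch. It assumes, as in the original screen, that the reference value is the median of the negative control wells on the same plate; the raw luminescence values below are invented, and the 50% cut-off simply restates the "more than 2-fold inhibition" criterion.

```python
from statistics import mean, median


def percent_of_control(sample_wells, negative_control_wells):
    """Express raw luminescence as a percentage of the median negative control."""
    reference = median(negative_control_wells)
    return [100.0 * well / reference for well in sample_wells]


# Invented raw luminescence readings.
negative_controls = [1000, 1100, 900, 1000]
replicates = [400, 500, 450]

normalized = percent_of_control(replicates, negative_controls)
print(normalized)                 # [40.0, 50.0, 45.0]
print(mean(normalized) < 50.0)    # True: more than 2-fold inhibition on average
```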
The statistical significance of the difference between each of the three controls and the original siRNA was calculated using Student’s t-test, and p-values less than or equal to 0.05 were designated as statistically significant. All statistical analysis was performed in R [13], and figures were generated using ggplot2 [14] and Inkscape (http://inkscape.org/).
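For readers who want to see the comparison spelled out, the following Python sketch reproduces the arithmetic of a two-sample t-test. It is not the R code used for the analysis. It assumes a two-sided, pooled-variance (classic Student) test, obtains the p-value by numerically integrating the t density instead of calling a statistics library, and uses invented replicate values.

```python
import math
from statistics import mean, variance


def t_density(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on the density from 0 to |t|."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_area = 2 * total * h / 3  # probability mass within [-|t|, |t|]
    return max(0.0, 1.0 - central_area)


def student_t_test(a, b):
    """Pooled-variance two-sample t-test; returns (t statistic, two-sided p-value)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, two_sided_p(t, df)


# Invented triplicate values (% luciferase activity vs. negative control).
original = [28.0, 31.0, 34.0]
c911_control = [88.0, 92.0, 96.0]

t_stat, p_value = student_t_test(original, c911_control)
print(f"t = {t_stat:.2f}, p = {p_value:.4g}, significant = {p_value <= 0.05}")
```

With three replicates per group the test has only four degrees of freedom, so the replicate variance has a large influence on whether a given difference reaches the 0.05 threshold.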
As expected, scrambling the siRNA sequences (Figure 3A) eliminated most or all of their activity in the assay, regardless of whether they were true or false positives. In contrast, the C911 mismatch siRNAs (Figure 3B) showed a large reduction in activity for all of the true positive siRNAs and for none of the false positive siRNAs, demonstrating that the C911 mismatch control effectively reduces on-target effects while maintaining off-target effects, allowing experimental discrimination between true and false positives. For two of the C911 false positives, there was a small but statistically significant difference between the mismatch control and the original siRNA, indicating that the maintenance of the off-target effect by the C911 mismatch may not always be perfect. The C10 mismatch design (Figure 3C) also showed ability to distinguish between true and false positives, although the amount of on-target activity reduction was significantly less than in C911 for several siRNAs (SON, POLR2A, PCF11, POLR2I). As with C911, two of the C10 mismatch siRNAs against putative false positives showed small but statistically significant differences in activity from the original siRNAs.
Figure 3. siRNAs that down-regulate a luciferase reporter more than two-fold (teal, panels A, B, and C) were selected and categorized as either true or false positives based on biological annotation, the activity of other siRNAs against the same gene, and observed activity of other siRNAs with the same seed sequence. Three types of negative control were then tested for their ability to distinguish true positives from false positives. As expected, scrambled versions of the siRNAs (red, panel A) reduced or eliminated their activity, regardless of whether it was on-target (true positive) or off-target (false positive). In contrast, the C911 mismatch design (green, panel B) showed a large reduction in activity for only true positive siRNAs, indicating that changing bases 9–11 of the siRNAs to their complement successfully disrupted on-target activity while maintaining off-target activity. The C10 mismatch design (yellow, panel C) also maintained off-target effects for false positives, but in some cases failed to reduce the on-target phenotype observed in the true positive group as drastically as the C911 mismatch. Statistically significant p-values for the t-test are marked by *** (p-value ≤ 0.001), ** (p-value ≤ 0.01), and * (p-value ≤ 0.05). The dashed line represents the median response for the whole genome library. Error bars represent the calculated standard error (standard deviation divided by the square root of the number of observations).
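The two annotations used in this figure, the standard error and the significance markers, amount to the small Python helpers below. This is a sketch with function names of our own choosing; it assumes the sample standard deviation (n − 1 denominator), and the replicate values are invented.

```python
from math import sqrt
from statistics import stdev


def standard_error(values):
    """Sample standard deviation divided by the square root of the number of observations."""
    return stdev(values) / sqrt(len(values))


def significance_stars(p_value):
    """Map a p-value to the marker scheme used in the figure."""
    if p_value <= 0.001:
        return "***"
    if p_value <= 0.01:
        return "**"
    if p_value <= 0.05:
        return "*"
    return ""


replicates = [40.0, 50.0, 45.0]  # invented values
print(round(standard_error(replicates), 3))  # 2.887
print(significance_stars(0.02))              # *
```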
It is possible that there are sequence-specific off-target effects that are mediated in part by bases 9 through 11 of the siRNA, although we do not see evidence for that in these experiments. If this were the case, the C911 modification might significantly diminish off-target effects for an siRNA and lead to a false positive. Likewise, it is possible that there are siRNAs for which the on-target effect will not be significantly diminished by the C911 modification, which could lead to a false negative. Since the C911 modification worked for 20 of 20 siRNAs in these experiments, we can infer that the failure rate will be small, but we cannot guarantee that it will be zero.
Although the siRNAs for this study were chosen to fall into one of two categories (phenotype due to on-target effect or phenotype due to off-target effect), it is possible, even likely, that an siRNA’s observed activity could be a combination of on-target and off-target effects. In this case, we would expect that the C911 control would continue to show activity in the assay, but perhaps less activity than the original siRNA which also had on-target effects. For this reason the correct comparison is to ask if there is a reproducible, statistically significant difference between the C911 control and the unmodified siRNA, not if the C911 control has or does not have activity in the assay.
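The comparison recommended here can be expressed as a small Python helper. This is a hypothetical sketch: the function name, the "retained fraction" summary, and the example numbers are our own illustrative choices and were not part of the analysis reported in this study. It takes the mean reporter signal for the unmodified siRNA and for its C911 control (both as percent of negative control) together with a p-value from a t-test between them.

```python
def compare_to_c911(original_pct, c911_pct, p_value, alpha=0.05):
    """Summarize an siRNA versus its C911 control.

    original_pct, c911_pct: mean reporter signal as % of negative control.
    p_value: result of a t-test between the two sets of replicates.
    """
    inhibition_original = 100.0 - original_pct
    inhibition_c911 = 100.0 - c911_pct
    # Hypothetical summary: share of the original inhibition kept by the control.
    if inhibition_original > 0:
        retained = inhibition_c911 / inhibition_original
    else:
        retained = float("nan")
    # Evidence for an on-target component: the control is significantly less active.
    on_target_supported = p_value <= alpha and c911_pct > original_pct
    return {
        "retained_fraction": retained,
        "on_target_component_supported": on_target_supported,
    }


# Invented examples: a likely on-target hit and a likely off-target hit.
print(compare_to_c911(30.0, 90.0, 0.002))
print(compare_to_c911(35.0, 40.0, 0.40))
```

Reporting the retained fraction alongside the significance call is one way to flag mixed cases, in which the control remains active but is reproducibly weaker than the original siRNA.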
Finally, it is important to note that failing to disprove the null hypothesis that a gene is not involved in the pathway of interest is not the same as proving the null hypothesis. As has been the case previously, an siRNA against a biologically relevant mRNA may fail to elicit a phenotype for a number of reasons, including knockdown that is insufficient to elicit a phenotype distinguishable from noise in the assay. These caveats will continue to apply when using the C911 control.
In light of the promising initial results observed for our test set, we believe researchers should move away from using scrambled/non-targeting controls for individual siRNA experiments. We have shown experimentally that these controls have no utility in distinguishing between true and false positives, and they will only lend undeserved confidence to results that may fail to confirm in more time-consuming rescue experiments. Instead, the C911 mismatch control has shown excellent ability to detect false positives and can easily be incorporated into experimental designs. Eventually, siRNA vendors could offer siRNAs and their C911 mismatch controls for sale together, saving investigators the time and expense of ordering these controls separately. It would also be possible to synthesize entire libraries with their mismatch controls included, which could allow for more accurate siRNA screening, especially for small libraries that are not amenable to larger scale statistical analyses.
Experimental Data. Excel spreadsheet containing sheets for the selected siRNAs, the synthesized sequences, and the experimental results plotted in Figure 3.
Common Seed Analysis (CSA) Plots for Gold Standard Selections. This supplemental file contains twenty Common Seed Analysis (CSA) plots, one per page. Each CSA plot is for one of 20 siRNAs chosen as a true or false positive, along with the other siRNAs intended to target the same gene. The dashed line represents the median response for the whole genome library. The y-axis is percent luciferase activity compared to negative control. Each siRNA tested against the gene of interest is plotted in its own column as a red triangle. In the same column, siRNAs tested against different genes/mRNAs that had the same heptamer seed sequence (bases 2–8, large grey circles) or hexamer seed sequence (bases 2–7, small grey circles) are plotted. When all siRNAs with the same seed sequence have roughly the same phenotypic effect as the siRNA of interest, we can conclude that the phenotype is likely due to seed-based off-targeting and is not specific to the intended target.
The authors wish to thank Chris Austin and Marc Ferrer for their support and feedback.
Conceived and designed the experiments: EB. Performed the experiments: YC. Analyzed the data: EB. Contributed reagents/materials/analysis tools: SM. Wrote the paper: EB SM.
- 1. Mohr S, Bakal C, Perrimon N (2010) Genomic Screening with RNAi: Results and Challenges. Annual Review of Biochemistry 79: 37–64.
- 2. Bushman FD, Malani N, Fernandes J, D'Orso I, Cagney G, et al. (2009) Host cell factors in HIV replication: meta-analysis of genome-wide studies. PLoS Pathog 5.
- 3. Sigoillot FD, King RW (2011) Vigilance and validation: Keys to success in RNAi screening. ACS chemical biology 6: 47–60.
- 4. Marine S, Bahl A, Ferrer M, Buehler E (2012) Common seed analysis to identify off-target effects in siRNA screens. Journal of biomolecular screening 17: 370–378.
- 5. Sigoillot FD, Lyman S, Huckins JF, Adamson B, Chung E, et al. (2012) A bioinformatics method identifies prominent off-targeted transcripts in RNAi screens. Nature methods 9: 363–366.
- 6. Buehler E, Khan AA, Marine S, Rajaram M, Bahl A, et al. (2012) siRNA off-target effects in genome-wide screens identify signaling pathway members. Scientific reports 2: 428.
- 7. Editorial (2003) Whither RNAi? Nature cell biology 5: 489–490.
- 8. Echeverri CJ, Beachy PA, Baum B, Boutros M, Buchholz F, et al. (2006) Minimizing the risk of reporting false positives in large-scale RNAi screens. Nature methods 3: 777–779.
- 9. Birmingham A, Anderson EM, Reynolds A, Ilsley-Tyree D, Leake D, et al. (2006) 3′ UTR seed matches, but not overall identity, are associated with RNAi off-targets. Nat Methods 3: 199–204.
- 10. Amarzguioui M, Prydz H (2004) An algorithm for selection of functional siRNA sequences. Biochemical and biophysical research communications 316: 1050–1058.
- 11. Schwarz DS, Ding H, Kennington L, Moore JT, Schelter J, et al. (2006) Designing siRNA that distinguish between genes that differ by a single nucleotide. PLoS genetics 2: e140.
- 12. Cullen BR (2006) Enhancing and confirming the specificity of RNAi experiments. Nature methods 3: 677–681.
- 13. R Development Core Team (2008) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing.
- 14. Wickham H (2009) ggplot2: elegant graphics for data analysis: Springer New York.