PFRED: A computational platform for siRNA and antisense oligonucleotides design

PFRED, a software application for the design, analysis, and visualization of antisense oligonucleotides and siRNA, is described. The software provides an intuitive user interface for scientists to design a library of siRNA or antisense oligonucleotides that target a specific gene of interest. Moreover, the tool facilitates the incorporation of various design criteria that have been shown to be important for stability and potency. PFRED has been made available as an open-source project so the code can be easily modified to address the future needs of the oligonucleotide research community. A compiled version is available for download at https://github.com/pfred/pfred-gui/releases/tag/v1.0 as a Java JAR file. The source code and the links for downloading the precompiled version can be found at https://github.com/pfred.


Introduction
Over the last three decades small interfering ribonucleic acid (siRNA) and antisense oligonucleotides (ASO) have emerged as powerful tools to modulate the expression of genes in vivo and in vitro. Potential oligonucleotide therapeutics have been discovered and developed with mixed success [1,2]. For example, Mipomersen, Inotersen, Patisiran and Givlaari have all been approved for use in recent years. Mipomersen, a lipid-lowering agent that targets the ApoB gene [3,4], was eventually approved by the United States Food and Drug Administration (FDA) in 2013 and later discontinued due to adverse effects [5]. Inotersen (antisense) and Patisiran (the first siRNA ever approved by the FDA) both target transthyretin (TTR) for the treatment of polyneuropathy caused by hereditary transthyretin-mediated amyloidosis [6,7], while Givlaari (the world's first approved GalNAc-conjugate RNAi therapeutic) targets aminolevulinic acid synthase 1 (ALAS1) for the treatment of acute hepatic porphyria [8]. Furthermore, the approval of Nusinersen for the treatment of spinal muscular atrophy (SMA) in pediatric and adult patients represents a historical achievement for the SMA community and a success story for the field of nucleic acid-based therapeutics [9].
However, the development of oligonucleotide therapeutics remains challenging as they often have unintended off-target effects and cause non-specific hepatic and renal toxicity [10][11][12]. Additionally, it is exceptionally difficult to deliver oligonucleotides to most tissue types and organs. This can result in the need for tissue-specific delivery systems, and several siRNA compounds have advanced into development using this paradigm; these include liposomally liver targeted ALN-RSV01, which targets RSV through the nucleocapsid "N" gene [13], PF-4523655 for age-related macular degeneration, targeting the eye through direct injection [14], and GalNAc conjugates [15] such as the late-stage ALN-AT3 [16], targeting antithrombin for the treatment of hemophilia. The recent FDA approval of Patisiran [17] represents a first-of-its-kind RNA interference (RNAi) therapeutic for the treatment of the polyneuropathy of hereditary transthyretin-mediated (hATTR) amyloidosis in adults and the first ever FDA approval of an siRNA treatment [7].
Designing antisense oligonucleotides (ASOs) and siRNA can be logistically challenging given the many competing design criteria that can be incorporated into the selection of tool or therapeutic sequences. Design considerations may include splice variants, cross-species targeting for validation, single nucleotide polymorphisms (SNPs), secondary structure, undesirable motifs (e.g. toxic, poly-A, or poly-G repeats), complementarity with off-target sequences, intron/exon boundaries, chemical modification pattern of nucleotides, and predicted activity. These factors need to be weighed according to the intended use of the oligonucleotide. For example, compounds designed as in vitro tools only need to be active in a single species, whereas it is advantageous for therapeutic oligonucleotides to be active in model organisms as well as patients. A number of computational tools have been developed to address different aspects of the design process, including OptiRNAi [18], siExplorer [19], DSIR [20], i-Score [20,21], MysiRNA-Designer [22], siDirect [23,24], siRNA Selection Server [25], siRNArules [26] and siRNA-Finder [27]. Additionally, major vendors of siRNA reagents such as Dharmacon (siDESIGN Center) [28], GenScript (Target Finder) [29] and Thermo Fisher (BLOCK-iT) [30] have web-based design tools available.
In the present paper we describe PFRED (PFizer RNAi Enumeration and Design tool), a client-server software system designed to assist with the entire oligonucleotide design process, starting with the specification of a target gene (Ensembl ID) and culminating in the design of siRNAs or RNase H-dependent antisense oligonucleotides. Sequences are chosen using bioinformatics algorithms built upon careful mining of the sequence-activity relationships found in public datasets as well as internal collections. The tool provides researchers with a user-friendly interface where the only required input is an accession number for the target gene and it returns a list of properties that are believed to contribute to the efficacy of an siRNA or ASO. These properties include human transcripts and cross-species homology, GC content, SNPs, intron-exon boundary, duplex thermodynamics, efficacy prediction score and off-target matches. An automated oligonucleotide selection procedure is available to quickly select one potential set of sequences with an appropriate property profile. The selection protocol can be customized by the user through changes of the selection cutoffs or the addition of alternate design parameters and algorithms.

Materials and methods
PFRED is implemented using a client-server software architecture. On the client side, the graphical user interface (GUI) was developed in the Java programming language and is deployed to a user's desktop through Java Web Start; the user needs a recent version of the Java Development Kit set up on their system. Because of the cross-platform support of Java, PFRED can be run from multiple desktop environments such as Windows, Mac and Linux. The most CPU-intensive algorithms used in PFRED are implemented as Python and Perl scripts, which are wrapped within a Docker container that works as a web service hosted in an Amazon Web Services (AWS) instance and invoked remotely from the client through a RESTful API, also developed in Java. Beyond the performance improvements, the client-server architecture also greatly simplifies the integration of disparate design algorithms that often rely on heterogeneous third-party libraries or reference datasets. Calls to the server-side algorithms are made through the PFRED Java web service, which uses the REST API that was developed using the GlassFish Jersey Java modules. New algorithms can be added to PFRED following the conventions established for the existing tool.
To facilitate the use of PFRED, users only need to download the latest GUI release and run it on their own computer. It is also possible for users to build their own private PFRED service by downloading the PFRED Docker repository and then creating and running the PFRED Service container, following the documentation given at https://github.com/pfred/pfred-docker. This technique, named containerization, facilitates cloud and server agnosticism because the container automatically builds the environment needed for the PFRED back-end to work. Users can also use this local service to test their own alternative algorithms or models, and deploy the PFRED service in a cloud environment such as AWS, Azure or Google Cloud, or on a local server. The only requirement is that Docker and Docker Compose are set up on the system (see https://docs.docker.com/compose/). The current client-server architecture consists of a pre-built Docker service within a publicly queryable AWS instance managed by Biogen, which offers a REST API for easy access to all the back-end services through the PFRED end-points (see https://github.com/pfred/pfred-rest-service for documentation on how to compile the REST source code and access the PFRED end-points using a browser). The PFRED GUI uses the RESTful API to access all the end-points needed for ASO and siRNA design (Fig 1 illustrates the PFRED high-level architecture). The ENSEMBL database is used to query all genomic information requested by the back-end modules. In contrast, the Docker volume within AWS is used as both data warehouse and local storage for the back-end modules. Therefore, the data transferred to the Docker volume varies with the outputs given by the modules, whereas the ENSEMBL database serves as an external static warehouse (PFRED uses the latest version of ENSEMBL). The PFRED GUI serves as the front end that manages the calls between modules and the type of data requested by the user for design (Fig 2).
To help users sift through large numbers of sequences, the main table supports filtering by either sequence patterns or property values. For example, a user can search for oligonucleotides that are conserved between human and mouse and have high predicted siRNA efficacy. Oligonucleotides can also be organized into different groups and then reviewed separately. Additional functions are also included to help users analyze their data. For example, the mathematical formula feature can be used to apply weights to multiple properties and summarize them into a single score for ranking and selection. The data can be exported as a comma-separated values (CSV) file and then imported into other applications for further analysis.
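The filtering, weighted-formula and export steps described above can be illustrated with a minimal Python sketch. It is not PFRED code: the property names (`efficacy`, `mouse_mismatches`, `off_targets_0mm`), the cutoff and the weights are hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical rows of the main table: one dict of properties per oligonucleotide.
oligos = [
    {"name": "oligo_1", "efficacy": 0.82, "mouse_mismatches": 0, "off_targets_0mm": 1},
    {"name": "oligo_2", "efficacy": 0.91, "mouse_mismatches": 2, "off_targets_0mm": 0},
    {"name": "oligo_3", "efficacy": 0.55, "mouse_mismatches": 0, "off_targets_0mm": 0},
]


def filter_conserved_and_potent(rows, min_efficacy=0.7):
    """Keep oligos that match the mouse transcript perfectly and meet the cutoff."""
    return [r for r in rows
            if r["mouse_mismatches"] == 0 and r["efficacy"] >= min_efficacy]


def weighted_score(row, weights):
    """Combine several properties into one score: sum of weight * value."""
    return sum(weight * row[prop] for prop, weight in weights.items())


def rank(rows, weights):
    """Sort rows from highest to lowest weighted score."""
    return sorted(rows, key=lambda r: weighted_score(r, weights), reverse=True)


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Assumed weights: reward predicted efficacy, penalize perfect off-target matches.
weights = {"efficacy": 1.0, "off_targets_0mm": -0.5}
ranked = rank(oligos, weights)
print([r["name"] for r in ranked])
print(to_csv(filter_conserved_and_potent(oligos)))
```

A linear combination is only one possible formula; any expression over the table columns could be substituted for `weighted_score`.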
A key element of PFRED's functionality is its canonical representation of oligonucleotide structures. In a separate paper [31] we presented a monomer-based notation language to represent complex biomolecule structures (the Hierarchical Editing Language for Macromolecules, or HELM), which has become an industry standard for biopolymer notations [32], together with a desktop application named PME (Pfizer Macromolecule Editor) for their drawing and visualization. Fig 4A shows an example of HELM notation for a short antisense gapmer. As polymeric biomolecules, antisense oligonucleotides and siRNAs can be represented with HELM notation and different sequence visualization options can be created for specific purposes. Included within PFRED are three oligonucleotide representations, shown in Fig 4B-4D. Fig 4B shows an explicit component or monomer view of the oligonucleotide, breaking it down into sugars, bases and linkers. A sequence view is shown in Fig 4C, using a standard notation with RNA (capital) and DNA (lowercase) but emphasizing chemical modifications through the use of pink dots. For example, the sequence shown in Fig 4C has a fully phosphorothioated backbone, which is denoted by the pink dots between the base letters. It also has three locked nucleic acids (LNAs) at the 3' and 5' ends of the compound that are distinguished by the pink dots above the letters denoting these nucleosides. A final block representation is shown in Fig 4D. The block representation was created to emphasize patterns of chemical modifications to a structure. Base modifications are shown as pink dots below the blocks, sugars are colored depending on their structure (e.g. LNA-yellow, 2'OMe-purple, etc.) and backbones other than phosphodiesters are represented by pink connectors above the blocks.
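As a rough illustration of the monomer view, the following Python sketch assembles a HELM-style string from (sugar, base, linker) triples. It follows the general HELM pattern of `sugar(base)linker` units joined by dots, with multi-character monomer symbols in square brackets. The modified-monomer symbols used in the second example (`LR`, `dR`, `sP`) are assumptions about the monomer library and are not taken from this paper; the output has not been validated against the HELM toolkit.

```python
def monomer(symbol):
    """HELM writes single-character monomer IDs bare and longer IDs in brackets."""
    return symbol if len(symbol) == 1 else f"[{symbol}]"


def from_sequence(bases, sugar="R", linker="P"):
    """Expand a plain base string into (sugar, base, linker) triples.

    The last nucleotide gets no linker, i.e. no trailing phosphate.
    """
    triples = [(sugar, base, linker) for base in bases]
    if triples:
        last_sugar, last_base, _ = triples[-1]
        triples[-1] = (last_sugar, last_base, None)
    return triples


def to_helm(triples, polymer_id="RNA1"):
    """Join the monomer triples into a single-polymer HELM-style string."""
    units = []
    for sugar, base, linker in triples:
        unit = f"{monomer(sugar)}({monomer(base)})"
        if linker:
            unit += monomer(linker)
        units.append(unit)
    return f"{polymer_id}{{{'.'.join(units)}}}$$$$"


# Unmodified RNA trinucleotide.
print(to_helm(from_sequence("ACG")))

# Assumed symbols: LR = LNA sugar, dR = deoxyribose, sP = phosphorothioate linker.
modified = [("LR", "A", "sP"), ("dR", "C", None)]
print(to_helm(modified))
```

Keeping sugar, base and linker as separate fields mirrors the component view of Fig 4B and makes it straightforward to derive the sequence and block views from the same data.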
The PFRED enumeration workflow is shown in Fig 5. Transcript IDs and sequences of a target for different ortholog species (human, rat, mouse, dog, chimpanzee and macaque) are retrieved from the public ENSEMBL database through the ENSEMBL REST API (please refer to https://rest.ensembl.org/) given a single transcript or gene ID. Users define a primary transcript sequence and their length of choice (k) for all the possible ASO and siRNA k-mers to be enumerated (Fig 5, steps 1-3). Each enumerated oligonucleotide is then aligned to the other selected transcripts, including splice variants and ortholog transcripts across different species, using the fast, memory-efficient short read aligner Bowtie against the pre-calculated Bowtie index covering all transcript sequences [33]. Alignment results are processed by a Python interface and the number of mismatches to each transcript is reported back to PFRED as properties of the oligonucleotides. Exon location and SNP information are also retrieved through the ENSEMBL APIs for each base position and mapped to each oligonucleotide as properties.
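The enumeration step can be sketched in a few lines of Python. This is an illustrative sketch rather than the PFRED implementation: every window of length k on the primary transcript is listed with its antisense sequence and GC content, and a brute-force mismatch scan stands in for the Bowtie alignment, which a real pipeline would use for speed. The function names and the toy sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence."""
    return rna.translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def enumerate_oligos(transcript, k):
    """Yield (1-based start, target site, antisense oligo) for every k-mer window."""
    rna = transcript.upper().replace("T", "U")
    for start in range(len(rna) - k + 1):
        site = rna[start:start + k]
        yield start + 1, site, reverse_complement(site)


def min_mismatches(site, transcript):
    """Fewest mismatches of `site` against any ungapped window of `transcript`.

    Naive stand-in for an aligner; returns None if the transcript is too short.
    """
    k = len(site)
    if len(transcript) < k:
        return None
    return min(
        sum(a != b for a, b in zip(site, transcript[i:i + k]))
        for i in range(len(transcript) - k + 1)
    )


primary = "AUGGCUACG"   # toy primary transcript
ortholog = "CCAUGCCC"   # toy ortholog transcript
for start, site, antisense in enumerate_oligos(primary, k=4):
    print(start, site, antisense, gc_content(site), min_mismatches(site, ortholog))
```

A transcript of length n yields n - k + 1 candidate sites, so the per-oligonucleotide properties grow linearly with transcript length and with the number of transcripts aligned against.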
Pre-built Bowtie indexes for all species are required for off-target searches. These were built by creating a Python interface that automatically downloads all transcripts (cDNA) and unspliced gene sequences for any species in FASTA format using the ENSEMBL REST API, which provides the advantage of always relying on the latest version of the ENSEMBL database. The FASTA files are written with transcript ID and gene ID included in the sequence names. These FASTA files are used to build indexes for searching cDNA and gene off-target matches. The ENSEMBL IDs are used for post-processing the Bowtie alignment results to calculate the number of genes that an oligonucleotide aligns to with 0, 1 and 2 mismatches, and to exclude the intended target from the off-target count. For siRNA, only the cDNA database is searched, for both the guide and passenger strands. In contrast, for antisense design, only the sense sequence (or the antisense sequence in the reverse complementary direction) is searched against both the cDNA and unspliced gene sequence indexes for off-target hits. The number of cDNA off-target hits is then reported as an oligonucleotide property for 0, 1 and 2 mismatches. While the off-target search module is not based on any predictive model or experimental data, it can be used effectively as an annotation tool to help the user prioritize and filter sequences through the PFRED screening funnel.
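The post-processing of alignment hits into per-gene off-target counts might look like the following Python sketch. Several details are assumptions made for illustration: hits are given as (oligo ID, reference name, mismatches) tuples rather than raw aligner output, reference names use a hypothetical `TRANSCRIPT|GENE` layout, and each gene is counted once at its best (fewest-mismatch) alignment so that multiple transcripts of one gene do not inflate the count.

```python
from collections import defaultdict


def gene_of(reference_name):
    """Extract the gene ID from an assumed 'TRANSCRIPT|GENE' sequence name."""
    return reference_name.split("|")[1]


def count_off_target_genes(hits, target_gene, max_mismatches=2):
    """Return {oligo_id: {0: n0, 1: n1, 2: n2}} counts of off-target genes.

    The intended target gene is excluded, and every other gene is counted
    once under the smallest mismatch number seen for that oligo.
    """
    best = defaultdict(dict)  # oligo_id -> {gene_id: fewest mismatches}
    for oligo_id, reference_name, mismatches in hits:
        genes = best[oligo_id]  # registers the oligo even if it only hits the target
        gene_id = gene_of(reference_name)
        if gene_id == target_gene or mismatches > max_mismatches:
            continue
        if gene_id not in genes or mismatches < genes[gene_id]:
            genes[gene_id] = mismatches

    counts = {}
    for oligo_id, genes in best.items():
        per_level = {m: 0 for m in range(max_mismatches + 1)}
        for mismatches in genes.values():
            per_level[mismatches] += 1
        counts[oligo_id] = per_level
    return counts


# Toy hits with placeholder IDs; GENE_A is the intended target.
hits = [
    ("oligo_1", "TX_A1|GENE_A", 0),
    ("oligo_1", "TX_B1|GENE_B", 1),
    ("oligo_1", "TX_B2|GENE_B", 2),  # same gene, worse alignment: ignored
    ("oligo_1", "TX_C1|GENE_C", 2),
    ("oligo_2", "TX_A1|GENE_A", 0),
    ("oligo_2", "TX_D1|GENE_D", 0),
]
print(count_off_target_genes(hits, target_gene="GENE_A"))
```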

PLOS ONE
Fig 3. [...] Table" stores all the data generated through the oligo enumeration workflow (oligo sequences with their properties) and provides access to common spreadsheet functions. It is possible to assign selected oligo sequences to specific groups which can be shown and accessed under "Groups" on the right-side panel. More detailed information about the oligo sequence notation and the primary mRNA transcript can be visualized at the bottom of the user interface by enabling the "Oligo Detail" and "Target Transcript" panels. https://doi.org/10.1371/journal.pone.0238753.g003
Due to the cost of synthesizing and testing siRNAs and ASOs, rational design represents a critical step in any oligonucleotide experiment. A priori prediction of a compound's function is a key component of RNAi and ASO applications to obtain effective silencing of the desired gene. Early models were based on empirical rules for designing functional siRNA, obtained from the experimental results of ASO and RNAi screening campaigns. For RNAi sequence design, first Tuschl [34,35] and later Reynolds [36] reported a series of guidelines for the implementation of a "point-based" scoring scheme where the presence or absence of key features in an siRNA duplex leads to an increase or decrease of the final functional score. Several other algorithms have emerged since then [19,20,37-42] based on more sophisticated scoring schemes which use sequence descriptors and supervised machine learning techniques to select siRNAs capable of inducing effective gene silencing in cell lines. In PFRED, an algorithm for predicting siRNA functionality has been implemented by using diverse sets of oligonucleotide descriptors combined with a support vector machine (SVM) algorithm. Details of the sequence descriptors and algorithm implementation have been described previously [43]. Models of ASO activity have also improved but can be highly dependent on the length of the ASO as well as its chemical modification. Most published models use either sequence motif-based filters [44] or computational approaches including secondary structure prediction [45][46][47], thermodynamics [48,49], machine learning [50][51][52] and molecular dynamics simulation [53], but all have reported only limited success. Here we integrated both the thermodynamics parameters calculated using Oligowalk [54] and the sequence motifs reported in Matveeva [44]. Using these parameters, a scoring scheme was derived that penalized motifs correlated with toxicity or promiscuity, strong inter- and intramolecular binding energy, as well as weak binders to the RNA target [10,43]. The model was derived and tested using data from AOBase [55] including more than 500 ASOs against 46 targets. The thermodynamic calculations were based on unmodified DNA antisense oligonucleotides and are therefore of limited relevance to modifications that significantly increase the melting temperature (Tm) of the DNA/RNA duplex, such as LNA.
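To make the idea of a "point-based" scheme concrete, the Python sketch below adds or subtracts points depending on whether a feature is present in the sense strand. The four rules and their point values are placeholders written for this illustration; they are not the published criteria of [34-36], and they are unrelated to the SVM model and the ASO scoring scheme used in PFRED.

```python
import re


def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


# (description, predicate on the sense strand, points added when it applies).
# Illustrative rules only; thresholds and point values are assumptions.
RULES = [
    ("GC content between 30% and 52%",
     lambda s: 0.30 <= gc_fraction(s) <= 0.52, 1),
    ("run of four or more identical bases",
     lambda s: re.search(r"(.)\1{3}", s) is not None, -1),
    ("A or U at the 3' end of the sense strand",
     lambda s: s[-1] in "AU", 1),
    ("G or C at the 3' end of the sense strand",
     lambda s: s[-1] in "GC", -1),
]


def point_score(sense_strand):
    """Sum the points of every rule whose feature is present."""
    s = sense_strand.upper()
    return sum(points for _, applies, points in RULES if applies(s))


def explain(sense_strand):
    """List the descriptions of the rules that fired, for inspection."""
    s = sense_strand.upper()
    return [description for description, applies, _ in RULES if applies(s)]


for candidate in ("GAUCAUGCAUUGACUAGUA", "GGGGCCGCGCCGGCGCGCG"):
    print(candidate, point_score(candidate), explain(candidate))
```

Machine learning models such as an SVM replace the hand-assigned points with weights learned from sequence-activity data, but consume the same kind of per-sequence descriptors.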
Once all the descriptors have been calculated for each potential ASO or siRNA the design workflow becomes a process of selection. An example of the PFRED selection interface for siRNA design is shown in Fig 5, step 4. Oligonucleotide selections are made by first filtering out those compounds which align with: 1) known human SNPs, 2) sequence motifs associated with non-specific binding (such as polyG, polyT) and 3) low complexity sequences that have a high number of perfect off-target matches. An additional filter for ortholog transcript matches can be added depending on the experimental design requirement. Also, if predicted activities are available, they may be included as a selection criterion (models may depend on a specific oligonucleotide length or modification type). Other oligonucleotide properties such as number of off-target matches and diversity of location of each oligonucleotide along the gene sequence can be used to finalize the selection of desired compounds for a given experiment. The final selected compounds can then be exported in text format in preparation for synthesis or further analysis outside of the tool.
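The selection funnel described above can be sketched as a sequence of filters followed by a spacing-aware pick. This Python sketch is an assumption-laden illustration rather than the PFRED selection protocol: the dictionary keys, the motif run length of four, the score field and the spacing rule are all hypothetical.

```python
import re

# Assumed definition of polyG / polyT motifs: runs of four or more (U covers RNA).
UNDESIRABLE_MOTIF = re.compile(r"G{4,}|[TU]{4,}")


def passes_filters(oligo, ortholog=None, max_perfect_off_targets=0):
    """Apply the SNP, motif, off-target and optional ortholog filters."""
    if oligo["snp_count"] > 0:
        return False
    if UNDESIRABLE_MOTIF.search(oligo["sequence"]):
        return False
    if oligo["off_targets_0mm"] > max_perfect_off_targets:
        return False
    # A species missing from the table is treated as not conserved.
    if ortholog is not None and oligo["mismatches"].get(ortholog, 1) > 0:
        return False
    return True


def pick_diverse(candidates, n, min_spacing):
    """Greedily take the best-scoring oligos whose start sites are spread out."""
    chosen = []
    for oligo in sorted(candidates, key=lambda o: o["score"], reverse=True):
        if all(abs(oligo["start"] - c["start"]) >= min_spacing for c in chosen):
            chosen.append(oligo)
        if len(chosen) == n:
            break
    return chosen


def make(name, sequence, start, snp_count, off_targets_0mm, mouse, score):
    """Build one toy table row."""
    return {"name": name, "sequence": sequence, "start": start,
            "snp_count": snp_count, "off_targets_0mm": off_targets_0mm,
            "mismatches": {"mouse": mouse}, "score": score}


table = [
    make("a", "ACGTACGTACGTACGTACG", 10, 0, 0, 0, 0.9),
    make("b", "ACGGGGTACGTACGTACGT", 12, 0, 0, 0, 0.95),  # polyG motif
    make("c", "GATCGATCGATCGATCGAT", 50, 1, 0, 0, 0.85),  # overlaps a SNP
    make("d", "TGCATGCATGCATGCATGC", 15, 0, 0, 0, 0.8),
    make("e", "CATGCATGACTGACTGACT", 80, 0, 0, 1, 0.7),   # one mouse mismatch
]

survivors = [o for o in table if passes_filters(o)]
selected = pick_diverse(survivors, n=2, min_spacing=19)
print([o["name"] for o in survivors], [o["name"] for o in selected])
```

Applying the hard filters before ranking keeps the cutoffs easy to adjust independently of the scoring model, in line with the customizable selection protocol described in the Introduction.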

Conclusions
We present here an informatics platform for the design of antisense and siRNA oligonucleotides. The system includes a basic set of design algorithms but is extensible such that experienced developers can add novel or proprietary descriptors. Workflows are included which walk researchers through the process of identifying a target sequence and choosing the design criteria to be applied in selection of candidate molecules. A series of oligonucleotide visualization and plotting algorithms are included along with spreadsheet functions allowing data import/export, formulas, and conditional formatting. Links to the source code and hosted versions can be found at https://github.com/pfred.