IRIS-DGE: An integrated RNA-seq data analysis and interpretation system for differential gene expression

Motivation: Next-generation sequencing has made large-scale genomic and transcriptomic data far more widely available. Studies using RNA-sequencing (RNA-seq) data typically generate gene expression profiles that are then analyzed further, often through differential gene expression (DGE) analysis, which enables comparison across samples of two or more factor levels. A recurring difficulty with DGE analyses is the complexity of the comparisons to be made, in which a variety of factor combinations, pairwise comparisons, and main or blocked main effects need to be tested.
Results: Here we present IRIS-DGE, a server-based DGE analysis tool developed using Shiny. It provides a straightforward, user-friendly platform for performing comprehensive DGE analysis, along with crucial analyses that help users design hypotheses and determine key genomic features. IRIS-DGE integrates the three most commonly used R-based DGE tools to determine differentially expressed genes (DEGs) and includes numerous methods for performing preliminary analysis on user-provided gene expression information. Additionally, the tool integrates a variety of highly interactive visualizations for improved interpretation of preliminary and DGE analyses.
Availability: IRIS-DGE is freely available at http://bmbl.sdstate.edu/IRIS/.
Contact: qin.ma@sdstate.edu
Supplementary information: Supplementary data are available at Bioinformatics online.


Introduction
The development of high-throughput technologies, such as RNA-seq, has created large amounts of transcriptomic data that require vast computational resources. One common application of RNA-seq data is the determination of differentially expressed genes (DEGs) through a computational process called Differential Gene Expression (DGE) analysis. DGE analyses allow researchers to investigate which genes are significantly differentially expressed across two or more conditions and can provide a meaningful way to attribute differences in gene expression levels to observed phenotypic differences. Numerous algorithms have been developed and optimized for DGE analysis, such as DESeq (Anders and Huber, 2012), DESeq2 (Love, et al., 2014), edgeR (Robinson, et al., 2010), limma (Ritchie, et al., 2015), Cuffdiff (Trapnell, et al., 2012), Cuffdiff2 (Trapnell, et al., 2013), and sleuth (Pimentel, et al., 2017), among many others. DGE analysis of RNA-seq data typically requires computational experience, not only to design the methods and experiment, but also to implement the process in one of the many computing languages. This creates an obstacle for users with limited computational experience who want to analyze the gene expression results from their RNA-seq studies, and it has increased the need for interactivity in DGE analyses and for integrated visualization of DGE results (Perkel, 2018). While there have been substantial efforts to provide platforms for DGE analysis and visualization of DGE results (Ge, 2017; Goff, et al., 2013; Harshbarger, et al., 2017; McDermaid, et al., 2018; Nelson, et al., 2017; Nueda, et al., 2017; Powell, 2015; Younesy, et al., 2015), numerous pitfalls and bottlenecks persist. Among these are the difficulty of implementing an experimental design, the lack of comprehensive integrated preliminary analyses and DGE tools, and the lack of functionality and interactivity for visualizing the analysis results.
To address these bottlenecks, we have created IRIS-DGE, an Interactive RNAseq data analysis & Interpretation System for Differential Gene Expression analysis. IRIS-DGE provides a user-friendly platform to analyze gene expression profiles comprehensively and to visualize the results from these analyses using interactive figures and tables. Compared with other DGE analysis and visualization platforms, IRIS-DGE provides a more comprehensive user experience on multiple levels. The key areas in which IRIS-DGE outperforms other tools are the number of integrated DGE tools, the variety of visualizations, the useful preliminary and DGE analyses, and features aimed at improving efficiency, applicability, and widespread use (Figure 1A). In these four areas, IRIS-DGE offers integration of three of the most commonly used R-based DGE tools, seven different visualizations, five distinct and useful analyses, interactivity, and both built-in experimental design options and the capability to interpret user-provided design matrices. The tool performs expression quality checks and preliminary analyses and uses the three integrated DGE tools (Supplementary Materials S1), DESeq2, edgeR, and limma, to perform DGE analysis. IRIS-DGE also provides users with a choice of intuitive experimental design options, as well as the ability to upload their own design matrix. In addition to the analyses, IRIS-DGE provides numerous interactive visualizations of each analysis, enabling users to gain a more global view of their data and results (Figure 1B-E). Each of these visualizations is also available for download as a static image for inclusion in reports or publications.
IRIS-DGE is freely available for use at http://bmbl.sdstate.edu/IRIS/ and can be loaded through R on a local computer. More information regarding implementation and accessibility can be found in Supplementary Materials S2.

Required Inputs
IRIS-DGE requires two user-provided input files (Figure 1B and Supplementary Materials S3): (1) a gene expression estimation matrix (EEM, also referred to as sample count data) and (2) a condition matrix with factor levels corresponding to the samples in the EEM. Once users have uploaded the required data, IRIS-DGE provides two distinct pathways for analysis.
Users can choose an in-depth approach that uses all features of IRIS-DGE, including quality control and preliminary analyses, before continuing to DGE analysis. While we encourage users to make full use of all IRIS-DGE features, we also provide a much quicker route through the tool. This route, referred to as the Expedited Analysis approach, involves only data input and final results extraction.
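To make the relationship between the two input files concrete, the following is a minimal sketch of a consistency check, not the IRIS-DGE implementation. It assumes both files are comma-separated, that the first EEM column holds gene IDs with one column per sample, that the first condition-matrix column holds sample IDs, and that counts are non-negative integers; the function names and example data are hypothetical.

```python
import csv
import io


def read_table(text):
    """Parse CSV text into (header, rows)."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[0], rows[1:]


def check_inputs(eem_text, condition_text):
    """Return a list of problems; an empty list means the inputs agree."""
    eem_header, eem_rows = read_table(eem_text)
    _, cond_rows = read_table(condition_text)
    eem_samples = eem_header[1:]                 # assumed: column 1 is the gene ID
    cond_samples = [row[0] for row in cond_rows]  # assumed: column 1 is the sample ID

    problems = []
    missing = [s for s in eem_samples if s not in cond_samples]
    extra = [s for s in cond_samples if s not in eem_samples]
    if missing:
        problems.append("samples in EEM but not in condition matrix: " + ", ".join(missing))
    if extra:
        problems.append("samples in condition matrix but not in EEM: " + ", ".join(extra))

    # Report at most one problem per gene row.
    for row in eem_rows:
        for value in row[1:]:
            if not value.isdigit():
                problems.append(f"non-integer count for gene {row[0]}: {value!r}")
                break
    return problems


# Hypothetical example data: four samples, two genes, one factor.
EEM = "gene_id,s1,s2,s3,s4\ngeneA,10,12,30,28\ngeneB,0,1,0,2\n"
CONDITIONS = "sample,condition\ns1,control\ns2,control\ns3,treated\ns4,treated\n"

print(check_inputs(EEM, CONDITIONS))
```

Running a check of this kind before analysis catches mismatched sample labels early, which is the most common reason a condition matrix fails to line up with its count data.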

Expedited Analysis
To emphasize user-friendliness, IRIS-DGE is designed so that users can proceed from EEM to final DGE results extraction in fewer than ten clicks and around one minute through the Expedited Analysis approach (Supplementary Materials S4). All inputs, including parameters, specifications, and tools, have a default setting recommended by the authors for best general performance and to assist users with limited computational and analysis experience. Users only need to submit their data and indicate their preferred comparisons. This pathway allows users to generate DGE results quickly, without going through each of the features included in IRIS-DGE, and to extract usable results from the default parameterizations. The expedited analysis pathway takes users from data input (Figure 1B) directly to DGE results extraction (Figure 1E).

In-depth Analysis
Users who prefer a more investigative approach to their gene expression data can instead use all features of IRIS-DGE through the In-depth Analysis pathway, fully exploring the provided quality control and preliminary analyses before moving on to the specific details of their DGE analysis (Supplementary Materials S5.1).

Quality Control
After upload and submission, the two input files are first analyzed by the IRIS-DGE quality control step (Supplementary Materials S5.2). This step displays a sample of the user-provided input data, boxplots and histograms of the read count distributions, and a histogram of total read counts per sample. The purpose of the quality control process is to enable exploration of the submitted data and to verify that there are no unexpected or unexplainable abnormalities in the data, such as low total read counts or individual samples with unusual count distributions.
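The total-read-count check described above can be sketched as follows. This is an illustration only: the rule of flagging samples whose total falls below half of the median total is a hypothetical threshold chosen for the example, not one stated for IRIS-DGE, and the function names and data are assumptions.

```python
import statistics


def total_counts(samples, counts_by_gene):
    """Sum read counts per sample.

    counts_by_gene maps gene ID -> list of counts ordered like `samples`.
    """
    totals = {sample: 0 for sample in samples}
    for counts in counts_by_gene.values():
        for sample, count in zip(samples, counts):
            totals[sample] += count
    return totals


def flag_low_depth(totals, fraction=0.5):
    """Return samples whose total is below `fraction` of the median total.

    The 0.5 default is a hypothetical cutoff used only for illustration.
    """
    median_total = statistics.median(totals.values())
    return sorted(s for s, t in totals.items() if t < fraction * median_total)


# Hypothetical data: sample s4 has roughly a tenth of the depth of the others.
samples = ["s1", "s2", "s3", "s4"]
counts_by_gene = {
    "geneA": [100, 120, 90, 10],
    "geneB": [200, 180, 210, 20],
    "geneC": [50, 60, 55, 5],
}

totals = total_counts(samples, counts_by_gene)
print(totals)
print(flag_low_depth(totals))
```

A flagged sample is not necessarily unusable; the point of the check is to prompt a closer look at its count distribution before it enters the DGE analysis.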

Preliminary Analyses
Another important feature of IRIS-DGE is its preliminary analyses (Figure 1C), which help users visualize their EEM input information and discover trends in their data that may suggest additional hypotheses to investigate with downstream analyses, including DGE analysis. Preliminary analyses performed by IRIS-DGE include sample correlation analysis and pairwise expression scatterplots (Figure 1Ci
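As an illustration of what a sample correlation analysis computes, the sketch below calculates the Pearson correlation between every pair of samples. It assumes counts are transformed as log2(count + 1) before correlating, which is a common convention but an assumption here rather than a documented IRIS-DGE setting; the function names and data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def sample_correlations(counts_by_sample):
    """Map each unordered sample pair to its correlation on log2(count + 1)."""
    logged = {
        sample: [math.log2(count + 1) for count in counts]
        for sample, counts in counts_by_sample.items()
    }
    names = sorted(logged)
    return {
        (a, b): pearson(logged[a], logged[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical data: s1 and s2 are identical, s3 is the reverse of s1.
counts_by_sample = {
    "s1": [1, 3, 7, 15],
    "s2": [1, 3, 7, 15],
    "s3": [15, 7, 3, 1],
}

for pair, r in sample_correlations(counts_by_sample).items():
    print(pair, round(r, 3))
```

Replicates of the same condition are expected to correlate strongly with one another, so a sample that correlates poorly with its own group is a candidate outlier worth inspecting.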

Differential Gene Expression Analysis
After submitting and exploring the data, users can move on to the DGE phase of IRIS-DGE (Supplementary Materials S5.4). This analysis is performed using any one of the three provided tools: DESeq2 (Love, et al., 2014), edgeR (Robinson, et al., 2010), and limma (Ritchie, et al., 2015). The default tool is DESeq2, based on our RNA-seq experience and independent evidence supporting its performance (Sahraeian, et al., 2017), but users can select either of the other two tools according to their own preference. Other high-performing, commonly used DGE tools are excluded only because they are not compatible with IRIS-DGE: tools such as Cuffdiff (Trapnell, et al., 2012) are not R-based, and tools such as sleuth (Pimentel, et al., 2017) do not operate on sample count data.
In addition to the DGE tool, the experimental design can also be specified by the user.
The designs provided in IRIS-DGE include two-group comparisons for analysis of selected pairwise comparisons, multiple factorial comparisons, classic interaction design, additive models for pairing or blocking of data, main effect testing, and blocked main effect testing.
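To illustrate what one of these designs looks like numerically, the sketch below builds a treatment-coded design matrix for an additive model with pairing or blocking. It is a simplified illustration, not the code used by IRIS-DGE or by the underlying R packages: the helper name is hypothetical, the reference level is assumed to be the alphabetically first level of each factor, and the column naming (factor name followed by level) follows the convention of R's `model.matrix`.

```python
def design_matrix(samples, factors):
    """Build a treatment-coded additive design matrix.

    samples: list of dicts mapping factor name -> level, one dict per sample.
    factors: factor names to include, in column order.
    Returns (column_names, rows). The alphabetically first level of each
    factor is treated as the reference and gets no column of its own.
    """
    columns = ["Intercept"]
    encoders = []
    for factor in factors:
        levels = sorted({sample[factor] for sample in samples})
        for level in levels[1:]:
            columns.append(f"{factor}{level}")
            encoders.append((factor, level))

    rows = [
        [1] + [1 if sample[factor] == level else 0 for factor, level in encoders]
        for sample in samples
    ]
    return columns, rows


# Hypothetical paired experiment: two patients, each measured under both conditions.
samples = [
    {"patient": "p1", "condition": "control"},
    {"patient": "p1", "condition": "treated"},
    {"patient": "p2", "condition": "control"},
    {"patient": "p2", "condition": "treated"},
]

columns, rows = design_matrix(samples, ["patient", "condition"])
print(columns)  # ['Intercept', 'patientp2', 'conditiontreated']
for row in rows:
    print(row)
```

In this additive layout the `patientp2` column absorbs the patient-to-patient difference, so the `conditiontreated` coefficient reflects the treatment effect within each pair. A two-group comparison is the special case with a single factor.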
Additionally, IRIS-DGE provides a method for users to specify their own experimental design, for instances when the user needs a design not already included in IRIS-DGE. Each of these methods has its own parameters for the user to specify, typically including which factors are intended for analysis and which specific comparisons are required. After analyzing the data, IRIS-DGE provides an overview displaying the number of up- and down-regulated IDs for each indicated comparison, along with a histogram displaying this information (Figure 1Di). The results table is also available through IRIS-DGE, along with interactive MA (Figure 1Dii) and volcano plots (Figure 1Diii).
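The overview counts and the volcano plot can be related to the results table with a short sketch. The cutoffs used here (adjusted p-value below 0.05 and absolute log2 fold-change of at least 1) are illustrative assumptions, not the IRIS-DGE defaults, and the function names and example values are hypothetical.

```python
import math
from collections import Counter


def classify(log2fc, padj, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Label one gene as 'up', 'down', or 'not_significant'.

    Both cutoffs are illustrative assumptions. A missing adjusted
    p-value (None) is treated as not significant.
    """
    if padj is None or padj >= padj_cutoff or abs(log2fc) < lfc_cutoff:
        return "not_significant"
    return "up" if log2fc > 0 else "down"


def summarize(results):
    """Count genes per label; results is a list of (gene_id, log2fc, padj)."""
    return Counter(classify(log2fc, padj) for _, log2fc, padj in results)


def volcano_point(log2fc, padj):
    """Volcano plot coordinates: x is log2 fold-change, y is -log10(padj)."""
    return (log2fc, -math.log10(padj))


# Hypothetical results for one comparison.
results = [
    ("geneA", 2.3, 0.001),
    ("geneB", -1.7, 0.01),
    ("geneC", 0.4, 0.2),
    ("geneD", 3.0, 0.5),
]

print(dict(summarize(results)))
print(volcano_point(2.0, 0.01))
```

On a volcano plot, genes passing both cutoffs sit in the upper-left (down-regulated) and upper-right (up-regulated) regions, which is why the plot and the overview counts tell the same story at different levels of detail.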
Like the figures generated in the Preliminary Analysis section of IRIS-DGE, the plots in the DGE section are interactive. Users can obtain more specific information from a plot by selecting individual data points or regions of data points. Doing so highlights the corresponding rows of the DGE results table, showing users the information for genes that may be outliers or that fall within a certain region. Conversely, users can select one or more gene IDs from the results table, which highlights the corresponding data points on the plot. Users who are focused on a specific gene or set of genes can use this feature to determine easily where their genes lie in the plot.
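The two-way linking between plot and table described above amounts to a pair of lookups, sketched below. This is a conceptual illustration with hypothetical names and data; it does not reproduce how the Shiny application implements selection.

```python
def select_region(points, x_range, y_range):
    """Plot -> table: return gene IDs whose points lie inside a rectangle.

    points maps gene ID -> (x, y); ranges are inclusive (low, high) pairs.
    """
    (x_low, x_high), (y_low, y_high) = x_range, y_range
    return sorted(
        gene
        for gene, (x, y) in points.items()
        if x_low <= x <= x_high and y_low <= y <= y_high
    )


def locate_genes(points, gene_ids):
    """Table -> plot: return the coordinates of the selected gene IDs."""
    return {gene: points[gene] for gene in gene_ids if gene in points}


# Hypothetical volcano plot coordinates: (log2 fold-change, -log10 padj).
points = {
    "geneA": (2.3, 3.0),
    "geneB": (-1.7, 2.0),
    "geneC": (0.4, 0.7),
}

print(select_region(points, x_range=(1.0, 5.0), y_range=(1.3, 10.0)))
print(locate_genes(points, ["geneB", "geneX"]))
```

Selecting the upper-right region returns only the strongly up-regulated gene, while the reverse lookup silently skips IDs that are not present in the plot.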

IRIS-DGE Outputs
IRIS-DGE provides users with methods for extracting content from the preliminary and DGE analyses. All figures in the QC, Preliminary Analysis, and DGE Analysis sections can be downloaded as static images in PDF or PNG format. Additionally, all tables in the DGE Analysis section are downloadable as CSV files; the final results table can be downloaded in its entirety or filtered by user-provided or default adjusted p-value and log fold-change cutoffs. As part of the biclustering analysis, users can also download a list of the gene IDs contained within the specified cluster.
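The choice between downloading the full results table and a filtered one can be sketched as below. The column names (`gene_id`, `log2FoldChange`, `padj`) and the cutoff values in the example are assumptions made for illustration; the actual IRIS-DGE output columns and defaults may differ.

```python
import csv
import io

FIELDS = ["gene_id", "log2FoldChange", "padj"]  # assumed column names


def export_results(rows, padj_cutoff=None, lfc_cutoff=None):
    """Return the results table as CSV text.

    rows: list of dicts keyed by FIELDS. With both cutoffs left as None the
    table is exported in its entirety; otherwise only rows passing every
    supplied cutoff are kept.
    """
    kept = []
    for row in rows:
        if padj_cutoff is not None and not row["padj"] < padj_cutoff:
            continue
        if lfc_cutoff is not None and abs(row["log2FoldChange"]) < lfc_cutoff:
            continue
        kept.append(row)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(kept)
    return buffer.getvalue()


# Hypothetical results table.
rows = [
    {"gene_id": "geneA", "log2FoldChange": 2.3, "padj": 0.001},
    {"gene_id": "geneB", "log2FoldChange": -0.6, "padj": 0.01},
    {"gene_id": "geneC", "log2FoldChange": 0.4, "padj": 0.2},
]

print(export_results(rows))
print(export_results(rows, padj_cutoff=0.05, lfc_cutoff=1.0))
```

Keeping the unfiltered export available alongside the filtered one lets users revisit the cutoffs later without rerunning the analysis.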

Conclusions
IRIS-DGE is a tool developed for comprehensive DGE analysis designed to address current issues with tools intended to perform DGE analysis and visualize and interpret the results. This tool implements numerous features including EEM quality control, preliminary analysis, and DGE utilizing the most commonly used R-based DGE tools in a user-friendly, comprehensive platform. It is noteworthy that IRIS-DGE provides advanced experimental design options in an intuitive format, while also allowing users to provide their own design matrix to facilitate efficient DGE analysis for a broad spectrum of users. Each analysis section within IRIS-DGE provides relevant information in a highly-interactive visual format. In addition to providing a comprehensive method for performing DGE analysis through the quality control and preliminary analyses, IRIS-DGE also provides a straightforward method for expedited analysis that allows users to extract DGE results in a very limited amount of time using the default DGE parameters.