Mutplot: An easy-to-use online tool for plotting complex mutation data with flexibility

Advances in sequencing technology are generating an enormous amount of data at a rapid pace. However, translating these data into patient care remains a critical challenge. There are two difficulties: how to integrate functional information into mutation interpretation, and how to make that integration easy to apply. One solution is to visualize amino acid changes together with protein structure and function on a web app platform. Multiple tools exist for plotting mutations, but the majority of them require programming skills that are not a common background for clinicians or researchers. Furthermore, recurrent mutations are usually the focus and the appropriate recurrence cutoff varies, yet none of the current software offers a user-defined cutoff. We therefore developed a user-friendly web-based tool, Mutplot (https://bioinformaticstools.shinyapps.io/lollipop/). Mutplot retrieves up-to-date domain information from the protein resource UniProt (https://www.uniprot.org/), integrates the submitted mutation information, and produces lollipop diagrams with annotations and highlighted candidates. It offers flexible output options. For data subject to security standards, the app can also be hosted on web servers inside a firewall or on computers without internet access, with the UniProt database stored locally. Altogether, Mutplot is well suited to visualizing protein mutations, especially for clinicians or researchers without a bioinformatics background.


Introduction
The development of sequencing technology has revolutionized cancer studies. After almost two decades of development, Next-Generation Sequencing (NGS) is fast and affordable, and it has made precision medicine a clinical reality. NGS provides comprehensive data that can be used to individualize therapies in clinical settings and to expand research information. Although this technological advancement has created more opportunities for treatment and research, it has also created the problem of efficiently synthesizing and summarizing the resulting data, which are large and detailed. Filtering big data manually increases the chance of errors, organizing it is time-consuming, and presenting it effectively is difficult. Software can address these problems, and several tools are available for this purpose. However, most are designed for users with programming backgrounds. This excludes hospital users and the majority of institutional users, who do not have such training. Mutplot runs in a web browser and provides flexibility for easy customization. It was designed specifically for clinicians and researchers without programming training.
Existing tools for plotting mutations include Lollipops [1], MutationMapper [2], Muts-needle-plot [3], Pfam [4], Plot Protein [5], and TrackViewer [6]. None of them meets all the requirements of non-technical users (details are shown in Table 1). All of them except MutationMapper use a command-line interface that requires programming training. Muts-needle-plot, Plot Protein, and TrackViewer require manual domain input. Lollipops is unable to distinguish mutations with similar sample frequencies or clustered mutations. Besides, entering the data manually is prone to human error, and Lollipops does not have a mutation highlight function. Pfam outputs a JSON file, which is not a publication format. MutationMapper seems to be the best choice because it uses a web-based user interface, but it has its own drawbacks. It displays only the most highly recurrent mutations (amino acid alterations), which would eliminate driver gene mutations with low frequency [7].
In fact, many driver mutations occur at very low variant allele frequencies because of inter-tumor genetic heterogeneity. If multiple mutations occur in the same gene, MutationMapper could easily neglect the lower-occurrence mutations that are critical for advancing cancer research and personalized medicine. In addition, if two variants are located too close together in MutationMapper, the label of one is covered by the other. Another pitfall of MutationMapper is that domain names are automatically truncated when space is limited. These shortcomings make MutationMapper less than ideal for NGS analysis.

Materials and methods
Mutplot includes a complete workflow for visualizing protein mutations (Fig 1). After a file (tab-delimited or comma-delimited) with variant information is uploaded (the four required columns are named Hugo_Symbol, Sample_ID, Protein_Change, and Mutation_Type; S1 Table), Mutplot automatically retrieves the most up-to-date protein information from the UniProt [8] database. A total of 409 oncogenes and tumor suppressor genes are available in a drop-down menu (S2 Table). Mutplot retrieves the domain information for the selected gene. The highlight options for the amino acid frequency threshold are 1, 2, 3, 4, 5, 10, 15, 20, 25, and 30. Both the gene list and the highlight threshold options can be expanded by customizing the source code. Instructions are deposited on GitHub: https://github.com/VivianBailey/Mutplot.
Using this information, Mutplot generates protein diagrams showing domain information, amino acid position, mutation frequency, amino acid alteration, mutation type, and the highlighted mutations. The amino acid positions are scaled to the length of the protein for accurate proportions. Highlighted mutations are labeled with detailed amino acid alteration information. Mutation type and description are color-coded for easy visualization and differentiation. When multiple mutations cluster together, Mutplot positions each label so that it does not interfere with other mutations. Mutplot also offers flexible output options. It supports JPEG, PDF, PNG, and SVG for image download. It can also export the data selected for the diagram from the input file, together with the corresponding domain information retrieved from the current UniProt database.
The source code is available for non-commercial use on GitHub (https://github.com/VivianBailey/Mutplot) and can be easily accessed, revised, or integrated into other pipelines or software. Revising the source code allows Mutplot to be moved from the public web app to a personal computer or to a server inside a firewall. This provides a good option for institutions that follow strict security regulations. In addition, the GitHub repository contains full documentation for Mutplot and instructions for customizing the source code; future releases will also be deposited there with descriptions.
The web app was developed in the R programming language, using the packages shiny, ggplot2, plyr, httr, drawProteins, and ggrepel.
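To make the input format described in this section concrete, the following is a minimal sketch in Python (not the R code used by Mutplot) of reading a variant table and counting how often each amino acid change recurs in one gene. Only the four column names come from the text. The delimiter detection, the rule that a change is counted once per sample, the extraction of the position as the first integer in Protein_Change, and the example rows are all assumptions made for illustration.

```python
import csv
import io
import re
from collections import Counter

# Column names taken from the text; everything else here is illustrative.
REQUIRED_COLUMNS = ("Hugo_Symbol", "Sample_ID", "Protein_Change", "Mutation_Type")

# Hypothetical example rows, not data from the paper.
EXAMPLE = (
    "Hugo_Symbol\tSample_ID\tProtein_Change\tMutation_Type\n"
    "TP53\tS1\tp.R175H\tMissense_Mutation\n"
    "TP53\tS2\tp.R175H\tMissense_Mutation\n"
    "TP53\tS3\tp.R213*\tNonsense_Mutation\n"
    "KRAS\tS1\tp.G12D\tMissense_Mutation\n"
)


def read_variants(text):
    """Parse tab- or comma-delimited text into a list of row dictionaries."""
    lines = text.splitlines()
    if not lines:
        raise ValueError("input is empty")
    # Assumption: a tab in the header line means the file is tab-delimited.
    delimiter = "\t" if "\t" in lines[0] else ","
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError("missing required columns: " + ", ".join(missing))
    return list(reader)


def count_changes(rows, gene):
    """Count samples per (position, protein change, mutation type) in one gene."""
    seen = set()
    counts = Counter()
    for row in rows:
        if row["Hugo_Symbol"] != gene:
            continue
        # Assumption: the first integer in Protein_Change is the residue position.
        match = re.search(r"\d+", row["Protein_Change"])
        if match is None:
            continue  # skip changes without a recognizable position
        key = (int(match.group(0)), row["Protein_Change"], row["Mutation_Type"])
        # Assumption: the same change in the same sample is counted once.
        if (row["Sample_ID"], key) in seen:
            continue
        seen.add((row["Sample_ID"], key))
        counts[key] += 1
    return counts


if __name__ == "__main__":
    for key, n in sorted(count_changes(read_variants(EXAMPLE), "TP53").items()):
        print(key, n)
```

The resulting counts are the per-variant frequencies that a lollipop diagram would plot on its vertical axis, and the integer position is what would be placed on the horizontal axis.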

Results and discussion
We compared Mutplot with Lollipops using the same example data. Fig 2 shows the same mutations plotted in Lollipops and in Mutplot. Lollipops was not designed for analysis of patient groups, so it does not provide quantitative sample frequency information. Its usefulness for designing targeted therapies based on recurrent mutations is therefore limited. Mutplot is suitable for both single-patient and patient-group analyses. Mutplot also displays mutation types in addition to domain information and amino acid alterations. This provides important clues about the possible ways these mutations change protein function. For example, a missense mutation substitutes one amino acid in the protein, while a nonsense mutation produces a truncated protein with altered function or no function. In addition, Mutplot addresses the issue of overlapping annotations by moving the labels. See the S1 File for details of the comparison between Lollipops and Mutplot.
One improvement of Mutplot is flexible highlighting through a user-defined frequency cutoff. For example, when the frequency cutoff is set to 1, any mutation with a frequency equal to or higher than 1 is highlighted (Fig 3 top). When the cutoff is set to 5, only mutations with a frequency equal to or higher than 5 are highlighted (Fig 3 middle). In contrast, MutationMapper highlights only the variants with the highest frequency (Fig 3 bottom). Moreover, MutationMapper annotates only the most frequent variant. Although the other annotations can be displayed by moving the mouse over them, they stay hidden in saved figures. In addition, Mutplot solves MutationMapper's overlapping problem. When multiple variants are located at the same position, MutationMapper lays one label over the others (Fig 4 bottom), which causes information loss. Mutplot adjusts the label positions when mutations occur at the same location so that all labels can be displayed (Fig 4 top).
Another drawback of MutationMapper that is fixed in Mutplot is the truncation of domain names when space is limited. For example, TP53 contains three main domains: P53_TAD, P53, and P53_tetramer. MutationMapper labels them as "P53...", "P53", and "P53_tetr..." (Fig 3 bottom and Fig 4 bottom), whereas Mutplot marks the domains with different colors and lists their names in a legend, which avoids the truncation.
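The user-defined cutoff rule described in this section (highlight every mutation whose frequency is equal to or higher than the cutoff) can be sketched as follows. This is an illustrative Python sketch, not code from Mutplot or MutationMapper. The function names and the example counts are hypothetical, and the second function only mimics the "highest frequency only" behavior that the text attributes to MutationMapper.

```python
# Hypothetical per-variant frequencies, not data from the paper.
EXAMPLE_COUNTS = {"R175H": 7, "R248Q": 5, "R273C": 2, "Y220C": 1}


def highlight(counts, cutoff):
    """Return the variants whose frequency is equal to or higher than cutoff."""
    if cutoff < 1:
        raise ValueError("cutoff must be at least 1")
    return {variant: n for variant, n in counts.items() if n >= cutoff}


def most_frequent_only(counts):
    """Return only the variants tied for the highest frequency."""
    if not counts:
        return {}
    top = max(counts.values())
    return {variant: n for variant, n in counts.items() if n == top}


if __name__ == "__main__":
    print(sorted(highlight(EXAMPLE_COUNTS, 1)))
    print(sorted(highlight(EXAMPLE_COUNTS, 5)))
    print(sorted(most_frequent_only(EXAMPLE_COUNTS)))
```

With a cutoff of 1 every observed variant is selected, while raising the cutoff narrows the selection without discarding the lower-frequency variants from the plot itself. Selecting only the maximum, by contrast, leaves a single group of variants labeled regardless of how many others are recurrent.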

Conclusions
Big data is changing the scientific landscape dramatically. It brings significant cost advantages and faster, better approaches to decision-making. With the development of sequencing technology, we are obtaining a huge amount of genome information, but we do not have matching analysis power. More and more software and packages are available, but the majority of them require one or more programming languages. Scientists and physicians, who eventually need to draw conclusions or make decisions, have to rely on bioinformaticians. This is time-consuming for these decision makers, especially in precision medicine. Thus, easy-to-handle big data tools are seriously needed.
Here, we present Mutplot, a web-based visualization tool for protein mutations. Mutplot retrieves protein data from the database automatically and builds diagrams displaying the location, frequency, and other properties of protein variants. No programming skills are required. Moreover, Mutplot highlights the highly recurrent variants according to a user-defined cutoff. This function is especially useful when picking variants out of hundreds or even thousands of candidates in a large cohort. In addition, Mutplot provides multiple publication-quality figure formats: PDF, JPEG, PNG, and SVG. Other output options, including the source data and the protein domain information, are provided as well. For data under a protection policy, Mutplot is also compatible with Linux web servers inside a firewall and with computers without internet access. The source code can be easily revised by following the instructions on the program's GitHub page. This software simplifies data processing, especially for medical researchers working with NGS.