HemI: A Toolkit for Illustrating Heatmaps

Recent high-throughput techniques have generated a flood of biological data of all kinds. Transforming multi-dimensional numerical gene or protein expression data into a single heatmap can provide a concise but comprehensive view of molecular dynamics under different conditions. In this work, we developed an easy-to-use tool named HemI (Heat map Illustrator), which can visualize either gene or protein expression data as heatmaps. The heatmaps can be recolored, rescaled or rotated in a customized manner, and HemI also provides multiple clustering strategies for analyzing the data. Publication-quality figures can be exported directly. We propose that HemI can be a useful toolkit for conveniently visualizing and manipulating heatmaps. The stand-alone packages of HemI were implemented in Java and can be accessed at http://hemi.biocuckoo.org/down.php.


Introduction
A good picture is worth a thousand words. Recent progress in high-throughput techniques, such as DNA microarrays, next-generation sequencing (NGS) and quantitative proteomics, has increased the demand for visualizing multi-dimensional numerical data [1][2][3].
As an intuitive strategy, a heatmap graphically visualizes matrix data by representing individual values with different colors. To estimate how many papers contain heatmaps, we carefully curated all original research articles published in 2012 in five leading journals (Nature Biotechnology, Cancer Cell, Genome Research, Genome Biology, and Molecular & Cellular Proteomics), and found that ~30.4% of them (202 of 664 papers) contain at least one heatmap figure (Table 1). We then manually checked the 202 papers and observed that the method used to draw the heatmaps was not mentioned in ~66% (134/202) of them (Table 2). Of the 68 remaining papers in which the tools were clearly stated, nearly 46% visualized heatmaps with the R language (Table 2). However, using R requires considerable programming skills, which some researchers do not possess. We also found that Java Treeview, a viewer for microarray data [4], was used in ~24% of the 68 papers. Because Java Treeview does not perform any clustering analyses, a clustered data file in CDT format generated by other tools must be provided as input. To overcome this limitation, Seo et al. developed an interactive tool, Hierarchical Clustering Explorer (HCE), for both visualizing heatmaps and clustering numerical data [5]. The heatmaps in HCE can be easily manipulated, but the quality of the artwork leaves room for improvement. Moreover, although heatmaps can be produced with a number of commercial or non-commercial software packages such as GeneSpring GX, Mayday [6], Cytobank [7] and D3 (http://d3js.org/), these tools were not designed specifically for heatmap generation. Recently, an interactive heatmap viewer called jHeatmap was developed [8]. It is useful for the intuitive and interactive visualization of complex data as heatmaps. However, no further manipulations, such as recoloring and rotation, can be performed.
In addition, the visualized heatmaps cannot be exported for publication purposes. Thus, an easy-to-use toolkit for conveniently illustrating heatmaps and exporting publication-quality figures would be of great help to both bioinformaticians and experimentalists.

Method
In this work, we present HemI (Heatmap Illustrator, version 1.0), a novel software package that uses a red, green and blue (RGB) tricolor scheme in a 256-color mode. Given a selected color scale, the total color space is automatically converted by Java into a numerical matrix of 768 rows × 3 columns. The input gene or protein expression data can then be linearly normalized as follows:
$$NV = \frac{OV - Min}{Max - Min} \qquad (1)$$
More frequently, researchers prefer to visualize the logarithmic relations between different conditions and molecular expression levels. Thus, the original data can also be normalized as follows:
$$NV = \frac{\log_a OV - \log_a Min}{\log_a Max - \log_a Min} \qquad (2)$$
where
NV = normalized value
OV = original value
Max = the maximum of all OVs
Min = the minimum of all OVs
a = the logarithm base, 2 by default; it can also be user-defined as 10 or e.
In both equations, Max cannot be equal to Min, and in Eq. 2 both OV and Min must be greater than 0. The calculated NVs are then mapped onto the color matrix, and the RGB values of the nearest row are used for display.
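The normalization and color lookup described above can be sketched in Java as follows. This is a minimal sketch, not HemI's source code: the class and method names are hypothetical, the three-anchor gradient (low, middle and high color) is only an assumed example of a user-selected color scale, and NV is assumed to lie in [0, 1] before being scaled to the 768 rows of the color matrix.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of a HemI-style color lookup (names are illustrative,
 * not HemI's API): a 768 x 3 RGB matrix plus the two normalizations.
 */
final class ColorScaleSketch {
    static final int ROWS = 768; // 768 rows x 3 columns (R, G, B), as stated in the text

    /** Builds the RGB matrix by linear interpolation low -> mid -> high (assumed scheme). */
    static int[][] buildScale(int[] low, int[] mid, int[] high) {
        int[][] scale = new int[ROWS][3];
        for (int i = 0; i < ROWS; i++) {
            double t = (double) i / (ROWS - 1);                // position along the scale, 0..1
            boolean lowerHalf = t <= 0.5;
            int[] from = lowerHalf ? low : mid;
            int[] to = lowerHalf ? mid : high;
            double u = lowerHalf ? t / 0.5 : (t - 0.5) / 0.5;  // position within the half, 0..1
            for (int c = 0; c < 3; c++) {
                scale[i][c] = (int) Math.round(from[c] + u * (to[c] - from[c]));
            }
        }
        return scale;
    }

    /** Eq. 1: linear normalization of OV to [0, 1]; Max must differ from Min. */
    static double normalizeLinear(double ov, double min, double max) {
        if (max == min) throw new IllegalArgumentException("Max must differ from Min");
        return (ov - min) / (max - min);
    }

    /** Eq. 2 (as written above): log-scale normalization with base a. */
    static double normalizeLog(double ov, double min, double max, double base) {
        if (max == min) throw new IllegalArgumentException("Max must differ from Min");
        if (ov <= 0 || min <= 0) throw new IllegalArgumentException("OV and Min must be > 0");
        if (base <= 0 || base == 1) throw new IllegalArgumentException("invalid logarithm base");
        double lnA = Math.log(base);
        double logOv = Math.log(ov) / lnA;
        double logMin = Math.log(min) / lnA;
        double logMax = Math.log(max) / lnA;
        // In this ratio form ln(a) cancels, so NV is the same for a = 2, 10 or e.
        return (logOv - logMin) / (logMax - logMin);
    }

    /** Maps NV in [0, 1] to the RGB triple of the nearest row of the matrix. */
    static int[] colorFor(double nv, int[][] scale) {
        int row = (int) Math.round(nv * (scale.length - 1));
        row = Math.max(0, Math.min(scale.length - 1, row)); // clamp values outside [0, 1]
        return scale[row];
    }

    public static void main(String[] args) {
        int[][] scale = buildScale(new int[] {0, 255, 0}, new int[] {0, 0, 0}, new int[] {255, 0, 0});
        for (double ov : new double[] {1.0, 4.0, 16.0}) {
            double nv = normalizeLog(ov, 1.0, 16.0, 2.0);
            System.out.println(ov + " -> NV " + nv + " -> RGB " + Arrays.toString(colorFor(nv, scale)));
        }
    }
}
```

Rounding to the nearest row mirrors the "nearest row" rule in the text; clamping is an extra safeguard for values that fall slightly outside [0, 1].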
For further analysis of the data in heatmaps, several clustering approaches, including hierarchical and k-means clustering, were also integrated. Three linkage criteria (Table 3) were adopted for hierarchical clustering, and seven distance metrics (Table 4) were adopted for calculating distances in both algorithms. HemI 1.0 was written in Java 1.6 (J2SE 6.0) and packaged with Install4j 4.0.8. We developed six packages to support the x86/x64 versions of three major operating systems (OSs): Windows, Unix/Linux, and Mac. The stability and applicability of HemI were rigorously tested under Windows XP/7, Ubuntu, and Apple Mac OS X 10.5 (Leopard).
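Table 3 lists average linkage as the default criterion. A minimal Java sketch of agglomerative clustering with average linkage and Euclidean distance (one plausible choice among the Table 4 metrics) is shown below. The class name and the naive O(n^3) merge loop are illustrative assumptions, not HemI's implementation; the returned leaf order is what a heatmap needs to place similar rows next to each other.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: agglomerative hierarchical clustering with average
 * linkage (UPGMA-style) and Euclidean distance, returning a leaf order.
 */
final class AverageLinkageSketch {

    /** Euclidean distance between two expression profiles of equal length. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Average linkage: mean pairwise distance between members of two clusters. */
    static double averageLinkage(List<Integer> c1, List<Integer> c2, double[][] dist) {
        double sum = 0;
        for (int i : c1) {
            for (int j : c2) {
                sum += dist[i][j];
            }
        }
        return sum / (c1.size() * c2.size());
    }

    /** Merges the two closest clusters until one remains; returns the leaf order. */
    static List<Integer> leafOrder(double[][] rows) {
        int n = rows.length;
        double[][] dist = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                dist[i][j] = euclidean(rows[i], rows[j]);

        List<List<Integer>> clusters = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            List<Integer> singleton = new ArrayList<>();
            singleton.add(i);
            clusters.add(singleton);
        }
        while (clusters.size() > 1) {
            int bestA = 0, bestB = 1;
            double best = Double.POSITIVE_INFINITY;
            for (int a = 0; a < clusters.size(); a++) {
                for (int b = a + 1; b < clusters.size(); b++) {
                    double d = averageLinkage(clusters.get(a), clusters.get(b), dist);
                    if (d < best) { best = d; bestA = a; bestB = b; }
                }
            }
            // Concatenating members keeps merged leaves adjacent in the final order.
            List<Integer> merged = new ArrayList<>(clusters.get(bestA));
            merged.addAll(clusters.get(bestB));
            clusters.remove(bestB);       // bestB > bestA, so bestA's index is unaffected
            clusters.set(bestA, merged);
        }
        return clusters.isEmpty() ? new ArrayList<Integer>() : clusters.get(0);
    }
}
```

Reordering the heatmap rows by the returned indices places the most similar profiles next to each other; a real implementation would also keep the merge heights to draw a dendrogram.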

Usage
HemI was designed to be easy to use. Here, we use data from a previously published study [9] as a demo to describe the usage of HemI. The androgen receptor (AR), a hormone-activated transcription factor, regulates prostate development, function and malignant transformation, and also acts as an essential transcriptional repressor [9]. To characterize potentially AR-regulated genes, LNCaP prostate cancer cells were first hormone-starved for 3 days, and gene expression levels were then profiled after 3, 6, 12, 24 and 48 hours of androgen treatment [9]. In total, Zhao et al. identified 428 androgen-repressed genes [9], and the corresponding data set was used as an example for HemI. First, numerical data in any of three file formats, Microsoft Excel spreadsheet (.xls), Tab Separated Values (TSV) or Comma Separated Values (CSV), can be loaded by clicking the "LOAD" button on the main interface. Users can then select the numerical data area for the heatmap by mouse-dragging or by holding SHIFT and clicking. The titles for the X-axis and Y-axis can be specified by entering the corresponding row and column numbers in the data sheet (Figure 1A). For convenience, an "Auto Fill" button is provided, which treats the first column and the first row as the titles of the Y-axis and X-axis, respectively. A heatmap is automatically generated after clicking the finish button.
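The "Auto Fill" convention described above (first row as X-axis titles, first column as Y-axis titles, remaining cells as values) can be illustrated with a small parser. This is a hedged sketch under stated assumptions: the class and method names are hypothetical, only plain TSV/CSV without quoted fields is handled, and reading .xls files would require a separate spreadsheet library that is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of an "Auto Fill"-style import: first row = X-axis titles,
 * first column = Y-axis titles, remaining cells = numerical matrix.
 * Plain TSV/CSV only (no quoted fields).
 */
final class AutoFillSketch {
    final String[] xTitles;
    final String[] yTitles;
    final double[][] values;

    private AutoFillSketch(String[] xTitles, String[] yTitles, double[][] values) {
        this.xTitles = xTitles;
        this.yTitles = yTitles;
        this.values = values;
    }

    /** Parses delimited text; pass "\t" for TSV or "," for CSV (used as a regex). */
    static AutoFillSketch parse(String text, String delimiter) {
        List<String[]> lines = new ArrayList<>();
        for (String line : text.split("\\r?\\n")) {
            if (!line.trim().isEmpty()) lines.add(line.split(delimiter, -1));
        }
        if (lines.size() < 2) {
            throw new IllegalArgumentException("need a title row and at least one data row");
        }
        String[] header = lines.get(0);
        String[] x = new String[header.length - 1];
        for (int c = 1; c < header.length; c++) x[c - 1] = header[c].trim(); // skip top-left cell
        String[] y = new String[lines.size() - 1];
        double[][] v = new double[lines.size() - 1][x.length];
        for (int r = 1; r < lines.size(); r++) {
            String[] cells = lines.get(r);
            if (cells.length != header.length) {
                throw new IllegalArgumentException(
                        "row " + r + " has " + cells.length + " cells, expected " + header.length);
            }
            y[r - 1] = cells[0].trim(); // first column supplies the Y-axis title
            for (int c = 1; c < cells.length; c++) {
                v[r - 1][c - 1] = Double.parseDouble(cells[c].trim());
            }
        }
        return new AutoFillSketch(x, y, v);
    }

    public static void main(String[] args) {
        AutoFillSketch d = parse("id\tt1\tt2\ngeneA\t1.5\t2.0\ngeneB\t0.5\t0.25\n", "\t");
        System.out.println(Arrays.toString(d.xTitles) + " " + Arrays.toString(d.yTitles));
        System.out.println(Arrays.deepToString(d.values));
    }
}
```

The sample input in `main` uses placeholder labels and values only to show the layout; it is not data from the study.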
The generated heatmap can be easily manipulated in a customized manner. For example, the width and height of the artwork can be adjusted, and the blank margins around the heatmap can also be changed (Figure 1B). The picture can be recolored and rotated, and the X-axis and Y-axis can be interchanged (Figure 1B). Moreover, the corresponding data can be clustered along either or both of the X-axis and Y-axis by clicking on the clustering options of the main panel (Figure 1C). After all configurations are

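The text does not state how HemI implements rotation or axis interchange internally. As an assumption-labeled illustration, both can be expressed as operations on the value matrix: interchanging the X- and Y-axes is a transpose, and a 90° clockwise rotation moves cell (r, c) to (c, rows - 1 - r). The class name below is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch of two heatmap layout operations on the value matrix. */
final class HeatmapTransformSketch {

    /**
     * Interchanges the X- and Y-axes: cell (r, c) moves to (c, r).
     * The X- and Y-axis title arrays simply swap roles.
     */
    static double[][] transpose(double[][] m) {
        int rows = m.length;
        int cols = rows == 0 ? 0 : m[0].length;
        double[][] t = new double[cols][rows];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                t[c][r] = m[r][c];
        return t;
    }

    /**
     * Rotates 90 degrees clockwise: cell (r, c) moves to (c, rows - 1 - r).
     * Old row titles become column titles in reverse order.
     */
    static double[][] rotateClockwise(double[][] m) {
        int rows = m.length;
        int cols = rows == 0 ? 0 : m[0].length;
        double[][] out = new double[cols][rows];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                out[c][rows - 1 - r] = m[r][c];
        return out;
    }

    public static void main(String[] args) {
        double[][] m = {{1, 2, 3}, {4, 5, 6}};
        System.out.println(Arrays.deepToString(transpose(m)));
        System.out.println(Arrays.deepToString(rotateClockwise(m)));
    }
}
```

Keeping these as pure matrix operations lets the same renderer draw the heatmap in any orientation, with only the title arrays reordered to match.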
Discussion
The heatmap of potentially AR-regulated genes [9] was re-illustrated with HemI (Figure 2A). As a second example, poly-ADP-ribose polymerase (PARP) family proteins are involved in a variety of cellular pathways such as DNA repair and cell death, and are regarded as an important class of drug targets in cancer therapeutics [10]. As a sub-family of PARP, tankyrases also play an essential role in telomere length regulation [10]. Recently, a differential scanning fluorimetry (DSF) approach was adopted for rapidly profiling 185 known and potential PARP chemical compounds for their ability to bind 13 PARP family proteins, including two tankyrases, TNKS1 and TNKS2 [10]. We redrew the heatmap of thermal shifts measured by DSF for all 185 inhibitors against the 13 PARP members (Figure 2B). Our results are consistent with the previous analysis, which demonstrated that most inhibitors lack specificity and mainly target PARP1-4 as primary hits, while several compounds can efficiently inhibit both PARP1-4 and the two tankyrases [10]. Taken together, we propose that HemI 1.0 can be a useful tool for both experimentalists and bioinformaticians, allowing users to draw, manipulate and export publication-quality heatmaps in a user-friendly manner. The software packages of HemI will be continuously maintained and improved based on users' comments and feedback.

The Heatmap Visualization PLOS ONE | www.plosone.org

Table 3. Three most commonly used linkage criteria for hierarchical clustering.
Linkage criterion | Equation
Average linkage clustering (default) | $d(A,B) = \frac{1}{n_A n_B} \sum_{a \in A} \sum_{b \in B} d(a,b)$, where $n_A$ and $n_B$ are the numbers of members in clusters A and B

Table 4. Up to seven commonly used distance metrics adopted for calculating distances in the hierarchical and k-means clustering approaches.
doi:10.1371/journal.pone.0111988.t004
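Table 4 applies the distance metrics to k-means as well as to hierarchical clustering. The following is a minimal sketch of Lloyd's k-means algorithm with Euclidean distance. The class name is hypothetical, and seeding with the first k profiles plus the iteration cap are assumptions chosen for reproducibility, not documented HemI behavior (random or k-means++ seeding is more common in practice). Euclidean distance is used because the mean-based centroid update is matched to it; other metrics would generally call for a different centroid rule.

```java
import java.util.Arrays;

/** Hypothetical sketch of Lloyd's k-means on expression profiles (Euclidean distance). */
final class KMeansSketch {

    static double squaredEuclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Returns the cluster index of each row; centroids start at the first k rows (assumption). */
    static int[] cluster(double[][] rows, int k, int maxIter) {
        if (k < 1 || k > rows.length) throw new IllegalArgumentException("k must be in [1, number of rows]");
        if (maxIter < 1) throw new IllegalArgumentException("maxIter must be >= 1");
        int dim = rows[0].length;
        double[][] centroids = new double[k][];
        for (int j = 0; j < k; j++) centroids[j] = rows[j].clone();
        int[] assign = new int[rows.length];
        Arrays.fill(assign, -1);

        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: each row joins its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < rows.length; i++) {
                int best = 0;
                double bestDist = squaredEuclidean(rows[i], centroids[0]);
                for (int j = 1; j < k; j++) {
                    double dist = squaredEuclidean(rows[i], centroids[j]);
                    if (dist < bestDist) { bestDist = dist; best = j; }
                }
                if (assign[i] != best) { assign[i] = best; changed = true; }
            }
            if (!changed) break; // converged

            // Update step: each centroid becomes the mean of its rows; empty clusters keep theirs.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < rows.length; i++) {
                counts[assign[i]]++;
                for (int x = 0; x < dim; x++) sums[assign[i]][x] += rows[i][x];
            }
            for (int j = 0; j < k; j++) {
                if (counts[j] == 0) continue;
                for (int x = 0; x < dim; x++) centroids[j][x] = sums[j][x] / counts[j];
            }
        }
        return assign;
    }
}
```

Sorting the heatmap rows by their cluster index groups co-expressed profiles into contiguous blocks.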