VennPainter: A Tool for the Comparison and Identification of Candidate Genes Based on Venn Diagrams

VennPainter is a program, built with the Qt C++ framework, for depicting unique and shared sets of gene lists and generating Venn diagrams. The software produces Classic Venn, Edwards’ Venn and Nested Venn diagrams and allows for eight sets in graph mode and 31 sets in data-processing mode only. In comparison, previous programs produce Classic Venn and Edwards’ Venn diagrams and allow for a maximum of six sets. The software incorporates user-friendly features and works in Windows, Linux and Mac OS. Its graphical interface does not require the user to have programming skills. Because the output is in Scalable Vector Graphics format, users can modify diagram content for up to eight datasets. VennPainter provides output in vertical, horizontal and matrix formats, which facilitates sharing datasets as required for further identification of candidate genes. Users can obtain gene lists from shared sets by clicking the numbers on the diagram. Thus, VennPainter is an easy-to-use, highly efficient, cross-platform and powerful program that provides a more comprehensive tool for identifying candidate genes and visualizing the relationships among genes or gene families in comparative analysis.


Introduction
In comparative genomics, the visualization of results can help viewers discover correlations and trends in large datasets [1][2][3][4]. Many methods can visualize statistical analyses (e.g., scatter diagrams, line graphs, and histograms) [2,3,5,6,7,8], biological networks (e.g., pathway and functional networks) [7,9,10,11] and comparisons of large-scale 'omic' data (e.g., clusters, heatmaps, and circsters) [1,3,4,6,12]. Venn diagrams, first developed by John Venn in 1880 [13], are widely used for comparing multiple genomic, transcriptomic and proteomic datasets because they are graphically simple and easy to interpret [14][15][16][17][18][19][20][21]. These diagrams help to identify candidate genes and gene networks for downstream analyses. For example, the simple n-Venn diagram is a collection of n simple intersecting closed curves in the plane. It indicates the relationships among datasets, including intersections, sums and complements [13,22]. The curves divide the plane into $2^n - 1$ distinct intersections, each defined by the intersection of the interior or exterior of each of the curves [23]. Generally, the Classic Venn diagram deciphers no more than four sets. The development of Symbolic Logic has facilitated several approaches for constructing Venn diagrams with more than five sets, including Classic, Edwards', Lewis Carroll's and Nested Venn diagrams [24,25]. Edwards' and Nested methods might generate Venn diagrams for an unlimited number of sets, but the partition of sets among multiple datasets might have complex associations because the number of distinct open regions increases exponentially with the number of sets. This makes it difficult to generate intuitive diagrams that display associations among datasets.
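The exponential growth noted above can be made concrete with a short worked count. This is an illustrative derivation rather than part of the original analysis: each intersection corresponds to one non-empty choice of the curves whose interior contains it, so summing over the number k of such curves gives the total.

```latex
\[
\sum_{k=1}^{n} \binom{n}{k} = 2^{n} - 1,
\qquad
2^{3} - 1 = 7, \quad 2^{7} - 1 = 127, \quad 2^{8} - 1 = 255 .
\]
```

A three-set diagram therefore has 7 intersections to label, a seven-set diagram has 127, and an eight-set diagram has 255.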
Many open access programs can generate Venn diagrams, such as Venny [26], VennDiagram [27], BioVenn [28], GeneVenn [29], 4-way Venn Diagram Generator, DrawVenn, VennMaster [30], VennPlex [31], VennTure [32] and others. However, these programs have limitations. For example, DrawVenn requires diagrams to be drawn manually and cannot process data. VennMaster [30] provides area-proportional Euler diagrams for functional GO analysis of microarrays only. VennPlex [31] compares and visualizes datasets with differentially regulated data points. The powerful VennDiagram [27] generates Venn and Euler diagrams in R and provides a large number of customizable features; unfortunately, its command-line operation is not user-friendly. VennTure [32] can generate six-set Venn diagrams with a graphical user interface (GUI), yet it consumes large amounts of memory and has low computational efficiency. Venny [26], BioVenn [28], GeneVenn [29], and 4-way Venn Diagram Generator are web applications. Although they are powerful and user-friendly, none of them can evaluate more than four datasets. The latest program, jVenn [33], can handle six input lists at most but only provides Classic and Edwards' Venn diagrams.
Available programs generate no more than six-set Venn diagrams and support only Classic and Edwards' Venn layouts. Larger numbers of datasets often present an insurmountable challenge to deciphering and manually drawing Venn diagrams of shared relationships. This complexity might explain the dearth of applications [34][35][36][37]. To rectify this limitation and address Venn-based demands, we report the development of VennPainter, a program that introduces a new nested Venn layout. Fig 1 illustrates seven-set Edwards' (Fig 1a) and Classic (Fig 1b) Venn diagrams. The former illustrates that intersections become smaller with increasing numbers of sets, which presents a challenge for interpretation. The irregular curves of the latter approach are equally challenging. In comparison, the nested Venn (Fig 1c) is far more easily interpreted. VennPainter incorporates the nested Venn layout and increases the number of allowable datasets to eight with diagram output. It also offers text output of up to 31 datasets for downstream analyses. Further, VennPainter improves computational efficiency.

Implementation and Method
VennPainter and its availability
VennPainter (Fig 2) was developed with Qt 4.8.5 under its LGPL v2.1 license. The Qt C++ framework was chosen for its cross-platform capabilities, open-source nature, and secure language construct for communicating between objects (signals and slots) (http://qt-project.org/). For data sets ranging from nine to 31, VennPainter provides vertical, horizontal and matrix text-based formats for the benefit of downstream analyses. The user manual and basic instructions appear in the initial interface of the program, and can be downloaded together with VennPainter at https://github.com/linguoliang/VennPainter/.

The other funders (the last two projects in the above list) had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.

Algorithm
VennPainter uses set theory to generate Venn diagrams, relying on the standard definitions of the intersection of sets and of its complement. Each element $x$ is assigned an integer $a_x$ according to the sets that contain it; if $a_{x_1} = a_{x_2}$, then $x_1$ and $x_2$ belong to the same intersection. VennPainter labels every intersection $U_m$ with an integer $c_{U_m}$ in the Venn diagram (S1, S2 and S3 Figs). If $a_x = c_{U_m}$, then $x \in U_m$. The flowchart (Fig 3) shows how VennPainter works.
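The labelling idea can be illustrated with a minimal sketch. The exact definition of the integer assigned to each element is not reproduced here, so the sketch assumes a binary membership code in which bit i is set when the element occurs in the i-th set. This encoding, the function names `membership_codes` and `group_by_code`, and the toy gene names are all hypothetical and are not taken from the VennPainter source.

```python
from collections import defaultdict


def membership_codes(sets):
    """Return {element: code}, where bit i of code is 1 iff element is in sets[i].

    Assumed encoding for illustration only.
    """
    codes = defaultdict(int)
    for i, members in enumerate(sets):
        for x in members:
            codes[x] |= 1 << i
    return dict(codes)


def group_by_code(codes):
    """Group elements that share a code; each group is one intersection."""
    regions = defaultdict(set)
    for x, a_x in codes.items():
        regions[a_x].add(x)
    return dict(regions)


# Toy input: three small gene lists (hypothetical names).
sets = [
    {"geneA", "geneB", "geneC"},
    {"geneB", "geneC", "geneD"},
    {"geneC", "geneE"},
]
codes = membership_codes(sets)
regions = group_by_code(codes)

for code in sorted(regions):
    print(f"{code:03b}", sorted(regions[code]))
```

Under this assumed encoding, two elements receive the same integer exactly when they belong to the same combination of sets, and with n sets every code lies between 1 and $2^n - 1$, so each code can serve directly as the label of one intersection.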

Input and Output
VennPainter requires that each set be input as a text file. A white space character (space, tab, or newline) must separate every element in the set. After all files are uploaded, the program stores all elements in a hash table and classifies them. The algorithm obtains all statistics from a single read of the hash table. VennPainter can export integrated data as a text file (Fig 5) in Matrix, Vertical and Horizontal text-based formats. In the Matrix format, the first row contains all datasets and the first column contains all elements from the datasets. Other columns contain elements belonging to the respective datasets. In the Vertical mode, each row indicates an intersection. For example, a six-set Venn diagram has 64 intersections and, thus, the text file contains 64 rows. Horizontal mode is identical to the Vertical mode except that columns and rows are exchanged. Further, VennPainter can export single shared datasets. Users can obtain a specific shared dataset by clicking the number on the diagram and then the 'export' button. Exported images are in the SVG format (Scalable Vector Graphics) [38,39], which can be read and modified easily by many vector graphics editors, such as Adobe Illustrator, Inkscape and CorelDRAW. The software provides tooltips when the mouse pointer hovers over buttons or numbers in the diagram.
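The input and export steps can be sketched as follows. This is a simplified illustration, not the VennPainter implementation: the inputs are in-memory strings rather than files, the Matrix cells are assumed to hold 1/0 membership flags, and the Vertical row labels joined with "&" are an assumed convention. All function names and the toy data are hypothetical.

```python
def parse_set(text):
    """Split on any whitespace (space, tab, newline) and drop duplicates."""
    return set(text.split())


def build_table(named_texts):
    """Hash table: element -> list of 0/1 flags, one per dataset, in input order."""
    names = list(named_texts)
    table = {}
    for col, name in enumerate(names):
        for element in parse_set(named_texts[name]):
            table.setdefault(element, [0] * len(names))[col] = 1
    return names, table


def matrix_rows(names, table):
    """Matrix layout: header row of dataset names, one row per element."""
    rows = [["element"] + names]
    for element in sorted(table):
        rows.append([element] + [str(flag) for flag in table[element]])
    return rows


def vertical_rows(names, table):
    """Vertical layout: one row per non-empty intersection, found in one pass."""
    groups = {}
    for element, flags in table.items():
        label = "&".join(n for n, flag in zip(names, flags) if flag)
        groups.setdefault(label, []).append(element)
    return [[label] + sorted(members) for label, members in sorted(groups.items())]


# Toy input mixing spaces, tabs and newlines as separators.
inputs = {"A": "g1 g2\tg3\n", "B": "g2\ng3 g4", "C": "g3"}
names, table = build_table(inputs)
matrix = matrix_rows(names, table)
vertical = vertical_rows(names, table)

for row in matrix:
    print("\t".join(row))
```

Transposing the rows returned by `vertical_rows` would give the corresponding Horizontal layout. In practice each string would be read from the user's text file before being passed to `parse_set`.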

Example Application
To demonstrate the functions of VennPainter, we use it to depict shared gene sets in the goldfish × common carp hybrid system using eight annotated gene lists generated from RNA-seq data (S4 Fig) [37]. The Nested Venn diagram shows unique and shared relationships of eight sets by inlaying four unique-shared diagrams into the other four sets' unique-shared diagram. The number in the center-most area (27,681) in the black rectangle shows the number of genes shared by all eight samples (S4 Fig). In a very intuitive manner, Nested Venn shows that each sample had more than 200 unique genes. The diagram makes it efficient to obtain candidate genes and facilitates downstream analyses of GO enrichment and KEGG annotation [37]. We also evaluate the following seven primate gene lists from GFF files (NCBI Genome database; S1 Table) using VennPainter: Homo sapiens, Gorilla gorilla, Macaca mulatta, Nomascus leucogenys, Pongo abelii, Pan paniscus, and Rhinopithecus roxellana. A comparison of our analyses with that of Zhou et al. 2014 [36] is informative. The analysis by the latter authors resolved 38 unique or shared sets, of which only 14 were marked with gene numbers, and reported 10,244 genes or gene families shared by the seven primates (Fig 6a). Their Venn diagram did not depict all possible logical relationships among the sets. In contrast, VennPainter depicts the 127 intersections that a seven-set Venn diagram should resolve, and shows that these primates share 8,452 annotated genes (Fig 6b).

Benchmark Test
To evaluate VennPainter's relative performance, we use benchmarking data (Table 1). VennTure crashes after $8.4 \times 10^5$ ms. Thus, VennPainter is more than seven times faster than other tested programs. The increased speed owes to VennPainter being programmed in C++, whereas Venny and jVenn are programmed in JavaScript and VennDiagram in R.

Platforms and GUI
Several features make VennPainter more efficient at processing data than other available tools. VennPainter works with Windows, Linux and Mac operating systems (Table 1) and it has a concise GUI that eliminates the need for programming skills. Simply clicking on a number in any diagram promotes downstream analyses. Unlike other programs, VennPainter provides three diagram types (Classic Venn, Edwards' Venn and Nested Venn) for flexibility. Nested Venn is the default depiction when evaluating more than six sets because its regions have a more even distribution than those of Edwards' Venn and are more orderly than those of Classic Venn [34]. This approach makes it easy to fill in and visualize numbers. Nested Venn diagrams are particularly effective when considering more than six datasets, and VennPainter extends the processing capacity to eight datasets. So far, only VennPainter can achieve this comparison. Thus, VennPainter can be applied to all shared data that need to be extracted from dataset(s) for genomic and transcriptomic comparison.

S1 Table. GFF file information. (PDF)