UFO: A tool for unifying biomedical ontology-based semantic similarity calculation, enrichment analysis and visualization

Background Biomedical ontologies have grown quickly and have proven useful in many biomedical applications. Important applications of these data include estimating the functional similarity between ontology terms and between annotated biomedical entities, and analyzing enrichment for a set of biomedical entities. Many semantic similarity calculation and enrichment analysis methods have been proposed for such applications, and a number of tools implementing them have been developed on different platforms. However, each of these tools implements only a small number of the methods, and only for a certain type of biomedical ontology, although the methods can be applied to all types of biomedical ontologies. More importantly, different methods may perform best in different applications; thus, users benefit from a tool that implements more methods and offers more functions for working with ontologies.

Results In this study, we developed a Cytoscape app, named UFO, which unifies most of the semantic similarity measures for between-term and between-entity similarity calculation for all types of biomedical ontologies in OBO format. Based on the similarity calculation, UFO can calculate the similarity between two sets of entities, weight imported entity networks, and generate functional similarity networks. In addition, it can perform enrichment analysis of a set of entities by different methods. Moreover, UFO can visualize structural relationships between ontology terms, annotation relationships between entities and terms, and functional similarity between entities. Finally, we demonstrated the ability of UFO through case studies on finding the best semantic similarity measures for assessing the similarity between human disease phenotypes, constructing biomedical entity functional similarity networks for predicting disease-associated biomarkers, and performing enrichment analysis on a set of similar phenotypes.

Conclusions Taken together, UFO is expected to be a tool with which biomedical ontologies can be exploited for various biomedical applications.

Availability UFO is distributed as a Cytoscape app and can be downloaded freely from the Cytoscape App Store (http://apps.cytoscape.org/apps/ufo) for non-commercial use.

Biomedical ontologies are represented as directed acyclic graphs (DAGs). In this section, we introduce the eleven implemented between-term semantic similarity measures (eight node-based, two edge-based and one hybrid) and the eleven between-entity measures (four pairwise and seven group-wise). First, we introduce the information content (IC) of a term. Then, we define the semantic similarity measures between terms and between entities.
I. INFORMATION CONTENT
The IC of a term is calculated from a corpus (e.g., an annotation database of HPO (Köhler, et al., 2014)) as $IC(t) = -\log p_t$, where $p_t$ is the probability that term $t$ occurs in the given corpus (Lord, et al., 2003). This probability is estimated from $Annot_t$, the number of proteins annotated with term $t$ in the corpus, and $Children_t$, the set of children of term $t$ in the given ontology graph. Figure 1 illustrates the IC calculation for a Gene Ontology (GO) term using the Gene Ontology and Annotation databases.
Figure S1: Illustration of Information Content (IC) calculation for a Gene Ontology term using the Gene Ontology and Annotation databases.
In the UFO tool, we visualize the IC of terms by color. Figure 2 visualizes the ICs of some HPO terms by node color (the higher the IC of a term, the redder its node).
Figure S2: ICs of HPO terms are visualized by red color (blue node is root).
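The IC definition above can be illustrated with a minimal Python sketch. The toy ontology, the toy corpus and the function names are hypothetical, and $p_t$ is estimated here as the fraction of annotated entities that carry term $t$ or any of its descendants, which is one common reading of the corpus-based estimator and may differ in detail from the one used in UFO.

```python
import math

# Hypothetical toy ontology: term -> list of parent terms (a DAG rooted at "root").
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A", "B"],
}

# Hypothetical corpus: entity -> terms it is directly annotated with.
ANNOTATIONS = {
    "e1": {"C"},
    "e2": {"A"},
    "e3": {"B"},
    "e4": {"B"},
}

def ancestors(term, parents):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def information_content(parents, annotations):
    """IC(t) = -log p_t, with p_t assumed to be the fraction of entities
    annotated with t directly or through one of its descendants."""
    counts = {term: 0 for term in parents}
    for terms in annotations.values():
        propagated = set()
        for term in terms:
            propagated |= ancestors(term, parents)
        for term in propagated:
            counts[term] += 1
    total = len(annotations)
    # Terms without any annotation have p_t = 0 and are left out.
    return {t: -math.log(c / total) for t, c in counts.items() if c > 0}

if __name__ == "__main__":
    for term, ic in sorted(information_content(PARENTS, ANNOTATIONS).items()):
        print(term, round(ic, 3))
```

In this toy corpus the root annotates every entity, so its IC is zero, while the most specific term C receives the highest IC.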

II. BETWEEN-TERM MEASURES
The ontology data are represented as a directed acyclic graph, in which each term is a node and the relations between terms are represented by edges connecting the nodes. Therefore, methods for calculating the semantic similarity between terms fall into two main categories: i) node-based and ii) edge-based.

A. Node-based measures
In this section, we first introduce four node-based measures of similarity between terms. They are all based on the most informative common ancestor (MICA) of the terms, i.e., the shared ancestor with the highest IC: $MICA(t_1, t_2) = \arg\max_{t \in SP(t_1, t_2)} IC(t)$, where $SP(t_1, t_2)$ is the set of shared ancestors of terms $t_1$ and $t_2$. Figure 3 shows the shared ancestors and the MICA term of two HPO terms (HP:0000008 and HP:0000010).
First, Resnik (Resnik, 1995) defined the similarity between two terms as the IC of their MICA, $IC(MICA(t_1, t_2))$. Second, other between-term similarity measures were proposed by (Lin, 1998), (Jiang and Conrath, 1997) and (Schlicker, et al., 2006), referred to here as Lin, JC and Rel, respectively; each combines the IC of the MICA with the ICs of the two terms.
Similarly to these four between-term similarity measures, (Couto, et al., 2005) defined the common disjunctive ancestors (CDA) of terms $t_1$ and $t_2$ to replace the MICA in the four measures, where the CDA set contains the common disjunctive ancestors of terms $t_1$ and $t_2$. Figure 4 shows the CDA terms (HP:0000118 and HP:0000119) of two HPO terms (HP:0000008 and HP:0000010). Therefore, four more between-term similarity measures were defined: $Resnik_{GraSM}$, $Lin_{GraSM}$, $JC_{GraSM}$ and $Rel_{GraSM}$. In summary, a total of eight node-based between-term similarity measures were used in our study.
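As an illustration of the MICA-based measures, the following Python sketch computes the MICA, the Resnik similarity and the Lin similarity in its widely used form, $2 \cdot IC(MICA) / (IC(t_1) + IC(t_2))$. The toy DAG, the IC values and the function names are hypothetical, and each term is counted among its own ancestors, which is an assumption of this sketch.

```python
# Hypothetical toy DAG (term -> parents) and precomputed IC values.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "D": ["A", "B"],
}
IC = {"root": 0.0, "A": 0.5, "B": 1.0, "C": 2.0, "D": 1.5}

def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def mica(t1, t2):
    """Most informative common ancestor: the shared ancestor with the highest IC."""
    shared = ancestors(t1) & ancestors(t2)
    return max(shared, key=lambda t: IC[t])

def sim_resnik(t1, t2):
    """Resnik similarity: the IC of the MICA."""
    return IC[mica(t1, t2)]

def sim_lin(t1, t2):
    """Lin similarity: twice the IC of the MICA over the summed ICs of both terms."""
    denominator = IC[t1] + IC[t2]
    return 2 * IC[mica(t1, t2)] / denominator if denominator > 0 else 0.0

if __name__ == "__main__":
    print(mica("C", "D"), sim_resnik("C", "D"), round(sim_lin("C", "D"), 4))
```

The JC and Rel measures follow the same pattern, differing only in how the three IC values are combined.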

B. Edge-based measures
In this section, we introduce two widely used edge-based between-term semantic similarity measures. The first is simply defined as the length of the longest path from the lowest common ancestor (LCA) of $t_1$ and $t_2$ to the root (Wu, et al., 2005). The second (Yu, et al., 2005) additionally takes into account the length of the longest path from each term to the LCA, and is defined using the length of the longest path between the two terms $t_1$ and $t_2$. Figure 5 shows the longest path of two HPO terms (HP:0000008 and HP:0000010).
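The quantity underlying the first edge-based measure can be sketched in Python as follows. This is only an illustration under stated assumptions: the toy DAG and function names are hypothetical, path length is counted in edges, and the LCA is taken to be the common ancestor with the greatest longest-path distance to the root.

```python
from functools import lru_cache

# Hypothetical toy DAG: term -> tuple of parent terms.
PARENTS = {
    "root": (),
    "A": ("root",),
    "B": ("A",),
    "C": ("A", "root"),
    "D": ("B", "C"),
}

@lru_cache(maxsize=None)
def depth(term):
    """Length (in edges) of the longest path from the term up to the root."""
    parents = PARENTS[term]
    return 0 if not parents else 1 + max(depth(p) for p in parents)

def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def lca_depth(t1, t2):
    """Longest-path distance to the root of the deepest common ancestor."""
    return max(depth(t) for t in ancestors(t1) & ancestors(t2))

if __name__ == "__main__":
    print(lca_depth("B", "C"), lca_depth("D", "B"))
```

Because the ontology is a DAG, a term can reach the root by several routes; taking the maximum over parents in `depth` is what selects the longest one.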

C. Hybrid-based measures
We implemented one hybrid measure, which was introduced in (Wang, et al., 2007). In that study, an ontology term $t$ is represented as $DAG_t = (t, T_t, E_t)$, where $T_t$ is the set of ontology terms in $DAG_t$, including term $t$ and all of its ancestor terms in the ontology graph, and $E_t$ is the set of edges connecting the ontology terms in $DAG_t$. To quantitatively compare two ontology terms, the semantic value of term $t$, $SV(t)$, is defined as the aggregate contribution of all terms in $DAG_t$ to the semantics of term $t$: $SV(t) = \sum_{k \in T_t} S_t(k)$, where $S_t(k)$ denotes the contribution of term $k$ to the semantics of term $t$. Finally, the semantic similarity between two terms, $t_1$ and $t_2$, is defined as follows:
$$Sim_{Wang}(t_1, t_2) = \frac{\sum_{k \in T_{t_1} \cap T_{t_2}} \left( S_{t_1}(k) + S_{t_2}(k) \right)}{SV(t_1) + SV(t_2)}$$
Figure 6 shows the DAGs of two HPO terms (HP:0000008 and HP:0000010).

III. BETWEEN-ENTITY MEASURES
In this section, we introduce the between-entity similarity measures.

A. Pairwise
This approach calculates the similarity between two entities based on the similarity of every pair of terms that annotate the entities.
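As a rough Python illustration of the pairwise approach, the sketch below builds the term-by-term similarity matrix for two entities and aggregates it by average, maximum and best-match average. All names and the toy similarity table are hypothetical, and the best-match average is implemented in one common variant (the mean over all row maxima and column maxima), which may differ from the exact form used in UFO.

```python
# Hypothetical between-term similarities, stored once per unordered pair.
TOY_SIM = {
    frozenset(["a", "c"]): 0.2,
    frozenset(["b", "c"]): 0.8,
}

def toy_term_sim(t1, t2):
    """Look up a between-term similarity; identical terms score 1.0."""
    if t1 == t2:
        return 1.0
    return TOY_SIM.get(frozenset([t1, t2]), 0.0)

def similarity_matrix(terms1, terms2, term_sim):
    """Rows follow the terms of entity 1, columns the terms of entity 2."""
    return [[term_sim(a, b) for b in terms2] for a in terms1]

def sim_avg(matrix):
    """Average similarity over all term pairs."""
    cells = [x for row in matrix for x in row]
    return sum(cells) / len(cells)

def sim_max(matrix):
    """Maximum similarity over all term pairs."""
    return max(x for row in matrix for x in row)

def sim_bma(matrix):
    """Best-match average (assumed variant): mean of all row and column maxima."""
    row_best = [max(row) for row in matrix]
    col_best = [max(col) for col in zip(*matrix)]
    return (sum(row_best) + sum(col_best)) / (len(row_best) + len(col_best))

if __name__ == "__main__":
    m = similarity_matrix(["a", "b"], ["c"], toy_term_sim)
    print(m, sim_avg(m), sim_max(m), round(sim_bma(m), 4))
```

Any between-term measure can be passed as `term_sim`, which is how the pairwise measures combine with the between-term measures of Section II.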
First, the similarity of every pair of terms is calculated to generate a similarity matrix (see Fig. 7). Then, four pairwise between-entity measures are defined. The first two measures are simply based on either the average (Lord, et al., 2003) or the maximum (Sevilla, et al., 2005) similarity of all pairs. Meanwhile, Couto et al. (Couto, et al., 2005) and Azuaje et al. (Azuaje, et al., 2005) opted for a composite average in which only the best-matching term pairs are considered (best-match average). Finally, Schlicker et al. (Schlicker, et al., 2006) proposed a variation of the best-match average.

B. Group-wise
This approach is categorized into two main methods: i) vector-based and ii) graph-based. Two popular vector-based between-entity similarity measures, Cosine (Huang, et al., 2007) and Kappa (Chabalier, et al., 2007), are defined based on the cosine and kappa coefficients. More specifically, the term sets $T_1$ and $T_2$ annotating the two entities are first represented as binary vectors $v_1$ and $v_2$ of length $N$, with $v_{ik} = 1$ if $t_k \in T_i$ and $v_{ik} = 0$ otherwise, where $T$ is the whole set of $N$ terms in the ontology database. Then, the cosine of the two vectors $v_1$ and $v_2$ is calculated:
$$\cos(v_1, v_2) = \frac{v_1 \cdot v_2}{\|v_1\| \, \|v_2\|}$$
Another vector-based between-entity similarity measure is based on the Kappa coefficient. Formally, it is defined as follows:
$$\kappa(v_1, v_2) = \frac{P_o - P_e}{1 - P_e}$$
where:
- $P_o$ is the observed proportionate agreement: $(P_{11} + P_{00})/P_t$
- $P_e$ is the overall probability of random agreement: $P_{e1} + P_{e0}$
- $P_{e1}$ is the probability that both values are 1: $P_{v_1=1} \cdot P_{v_2=1}$
- $P_{v_1=1}$ is the probability that $v_1 = 1$: $(P_{11} + P_{10})/P_t$
- $P_{v_2=1}$ is the probability that $v_2 = 1$: $(P_{11} + P_{01})/P_t$
- $P_{e0}$ is the probability that both values are 0: $P_{v_1=0} \cdot P_{v_2=0}$
- $P_{v_1=0}$ is the probability that $v_1 = 0$: $(P_{01} + P_{00})/P_t$
- $P_{v_2=0}$ is the probability that $v_2 = 0$: $(P_{10} + P_{00})/P_t$
- $P_t$, $P_{11}$, $P_{00}$, $P_{10}$ and $P_{01}$ are the total number of observations, the number of observations where $v_{1k} = v_{2k} = 1$, the number where $v_{1k} = v_{2k} = 0$, the number where $v_{1k} = 1$ and $v_{2k} = 0$, and the number where $v_{1k} = 0$ and $v_{2k} = 1$, respectively.
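The two vector-based measures can be sketched in Python directly from the quantities listed above. The function names and the four-term toy ontology are hypothetical, and returning 0.0 when a denominator vanishes is a convention chosen for this sketch only.

```python
import math

def binary_vector(terms, all_terms):
    """Encode a term set as a 0/1 vector over the whole term list."""
    return [1 if t in terms else 0 for t in all_terms]

def cosine(v1, v2):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

def kappa(v1, v2):
    """Kappa coefficient built from the counts P11, P00, P10, P01 and Pt."""
    pt = len(v1)
    p11 = sum(1 for a, b in zip(v1, v2) if a == 1 and b == 1)
    p00 = sum(1 for a, b in zip(v1, v2) if a == 0 and b == 0)
    p10 = sum(1 for a, b in zip(v1, v2) if a == 1 and b == 0)
    p01 = sum(1 for a, b in zip(v1, v2) if a == 0 and b == 1)
    po = (p11 + p00) / pt                                # observed agreement
    pe1 = ((p11 + p10) / pt) * ((p11 + p01) / pt)        # both equal 1 by chance
    pe0 = ((p01 + p00) / pt) * ((p10 + p00) / pt)        # both equal 0 by chance
    pe = pe1 + pe0
    if pe == 1.0:
        return 0.0  # kappa is undefined (0/0) here; 0.0 is a convention of this sketch
    return (po - pe) / (1 - pe)

if __name__ == "__main__":
    all_terms = ["t1", "t2", "t3", "t4"]
    v1 = binary_vector({"t1", "t2"}, all_terms)
    v2 = binary_vector({"t2", "t3"}, all_terms)
    print(v1, v2, round(cosine(v1, v2), 4), round(kappa(v1, v2), 4))
```

In this toy case the two entities share one of their two terms, giving a cosine of 0.5, while kappa is 0 because the observed agreement equals the agreement expected by chance.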
For graph-based measures, $T_1$ and $T_2$ are each extended with the ancestors of the terms in the set. Five more graph-based between-entity similarity measures are defined on the extended sets as follows:
- A measure based on term overlap (Lee, et al., 2004): $TO = |T_1 \cap T_2|$
- A normalized version of TO (Mistry and Pavlidis, 2008): $NTO = \frac{|T_1 \cap T_2|}{\min(|T_1|, |T_2|)}$
- A measure based on the Jaccard index (Gentleman, 2005; Martin, et al., 2004): $UI = \frac{|T_1 \cap T_2|}{|T_1 \cup T_2|}$
- An IC-based weighted version of UI (Pesquita, et al., 2007): $GIC = \frac{\sum_{t \in T_1 \cap T_2} IC(t)}{\sum_{t \in T_1 \cup T_2} IC(t)}$
- Finally, a longest path-based between-entity similarity measure (Gentleman, 2005).
In summary, a total of eleven between-entity similarity measures were used in our study.
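The set-based graph measures (term overlap, its normalized version, the Jaccard-style union-intersection measure and its IC-weighted version) can be sketched in Python as follows. The toy DAG, IC values and function names are hypothetical, and the normalized overlap divides by the size of the smaller extended set, following the description of Mistry and Pavlidis.

```python
# Hypothetical toy DAG (term -> parents) and IC values.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "D": ["B"],
}
IC = {"root": 0.0, "A": 1.0, "B": 1.0, "C": 2.0, "D": 2.0}

def extend(terms):
    """Extend a term set with all ancestors of its terms."""
    seen = set(terms)
    stack = list(terms)
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def sim_to(t1, t2):
    """Term overlap: number of shared terms."""
    return len(t1 & t2)

def sim_nto(t1, t2):
    """Term overlap normalized by the size of the smaller set."""
    smaller = min(len(t1), len(t2))
    return len(t1 & t2) / smaller if smaller else 0.0

def sim_ui(t1, t2):
    """Union-intersection (Jaccard index) of the two sets."""
    union = t1 | t2
    return len(t1 & t2) / len(union) if union else 0.0

def sim_gic(t1, t2):
    """IC-weighted Jaccard: summed IC of shared terms over summed IC of all terms."""
    total = sum(IC[t] for t in t1 | t2)
    return sum(IC[t] for t in t1 & t2) / total if total else 0.0

if __name__ == "__main__":
    T1 = extend({"C"})        # C, A, root
    T2 = extend({"A", "D"})   # A, D, B, root
    print(sim_to(T1, T2), round(sim_nto(T1, T2), 4),
          sim_ui(T1, T2), round(sim_gic(T1, T2), 4))
```

The example shows why the IC-weighted version can be much lower than the plain Jaccard index: the shared terms here are general ones with little information content.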
IV. ENRICHMENT ANALYSIS
Given an entity set ($S_e$) and an ontology term $t$, let $H_0$ denote the null hypothesis that there is no significant association between $S_e$ and $t$. The association between $S_e$ and $t$ is defined as the overlap ($k$) between $S_e$ and the set of entities annotated with term $t$ in the corpus. Three statistical tests are popularly used to test whether the overlap is significant (i.e., whether $H_0$ is rejected) (Rivals, et al., 2006). Fisher showed that the probability of obtaining the overlap (with an observed value $k$) is given by the hypergeometric distribution with parameters $N$, $n$, and $K$ (Agresti, 1992; Fisher, 1922):
$$P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}$$
where:
- $N$ is the number of annotated entities in the corpus (e.g., the number of genes in the corpus that are annotated with GO terms).
- $K$ is the number of entities annotated with term $t$ in the corpus (e.g., the number of genes in the corpus that are annotated with a specific GO term).
- $n$ is the number of entities in the entity set ($S_e$) (e.g., a gene set of interest for which we want to find significantly annotating GO terms).
- $k$ is the number of entities in the entity set that are annotated with term $t$.
For a large sample, the overlap has approximately a binomial distribution:
$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$$
where:
• $p$ is the success probability in the population (i.e., $K/N$, the probability that an entity is annotated with term $t$ in the corpus).
• $K$ is the number of success states in the population (i.e., the number of entities annotated with term $t$ in the corpus).
• $n$ is the number of draws (i.e., the number of entities in the entity set ($S_e$)).
• $k$ is the number of observed successes (i.e., the number of entities in the entity set that are annotated with term $t$).
$S_e$ is said to be enriched by $t$ if there is a statistically significant overlap between the entity set $S_e$ (of size $n$) and the set of entities annotated with term $t$ in the corpus (of size $K$).
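The hypergeometric and binomial probabilities described above can be computed with the Python standard library alone, as in the sketch below. The function names are hypothetical, and reporting the one-sided upper tail $P(X \ge k)$ as the enrichment p-value is an assumption of this sketch (it is the usual choice for over-representation tests) rather than a statement about how UFO computes it.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k): k annotated entities among n drawn, K annotated out of N in total."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def hypergeom_pvalue(k, N, K, n):
    """Assumed one-sided p-value: probability of an overlap of at least k."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, min(K, n) + 1))

def binom_pmf(k, n, p):
    """P(X = k) for n independent draws with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_pvalue(k, n, p):
    """Assumed one-sided p-value under the binomial approximation."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

if __name__ == "__main__":
    # Toy numbers: N = 10 annotated entities, K = 4 carry term t,
    # the entity set has n = 3 members, and k = 2 of them carry term t.
    N, K, n, k = 10, 4, 3, 2
    print(round(hypergeom_pmf(k, N, K, n), 4))
    print(round(hypergeom_pvalue(k, N, K, n), 4))
    print(round(binom_pvalue(k, n, K / N), 4))
```

With these toy numbers the binomial p-value (0.352) is close to, but not equal to, the hypergeometric one (about 0.333); the approximation improves as the corpus grows relative to the entity set.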
When testing multiple hypotheses, the obtained p-values have to be corrected in order to control the type I error (false positive) rate (Noble, 2009). In UFO, we implemented two multiple testing correction methods: the Bonferroni correction and the Benjamini and Hochberg correction. The former strongly controls the probability of making at least one type I error across all tests, i.e., the family-wise error rate (FWER) (Bonferroni, et al., 1936); the latter controls the false discovery rate (FDR), i.e., the expected proportion of false positives among the positively identified tests (Benjamini and Yekutieli, 2001).
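Both corrections can be sketched in a few lines of Python. The function names are hypothetical; the code follows the textbook definitions (multiplying by the number of tests for Bonferroni, and the step-up adjustment $p \cdot m / rank$ with a running minimum for Benjamini and Hochberg) and is not taken from the UFO source.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjustment, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.5]
    print([round(p, 4) for p in bonferroni(raw)])
    print([round(p, 4) for p in benjamini_hochberg(raw)])
```

For the same raw p-values the Benjamini and Hochberg adjustment is never larger than the Bonferroni one, which reflects that controlling the FDR is a less strict requirement than controlling the FWER.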
After applying a multiple testing correction method, an adjusted p-value is obtained for each ontology term $t$. The p-value is the probability, under the null hypothesis, of observing an overlap at least as large as the one obtained; thus, the smaller the p-value, the less likely it is that the association between the entity set ($S_e$) and that term is due to chance. In enrichment analysis, a p-value ≤ 0.05 indicates that the association is statistically significant.
V. SIMILARITY BETWEEN TWO ENTITY SETS
The procedure to calculate the similarity between two sets of entities is as follows:
- For each set of entities, the set of ontology terms that significantly annotate the entity set is identified. Thus, each entity set is now equivalent to a meta-entity that is annotated with the set of significant ontology terms.
- The similarity between the two entity sets is then equivalent to the similarity between the two meta-entities, which can be calculated by any of the between-entity similarity measures (see section III. BETWEEN-ENTITY MEASURES).