Benchmarking Ontologies: Bigger or Better?

doi:10.1371/journal.pcbi.1001055

Figure 1.

An overview of our proposed approach to benchmarking ontologies.

The test ontology, X, is represented as a set of concepts and a set of relations, C_X and R_X respectively, and is compared to a domain-specific reference corpus, T. Our analysis begins by mapping concepts and relations of X to T using natural language processing tools (step 1). This mapping allows us to estimate from the text a set of concept- and relation-specific frequency parameters required for computing Breadth and Depth metrics for X with respect to T (step 2). The next step involves estimating the complete ontology for corpus T, an ideal ontology that includes every concept and every relation mentioned in T (step 3). Given the complete ontology, we can estimate the fittest ontology (a subset of the complete ontology) of the same size as the test ontology X (step 4) and compute the loss measures for X (step 5). See the Materials and Methods section for precise definitions of the concepts and metrics involved.
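The five steps can be illustrated with a deliberately simplified sketch. It is not the paper's method: concepts are mapped to the corpus by exact token match, relations are ignored, the "complete ontology" is taken to be every distinct token, and `coverage_score` (summed relative frequency) is a hypothetical stand-in for the Breadth metric defined in Materials and Methods. All function names are invented for illustration.

```python
from collections import Counter


def concept_frequencies(concepts, corpus_tokens):
    """Steps 1-2 (simplified): map concepts to the corpus by exact token
    match and return each concept's relative frequency in the corpus."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    return {c: counts[c] / total for c in concepts}


def coverage_score(concepts, freq):
    """Hypothetical stand-in for a breadth-like metric: the summed
    relative frequency of the concepts the ontology contains."""
    return sum(freq.get(c, 0.0) for c in concepts)


def fittest_subset(complete_freq, size):
    """Step 4 (simplified): the subset of the complete ontology of the
    given size that maximizes coverage_score, i.e. the top-`size` concepts
    by frequency (ties broken alphabetically)."""
    ranked = sorted(complete_freq, key=lambda c: (-complete_freq[c], c))
    return set(ranked[:size])


def coverage_loss(test_concepts, complete_freq):
    """Step 5 (simplified): relative shortfall of the test ontology
    compared with the fittest ontology of the same size."""
    best = fittest_subset(complete_freq, len(test_concepts))
    best_score = coverage_score(best, complete_freq)
    if best_score == 0:
        return 0.0
    return 1.0 - coverage_score(test_concepts, complete_freq) / best_score


# Toy corpus T; step 3 (simplified) treats every distinct token as a concept.
corpus = ["fever", "cough", "fever", "rash", "fever", "cough"]
complete = concept_frequencies(set(corpus), corpus)

# Toy test ontology X with two concepts.
test_ontology = {"cough", "rash"}
print(fittest_subset(complete, len(test_ontology)))
print(coverage_loss(test_ontology, complete))
```

In this toy setting the fittest ontology of size two keeps the two most frequent concepts, so a test ontology that omits the most frequent concept incurs a nonzero loss. The real metrics also account for relations and synonymy, which this sketch leaves out.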

Figure 2.

Overlap of the three largest thesauri and three medical ontologies in our study.

(Inset diagrams represent modified Venn diagrams where each set is depicted in such a way that the number of elements in the set is exactly proportional to the size of the corresponding area.) (A–B) Venn diagrams showing intersections between three of the compared medical ontologies, ICD9 CM, SNOMED and CCPSS, at the level of concepts (disease and syndrome only) and at the level of relations between these concepts. (C–D) Venn diagrams showing intersections between the three largest thesauri, WordNet, The Synonym Finder (Finder), and Webster's New World Roget's A–Z Thesaurus (Roget's), at the level of headwords and synonym pairs.

Table 1.

Size of biomedical ontologies and seven thesauri.

Table 2.

Three corpora.

Table 3.

Comparison of three medical ontologies in terms of Breadth, Depth and (Depth) Loss, Relative Depth and Relative Depth Loss.

Figure 3.

Four examples of synonym substitution probabilities in three corpora in our study.

Plots A–D correspond to the headwords futile (adjective), stretch (verb), headache (noun) and cat (noun) respectively. The horizontal position of each synonym represents the substitution probability on a logarithmic scale as does the font size. The color of each synonym indicates the corpus in which the substitution is most probable: black – medicine, red – novels, and blue – news. The frequency of each headword in the three corpora is also listed using the same color codes.

Figure 4.

Six additional examples of synonym replacement (see Figure 3 legend).

Plots A–F correspond to the headwords driver (noun), insult (noun), beforehand (adverb), verdict (noun), degrade (verb) and nervousness (noun).

Figure 5.

Nine metrics computed for all seven English thesauri across three corpora.

The size of each dictionary symbol is proportional to the total number of synonymous relations it contains. (A, B, C) Information retrieval metrics Recall, Precision, and F-measure; (D, E, F) concept-frequency metrics Breadth¹, Depth¹, and Depth¹ Loss; and (G, H, I) metrics based on frequency of both concepts and relations—Breadth², Depth², and Depth² Loss.
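For readers unfamiliar with the information retrieval metrics in panels A to C, the following minimal sketch computes Recall, Precision, and F-measure over sets of synonym pairs using their standard definitions. What serves as the reference set is an assumption here (the caption does not specify it), and the function name and toy pairs are hypothetical.

```python
def precision_recall_f(test_pairs, reference_pairs):
    """Standard set-based precision, recall, and F-measure (F1).

    Pairs are treated as unordered, so ("cat", "feline") and
    ("feline", "cat") count as the same synonym relation.
    """
    test = {frozenset(p) for p in test_pairs}
    ref = {frozenset(p) for p in reference_pairs}
    true_positives = len(test & ref)
    precision = true_positives / len(test) if test else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy thesaurus and toy reference set (both invented for illustration).
thesaurus = [("cat", "feline"), ("cat", "kitty"), ("dog", "hound")]
reference = [("feline", "cat"), ("dog", "hound"), ("dog", "canine"), ("cat", "puss")]
print(precision_recall_f(thesaurus, reference))
```

Two of the three thesaurus pairs appear in the four-pair reference set, so precision is 2/3, recall is 1/2, and the F-measure is their harmonic mean, 4/7.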
