Figure 1.
An overview of our proposed approach to benchmarking ontologies.
The test ontology, X, is represented as a set of concepts and a set of relations, CX and RX respectively, and is compared to a domain-specific reference corpus, T. Our analysis begins by mapping the concepts and relations of X to T using natural language processing tools (step 1). This mapping allows us to estimate from the text the concept- and relation-specific frequency parameters required for computing the Breadth and Depth metrics for X with respect to T (step 2). The next step is to estimate the complete ontology for corpus T – an ideal ontology that includes every concept and every relation mentioned in T (step 3). Given the complete ontology, we can estimate the fittest ontology (a subset of the complete ontology) of the same size as the test ontology X (step 4) and compute the loss measures for X (step 5). See the Materials and Methods section for precise definitions of the concepts and metrics involved.
Figure 2.
Overlap of the three largest thesauri and three medical ontologies in our study.
(Inset diagrams are modified Venn diagrams in which the area of each region is exactly proportional to the number of elements in the corresponding set.) (A–B) Venn diagrams showing intersections between the three compared medical ontologies, ICD9 CM, SNOMED and CCPSS, at the level of concepts (disease and syndrome only) and at the level of relations between these concepts. (C–D) Venn diagrams showing intersections between the three largest thesauri, WordNet, The Synonym Finder (Finder), and Webster's New World Roget's A–Z Thesaurus (Roget's), at the level of headwords and at the level of synonym pairs.
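The region sizes behind such a diagram can be obtained with plain set arithmetic. The following is a minimal sketch, not the procedure used in the study: the function name `venn_regions` and the toy concept sets are hypothetical and stand in for the concept (or relation) sets of the three resources.

```python
from itertools import combinations


def venn_regions(sets):
    """Count the elements in each exclusive region of a Venn diagram.

    `sets` maps a resource name to its set of items. The result maps a
    frozenset of names to the number of items found in exactly those
    resources and in no others.
    """
    names = list(sets)
    regions = {}
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            # Items shared by every resource in this combination ...
            inside = set.intersection(*(sets[n] for n in combo))
            # ... minus items that also occur in any other resource.
            outside = set().union(*(sets[n] for n in names if n not in combo))
            regions[frozenset(combo)] = len(inside - outside)
    return regions


# Hypothetical toy data, for illustration only.
toy = {
    "A": {"flu", "asthma", "gout"},
    "B": {"flu", "asthma", "mumps"},
    "C": {"flu", "rabies"},
}
regions = venn_regions(toy)
```

Because the regions are mutually exclusive, their counts sum to the size of the union of all sets, which is a convenient sanity check before drawing area-proportional diagrams.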
Table 1.
Size of biomedical ontologies and seven thesauri.
Table 2.
Three corpora.
Table 3.
Comparison of three medical ontologies in terms of Breadth, Depth, (Depth) Loss, Relative Depth, and Relative Depth Loss.
Figure 3.
Four examples of synonym substitution probabilities in three corpora in our study.
Plots A–D correspond to the headwords futile (adjective), stretch (verb), headache (noun) and cat (noun), respectively. The horizontal position of each synonym represents its substitution probability on a logarithmic scale, as does the font size. The color of each synonym indicates the corpus in which the substitution is most probable: black – medicine, red – novels, and blue – news. The frequency of each headword in the three corpora is also listed using the same color code.
Figure 4.
Six additional examples of synonym replacement (see Figure 3 legend).
Plots A–F correspond to the headwords driver (noun), insult (noun), beforehand (adverb), verdict (noun), degrade (verb) and nervousness (noun).
Figure 5.
Nine metrics computed for all seven English thesauri across three corpora.
The size of each dictionary symbol is proportional to the total number of synonymous relations it contains. (A, B, C) Information retrieval metrics: Recall, Precision, and F-measure; (D, E, F) concept-frequency metrics: Breadth1, Depth1, and Depth1 Loss; and (G, H, I) metrics based on the frequency of both concepts and relations: Breadth2, Depth2, and Depth2 Loss.
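For orientation, the three information retrieval metrics in panels A–C follow the standard definitions, which can be sketched over sets of synonym pairs. This is an illustrative sketch only: the helper names `pair` and `retrieval_metrics`, the toy pairs, and the choice of reference set are assumptions, not the study's data or code. The Breadth and Depth metrics are defined in Materials and Methods and are not reproduced here.

```python
def pair(a, b):
    """Represent an unordered synonym pair so that (a, b) equals (b, a)."""
    return frozenset((a, b))


def retrieval_metrics(candidate, reference):
    """Return (precision, recall, F-measure) of `candidate` against `reference`.

    Both arguments are sets of synonym pairs. Empty inputs yield 0.0
    rather than raising a division error.
    """
    true_positives = len(candidate & reference)
    precision = true_positives / len(candidate) if candidate else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    # F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical toy data, for illustration only.
thesaurus_pairs = {
    pair("futile", "useless"),
    pair("futile", "vain"),
    pair("cat", "feline"),
    pair("cat", "dog"),
}
reference_pairs = {
    pair("useless", "futile"),
    pair("cat", "feline"),
    pair("insult", "affront"),
}
precision, recall, f_measure = retrieval_metrics(thesaurus_pairs, reference_pairs)
```

In this toy case two of the four thesaurus pairs appear in the reference set, giving a precision of 0.5, a recall of 2/3, and an F-measure of 4/7.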