Bayesian-knowledge driven ontologies: A framework for fusion of semantic knowledge under uncertainty and incompleteness

The modeling of uncertain information is an open problem in ontology research and is a theoretical obstacle to creating a truly semantic web. Currently, ontologies often do not model uncertainty, so stochastic subject matter must either be normalized or rejected entirely. Because uncertainty is omnipresent in the real world, knowledge engineers are often faced with the dilemma of performing prohibitively labor-intensive research or running the risk of rejecting correct information and accepting incorrect information. It would be preferable if ontologies could explicitly model real-world uncertainty and incorporate it into reasoning. We present an ontology framework which is based on a seamless synthesis of description logic and probabilistic semantics. This synthesis is powered by a link between ontology assertions and random variables that allows for automated construction of a probability distribution suitable for inferencing. Furthermore, our approach defines how to represent stochastic, uncertain, or incomplete subject matter. Additionally, this paper describes how to fuse multiple conflicting ontologies into a single knowledge base that can be reasoned with using the methods of both description logic and probabilistic inferencing. This is accomplished by using probabilistic semantics to resolve conflicts between assertions, eliminating the need to delete potentially valid knowledge and perform consistency checks. In our framework, emergent inferences can be made from a fused ontology that were not present in any of the individual ontologies, producing novel insights in a given domain.


Introduction
Ontologies, the foundation of the semantic web, are widely used in machine knowledge representation. They are used to define classes and the relationships between their members within a domain. Reasoning algorithms reveal implicit knowledge in the model according to the rules of description logic (DL) [1], which is a decidable subset of predicate calculus. Unfortunately, DL does not conveniently represent uncertainty, the existence of multiple conflicting possible states of a domain. There are several approaches to introducing strong uncertainty semantics into DL. Two prominent approaches which have enjoyed some success are fuzzy logic and possibility theory. These have been applied in frameworks such as Fuzzy OWL [2] and possibilistic description logic [3]. However, in both theories, some interactions between variables are lost during inferencing. The lost information may be unnecessary for modeling the notions of fuzzy set membership and possibility, but its absence means these theories cannot capture a more complex notion of uncertainty which supports chains of "if-then" interactions between variables. One uncertainty theory which has strong semantics and fully captures these variable interactions is probability theory. Unfortunately, to the best of our knowledge, all the representation frameworks for ontologies which are rooted in probability theory exhibit lossy reasoning or have counterintuitive restrictions on their flexibility. The probabilistic DLs based on Nilsson's probabilistic logic [4] experience decay in relative precision during reasoning due to their expression of probabilities as intervals. Approaches using Bayesian Networks (BNs) [5], such as BayesOWL [6], MEBN/PR-OWL [7], and P-CLASSIC [8], contain a representation granularity mismatch: Bayesian Networks require complete specification of the domain's probability distribution with no incompleteness, but ontologies have a finer granularity which allows for incompleteness. Some domains with incompletely defined relationships can only be represented in Bayesian Network based frameworks by over-defining them. We address all these issues in more detail in Section 2.
There exists another probabilistic knowledge representation framework that can be unified with description logic. Bayesian Knowledge Bases [9,10], or BKBs, are designed to handle incompleteness, and they do not experience reasoning decay like other uncertainty logics. BKBs represent domain knowledge as sets of "if-then" conditional probability rules between propositional variable instantiations. They use those conditional probabilities to compute marginal probabilities of the domain's instantiations, or states. BKBs represent knowledge with the same granularity as ontologies, but they are not an immediate substitute for them because they only reason about propositional knowledge, not predicated knowledge like ontologies do. A synthesis of BKBs and DL which preserves the capabilities of both is desirable. This paper presents an approach for representing uncertainty in ontologies with probability semantics as well as the ability to naturally fuse multiple dissonant probabilistic ontologies which otherwise could not be formally reconciled.
This paper presents two broad contributions. First, we extend a preliminary formulation of the knowledge representation and reasoning framework called Bayesian Knowledge-driven Ontologies (BKOs) [11]. BKOs unite the predicate reasoning capabilities of DL with the probabilistic reasoning capabilities of BKBs. They represent knowledge as predicate logic assertions like DL, but also represent conditional probability rules between those assertions like BKBs. We will show that a BKO can reason about both types of knowledge without disempowering either, based on four points:
• Uncertainty is defined as the presence of multiple possible states of the world where we have insufficient knowledge to determine which state is true, but such that we can define a probability distribution over the possible states.
• For any set of mutually disjoint classes in an ontology, any individual can be a member of at most one of those classes. Therefore, potential class assignments between the individual and the classes can be represented as assignments of a discrete random variable.
• Generalizing the rule of universal instantiation to its probabilistic analog allows uncertainty to be propagated from terminological axioms to the assertional axioms they imply.
• A BKO where all implicit knowledge has been made explicit maps to an equivalent BKB.
Second, this paper demonstrates that BKO theory allows for reasoning over multiple fused ontologies, including dissonant ones, without modifying them. This is an improvement over current methods of resolving conflicts in merged ontologies, which resort to modifying them up to the point of rejecting knowledge completely (see [12] for an example). Recent work [13] has pushed this envelope, introducing computational methods for minimizing the number of assertions deleted. We make the distinction between "merged" and "fused" ontologies. While both refer to combining multiple ontologies into one larger one, we describe "merged" ontologies as ones that require some manual or automated altering of information and "fused" ontologies as ones that do not require any alterations. Methods for ontology merging compromise a source's potentially valid perspective and miss opportunities for fusion-derived insights. Our method of fusing ontologies without altering them means BKO theory can take advantage of every potential insight it is provided with. Provided they are lexically aligned, independent machine reasoning can be performed on dissonant ontologies from diverse sources. Even the requirement for lexical alignment is soft: where the source ontologies are not lexically aligned, including one or more alignment ontologies as inputs to the fusion algorithm is sufficient to ensure a valid result. This could be done with manually curated bridge ontologies or by applying recent work on automated ontology alignment [14,15]. In Section 7 we fuse two biological ontologies involving the sciatic nerve, the largest nerve in the body, which has gained much attention in biomedical research. This example highlights some of the strengths of BKO fusion, specifically the ability to reason despite contradictions and how emergent information can be generated only through fusion.
Our paper is organized as follows: We begin in Section 2 with a brief survey of representative prior approaches to augmenting DL with uncertainty semantics. Next, Sections 3 and 4 provide background on DL and BKB theory. Sections 4 and 5 define BKOs' method of knowledge representation and reasoning. Section 6 defines the method of aligning and merging ontologies from different, potentially conflicting, sources. Section 7 walks through a detailed example of BKO reasoning over two fused biomedical ontologies. Finally, in Section 8, we provide our concluding remarks and a look at future directions and potential applications.

Related work
We now examine the two major classes of uncertainty semantics and their application to ontologies.

Fuzzy logic and possibility theory
Straccia [16] introduces fuzzy logic to semantic networks, while recent work can be found in Jain et al. [17]. Fuzzy logic is an uncertainty theory designed to represent the notion of ambiguity using partial set membership. Fuzzy logic's axioms are identical to those of probability theory, except that fuzzy logic lacks the axiom that the union of all events sums to one. The absence of that axiom means that fuzzy logic's reasoning is a coarser treatment of information interaction, using min and max functions in place of the arithmetic functions that probability theory would use. Consider the following example: (Notation: for an individual or class a, a class C, and p ∈ [0, 1], a ∈ C : p states that a has membership in C with degree p.) Given the assertions a ∈ C : 0.7, a ∈ D : 0.4, C ∈ E : 0.2, and D ∈ E : 0.6, what is the membership of a in E? In simple fuzzy set theory, this is max(min(0.7, 0.2), min(0.4, 0.6)) = 0.4. Note that changes in the degree of different assertions may not affect the final result. A change in the degree of membership of D ∈ E would only alter the result if it dropped below 0.4, and a change in the degree of membership of a in C would not alter the result at all. This can be counterintuitive when we consider modeling any notion of causality, since we typically think that a change in a root variable should affect the result. Fuzzy logic is therefore more suited to its intended purpose of comparing entity descriptions than it is to capturing variable interactions.
Possibility theory is introduced to ontologies in [3]. Possibility theory models the notion of uncertainty of events, but like fuzzy logic it does not fully capture causal interactions. Possibility theory models the uncertainty of a single event with two numbers from the range [0, 1]: the event's possibility, which is the degree to which the event could be expected to happen, and the event's necessity, which is the degree to which the event must happen. These numbers are related in that the necessity of an event is equal to one minus the possibility of the event's complement. Despite possibility theory's sophisticated uncertainty representation capability, its reasoning mechanism still does not intuitively capture causality. Consider the following example and note the parallels to the example we used for fuzzy logic: (Notation: for events A and C, and p, q ∈ [0, 1] where p > q, C|A : (p, q) states that the possibility of C given A is p and the necessity of C given A is q.) Given the assertions C|A : (0.7, 0.5), D|A : (0.4, 0.3), E|C : (0.2, 0.1), and E|D : (0.6, 0.55), what is the possibility and necessity of E given A? The answer is simply that the possibility is max(min(0.7, 0.2), min(0.4, 0.6)) = 0.4 and the necessity is max(min(0.5, 0.1), min(0.3, 0.55)) = 0.3. As we discussed for fuzzy logic, this is a coarse treatment of causality.
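Both examples above reduce to the same combination rule: degrees are combined with min along a chain of assertions and max across alternative chains. A minimal sketch (the function name is ours, not taken from the cited frameworks):

```python
def combine(paths):
    """Min along each chain of assertions, max across alternative chains."""
    return max(min(path) for path in paths)

# Fuzzy membership of a in E: via C (0.7 then 0.2) or via D (0.4 then 0.6)
print(combine([(0.7, 0.2), (0.4, 0.6)]))  # 0.4

# Possibility and necessity of E given A, combined component-wise
possibility = combine([(0.7, 0.2), (0.4, 0.6)])   # 0.4
necessity = combine([(0.5, 0.1), (0.3, 0.55)])    # 0.3

# Raising a root degree (a in C from 0.7 to 0.9) leaves the result unchanged:
print(combine([(0.9, 0.2), (0.4, 0.6)]))  # 0.4
```

This illustrates the insensitivity discussed above: the result depends only on whichever degree happens to be the binding min or max, so many changes to root assertions propagate no effect at all.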

Probability theory
We assume that the reader is familiar with the formulation and reasoning mechanics of probability theory, such as the notions of sample spaces, probability distributions, and conditional probabilities. We compare BKO theory to four groups of frameworks with similar reasoning goals: those founded in Nilsson's probabilistic logic [4], Bayesian Networks [5], probabilistic Horn abduction [18], and lifted probabilistic inference [19].
Regarding Nilsson's probabilistic logic-based frameworks, such as Lukasiewicz [20] (and more recently [21]), Halpern [22], and descendant works such as SHIQp [23], Prob-ALC [24], and Prob-EL [25], we see the difficulty they encounter in the following example. Recall that assertions in probabilistic DL are made probabilistic not by assigning them a probability, but by declaring an interval in which that probability is said to be found. This interval-based definition causes erosion of relative precision with every calculation. Suppose we have two probabilistic axioms, "Tweety is-a Bird" with probability between 0.70 and 0.80 (relative precision 0.13), and "Birds can Fly" with probability between 0.90 and 0.99 (relative precision 0.10). We wish to find the marginal probability that "Tweety can Fly". Since the probabilities are only known as intervals, we must multiply their bounds to get the extreme cases of the marginal probability. The lowest possible probability is 0.9 × 0.7 = 0.63 and the highest possible probability is 0.8 × 0.99 ≈ 0.79, so the marginal probability on "Tweety can Fly" is within the interval [0.63, 0.79]. Notice that this interval has a relative precision of 0.23, wider than either of the relative precisions on the original axioms. The representation of probabilities as intervals is an artifact of probabilistic DL's foundation in Nilsson's probabilistic logic [4], which is subject to the same decay in precision.
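The interval arithmetic above can be sketched as follows, taking relative precision to be interval width divided by interval midpoint (our assumption, though it reproduces the figures quoted above):

```python
def rel_precision(lo, hi):
    """Interval width relative to its midpoint."""
    return (hi - lo) / ((hi + lo) / 2)

def interval_product(a, b):
    """Extreme-case bounds when multiplying two probability intervals."""
    return (a[0] * b[0], a[1] * b[1])

tweety_is_bird = (0.70, 0.80)   # relative precision ~0.13
birds_can_fly = (0.90, 0.99)    # relative precision ~0.10
tweety_can_fly = interval_product(tweety_is_bird, birds_can_fly)

print(tuple(round(x, 3) for x in tweety_can_fly))   # (0.63, 0.792)
print(round(rel_precision(*tweety_can_fly), 2))     # 0.23, wider than either input
```

Every multiplication stretches the interval relative to its midpoint, so long inference chains become progressively less informative.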
Regarding BN-based approaches, such as PR-OWL [26], BEL [27], Prob-Ont [28], BayesOWL [6], ByNowLife [29], and P-CLASSIC [8], consider the notion of incompleteness in a domain. Incompleteness is when the domain's probability distribution could match one of a number of possible probability mass functions. Recall that BNs assume completeness by assuming that all variables whose joint distributions are not completely known are independent. Ontologies do not share this completeness assumption, so there are incomplete domains which can be represented with conventional ontologies but cannot be expressed with BN-based frameworks unless unsupported and potentially inaccurate constraints are included. Furthermore, we find notions which can be represented in semantic networks that are counterintuitive when we try to express them in BNs even with complete information. For example, if we wanted to describe the probability distribution between the variable "airplane model" and a discretized "gas mileage" variable, it would not make sense to define probabilities for the gas mileage of an engineless glider model. Even the notion of context-specific independence [30] does not avoid this problem because it would still require the "gas mileage" variable to have some distribution given a "glider model" value, but any distribution, even independence, is counterintuitive. Disregarding uncertainty, a semantic network would have no trouble expressing this domain's concepts, because it could simply omit the glider's gas mileage property from any consideration. Some approaches, such as PR-OWL, resolve this by defining a third truth value of "absurd", but permitting incompleteness averts the need to contend with trinary logic.
Probabilistic Horn abduction [18] is a powerful and expressive knowledge modeling and reasoning framework with many conceptual and mathematical similarities to BKO theory, but it is prevented from discovering unanticipated explanations of the world by its demand that all hypotheses be independent and explicitly defined. In BKO theory those are unnecessary constraints, and relaxing them permits combination of knowledge through fusion, as we shall detail in Section 6.
Lifted probabilistic inference [19] warrants special mention because it employs a similar assertion structure to that of BKOs, namely the assertion of conditional rules containing simple first-order terms taking individuals as arguments. However, the meanings of these terms and relationships are implicit and subject to interpretation, rather than explicit and richly expressed as in DL. So they do not allow for the autonomous reasoning capability of DL-driven knowledge models. Additionally, lifted probabilistic inference uses BNs to express uncertainty, and so runs afoul of their completeness requirement. Finally, lifted probabilistic inference does not require that the conditions of contradictory rules be mutually exclusive. Knowledge of which rule overrides another is kept implicit, and reasoning requires additional specifications to resolve. BKOs resolve these occurrences explicitly within the knowledge base through fusion.
Three additional approaches also merit mention for their use of structures similar to the conditional probability rules employed by BKBs. Do-calculus [31] arrived at a system which closely resembles conditional probability rules, though its formulation relies on very different intuitions from those of BKO theory. Do-calculus does not address the problem of modeling terminological knowledge, but it does formalize the fusion of conditional probability rules gathered under different regimes of population makeup and sampling bias. This is a matter which BKO theory delegates to the user, rolled up within the task of choosing source reliabilities. Our future work will seek to elaborate on our method of fusion to incorporate do-calculus's insights and potentially subsume it. More recently, BLOG [32] also arrived at a knowledge representation system of conditional probability rules between logical assertions similar to that used by BKOs. However, BLOG does not aim to address the fusion of multiple probabilistic ontologies. We believe that BKOs subsume BLOG and that our fusion approach is directly applicable to multiple BLOGs, which we intend to also explore in future work. Similarly, work by Jung and Lutz [33] is based on a definition of a probability distribution over possible states of the world akin to ours, but only defines assertional probabilistic rules, not terminological ones, and does not address fusion.

Background
Here we present necessary background information for the remainder of the paper. We first discuss DL with a focus on the ideas of consistency, assertional knowledge, and terminological knowledge. This is followed by a brief introduction to BKBs that includes their essential definitions and theorems. Finally we discuss BKB fusion, which as we will see has close ties to BKO fusion.

Description logic
We will briefly introduce a simple DL with definitions and notation based on set theory. These definitions are conceptually equivalent to formal DL as presented by Baader et al. [1], but are more closely related to set theory to simplify our derivations in the following sections. We ignore the possibility of mapping ontologies to multiple interpretations, and instead just consider classes and individuals as sets under a single interpretation. Multiple interpretations could be emulated using explicit namespace prefixes on concepts, individuals, etc.
The fundamental concept of description logic is the class, or concept, which is a set. An individual is an element of a class. A role is a binary operator acting from one individual (the owner) to another individual (the filler). Classes, individuals, and roles generally have real world interpretations, such as categories, objects, and relationships between objects.
While the words "class" and "concept" are for the most part interchangeable in DL, "class" generally refers to a more set-theoretic notion of classes/concepts as groups of individuals, while "concept" is used in the context of the descriptive nature of classes/concepts, i.e., that they characterize the nature of the individuals in them. We will mostly use "class" to emphasize the set-theoretic foundation of our theory.
Atomic classes are irreducible. They may be used in expressions called constructors to define new classes, called constructed classes. The expressiveness of constructors is specific to the DL being used. Simple construction operators are: complement, union, intersection, role existential quantification, and role value restriction. Additional operators are defined in more expressive DLs. In general, the more expressive a DL is, the longer its reasoning takes and the greater the risk of it being able to express undecidable problems. Ensuring decidability while achieving maximum expressivity is a hard problem in DL research.
Description logic makes the open world assumption: that the absence of a particular statement within a description of a domain does not imply that statement's falsehood. This implies that every description is incomplete because we can always add new individuals, classes, and rules to it. Here lies an important and subtle distinction: the open world assumption does not imply that every domain is necessarily infinite, but does imply that every domain is possibly infinite, i.e., cannot be proven finite. For practical purposes we will assume that any description of a domain is finite, but we admit the possibility that the domain which it describes is infinite.
Notation. Denote the universal class, the class that contains all individuals, as ⊤ (the down tack character, not the letter T). Because ⊤ contains all individuals, it also contains all nonempty classes. ⊥ is the empty class, or the class that contains no individuals.
Notation. The complement of class C is written as ¬C, where ¬C = ⊤ − C.

Asserting knowledge
In DL, knowledge is expressed through assertional axioms and terminological axioms. Assertional axioms are propositional: they characterize a single individual's membership in classes. Terminological axioms are predicated: they define general rules applying to all individuals in a class. The sets of assertional and terminological axioms in an ontology are often referred to as the A-box and the T-box, respectively.
Definition 3.2.1. An assertional axiom can be either a class assertion or a role assertion:
• A class assertion declares that a ∈ C for a class expression C and an individual a. DL commonly uses the notation C(a).
• A role assertion declares that an individual a (the owner) is related to an individual b (the filler) through a role R. DL commonly uses the notation R(a, b).
In some ontology languages, such as the variants of OWL, knowledge can be presented and used in the form of property characteristics [34], which define specific inference rules for instantiations of properties such as functionality, transitivity, and symmetry. This expressive capability is often useful, but somewhat ad hoc. In this paper we only consider formal, decidable DLs, and therefore only use property characteristics that can be directly expressed in them.
The notion of consistency between assertions is an important one in DL. While it is typically used for error-checking after reasoning, we will rely on it heavily in defining probabilistic relationships.

Reasoning
Terminological axioms are expressed as predicated statements and can be used to form new assertional axioms. These statements describe relationships between classes, so once we know that an individual is a member of a class, we can infer its relationships to other classes based on the ontology's terminological axioms.
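To make this inference step concrete, here is a minimal sketch (our own toy example, not an algorithm from the paper) of deriving new class assertions by transitively applying subclass axioms of the form C ⊑ D:

```python
# Toy T-box of subclass axioms (C maps to its superclass D, i.e. C ⊑ D)
# and a toy A-box of class assertions C(a). Names are hypothetical.
subclass_of = {"Penguin": "Bird", "Bird": "Animal"}
memberships = {("tweety", "Penguin")}

# Repeatedly apply: if C ⊑ D and C(a), then infer D(a), until fixpoint.
changed = True
while changed:
    changed = False
    for ind, cls in list(memberships):
        parent = subclass_of.get(cls)
        if parent and (ind, parent) not in memberships:
            memberships.add((ind, parent))
            changed = True

print(sorted(memberships))
# [('tweety', 'Animal'), ('tweety', 'Bird'), ('tweety', 'Penguin')]
```

Real DL reasoners handle far richer constructors, but the pattern is the same: terminological rules turn one known membership into a chain of implied assertional axioms.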

Bayesian knowledge bases
Bayesian Knowledge Bases [9,10] are a generalization of Bayesian Networks. As opposed to BNs, BKBs specify dependence at the instantiation level instead of the random variable level. BKBs allow for cycles between variables, and do not require the complete probability distribution to be specified. BKBs model probabilistic knowledge in an intuitive "if-then" rule structure which quantifies dependencies between states of random variables. Reasoning with BKBs is performed as belief updating, belief revision, or partial belief revision. Belief updating computes the posterior probability of a target variable state, belief revision computes the posterior probabilities of domain instantiations, and partial belief revision computes the posterior probabilities of sets of target variable states. BKBs excel at modeling causal and correlative information because they provide backtraceable explanations of simulation outcomes [35]. They see use on problems such as war gaming [36], predicting outcomes of strategic actions [37], insider threat detection [38], and Bayesian structure learning [39]. Most importantly, unlike BNs, multiple BKB fragments can be combined into a single valid BKB using the BKB fusion algorithm [40]. The idea behind this algorithm is to take the union of all input fragments while incorporating source nodes, which indicate the source and reliability of the fragments. BKB fusion preserves all knowledge and allows for source and contribution analysis to determine the impact of source knowledge on reasoning results. There are two equivalent formulations of BKB theory. One, presented in Santos et al. (2003) [10], defines a BKB as a set of conditional probability rules (CPRs), and the other, presented in Santos et al. (1999) [9], defines a BKB as a directed graph. In this section, we present a condensed version of the CPR-based formulation. The notation is slightly modified but expresses equivalent concepts.
Definition 3.4.1. Let {A_1, …, A_n} be a collection of finite discrete random variables (rvs), where r(A_i) denotes the set of possible values for A_i. A conditional probability rule (CPR) R is a statement of the form P(A_{i_n} = a_{i_n} | A_{i_1} = a_{i_1} ∧ … ∧ A_{i_{n−1}} = a_{i_{n−1}}) = p. A CPR R's antecedent, denoted ant(R), is the conjunction of rv assignments to the right of the vertical bar. R's consequent, denoted con(R), is the rv assignment to the left of the vertical bar. R states that given the antecedent, the consequent is true with probability p. Each rv assignment in the antecedent is called an immediate ancestor of the consequent, and the consequent is called an immediate descendant of the rv assignments in the antecedent. Note that an empty antecedent reflects a prior probability.
Definition 3.4.2. Given two CPRs R_1 = P(A_{i_n} = a_{i_n} | A_{i_1} = a_{i_1} ∧ … ∧ A_{i_{n−1}} = a_{i_{n−1}}) and R_2 = P(A_{j_m} = a′_{j_m} | A_{j_1} = a′_{j_1} ∧ … ∧ A_{j_{m−1}} = a′_{j_{m−1}}), we say that R_1 and R_2 are mutually exclusive if there exists some 1 ≤ k < n and 1 ≤ l < m such that i_k = j_l and a_{i_k} ≠ a′_{j_l}. Otherwise, we say they are compatible. Intuitively, the antecedents of mutually exclusive CPRs cannot be simultaneously satisfiable because they are conditioned on different values of the same rv(s).
Definition 3.4.3. R_1 and R_2 are consequent bound if (1) for all k < n and l < m, a_{i_k} = a′_{j_l} whenever i_k = j_l, and (2) i_n = j_m with a_{i_n} ≠ a′_{j_m}. Intuitively, consequent bound CPRs conflict only in their consequents: their antecedents are compatible, but their consequents assign different values to the same rv. We use mutual exclusivity and consequent boundedness to define a BKB below.
Definition 3.4.4. A Bayesian Knowledge Base B is a finite set of CPRs such that
• for any distinct R_1 and R_2 in B, either (1) R_1 is mutually exclusive with R_2 or (2) con(R_1) ≠ con(R_2); and
• for any subset S of mutually consequent bound CPRs of B, ∑_{R∈S} P(R) ≤ 1.
The following definitions establish the concept of inferences, which are the basis of a BKB's expression of probability distributions.
Definition 3.4.5. A subset S of a BKB B is said to be a deductive set if for each R ∈ S the following two conditions hold:
• For each rv assignment in ant(R), there exists some R′ ∈ S whose consequent is that assignment.
• There does not exist some R′ ∈ S where R′ ≠ R and con(R′) = con(R).
The first condition establishes that each rv in R's antecedent must be supported by the consequents of other CPRs. The second condition requires that each rv assignment be supported by a unique set of ancestors.
Definition 3.4.6. A deductive set I is said to be an inference over B if I consists of mutually compatible CPRs and no rv assignment is an ancestor of itself in I. The set of rv assignments induced by I is denoted V(I). The probability of I is defined as P(I) = ∏_{R∈I} P(R).
Definition 3.4.7. Two inferences are compatible if all their CPRs are mutually compatible.
The following theorems establish that inferences can define a partial joint probability distribution. Proofs can be found in [10].
Theorem 3.4.1. For each set of rv assignments V, there exists at most one inference I over B such that V = V(I).
We use the conditional probability rule formulation of BKBs throughout this paper. However, the directed graph model allows for intuitive visual representations of BKBs. These graphs are composed of two types of node: instantiation nodes (I-nodes) and support nodes (S-nodes). I-nodes represent random variable instantiations and S-nodes represent the conditional dependencies between them. A weighting function assigns a probability to the CPR represented by each S-node. For example, in a graphical representation of a CPR, the black node is an S-node and the white nodes are I-nodes.
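The mutual exclusivity and consequent boundedness checks from Definitions 3.4.2 and 3.4.3 can be performed mechanically. A sketch, with a data layout and names of our own choosing rather than from [9,10]:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CPR:
    consequent: tuple      # (variable, value)
    antecedent: frozenset  # set of (variable, value) pairs
    p: float

def mutually_exclusive(r1: CPR, r2: CPR) -> bool:
    """Antecedents assign different values to at least one shared variable."""
    v1, v2 = dict(r1.antecedent), dict(r2.antecedent)
    return any(v1[var] != v2[var] for var in v1.keys() & v2.keys())

def consequent_bound(r1: CPR, r2: CPR) -> bool:
    """Compatible antecedents, but consequents assign different values
    to the same variable."""
    return (not mutually_exclusive(r1, r2)
            and r1.consequent[0] == r2.consequent[0]
            and r1.consequent[1] != r2.consequent[1])

r1 = CPR(("A", "a1"), frozenset({("B", "b1")}), 0.8)
r2 = CPR(("A", "a1"), frozenset({("B", "b2")}), 0.3)
r3 = CPR(("A", "a2"), frozenset({("B", "b1")}), 0.1)
print(mutually_exclusive(r1, r2))  # True: both condition on B, different values
print(consequent_bound(r1, r3))    # True: same antecedent, conflicting consequents
```

In this representation, Definition 3.4.4 amounts to requiring that any two rules with the same consequent be mutually exclusive, and that the probabilities of consequent bound rules sum to at most one.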
Many CPRs are combined to form a larger BKB.An example BKB is shown in Fig 2.

Bayesian knowledge fusion
One might want to combine the knowledge represented in BKBs from two or more distinct sources. The BKB fusion algorithm [40] is used to do so; we summarize it in the remainder of this subsection. Consider two knowledge fragments F_1 and F_2. These fragments could be naively combined by taking the union of F_1 and F_2 to form F_3. The CPRs P(A = a | B = b) and P(A = a | C = c) have equal consequents, but their antecedents are not mutually exclusive. So this union would violate the mutual exclusivity requirement of BKBs, and the result F_3 is not a valid BKB. This naive fusion is displayed graphically in Fig 3. To address this issue, source information is included in the fused BKB as additional CPRs. This source information represents the reliability of each source BKB. The source reliability is often determined by those building the BKB, although it is possible for source reliability to be updated as new evidence is considered. In this example, we will give F_1 and F_2 equal reliability scores of 0.5 each. By incorporating source information, the fused F_3 is a valid BKB. By including the source node S_A in the antecedents of P(A = a | B = b ∧ S_A = 1) = 0.8 and P(A = a | C = c ∧ S_A = 2) = 0.35, the two CPRs from different sources are guaranteed to be mutually exclusive. This is graphically represented in Fig 4.
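The source-node construction can be sketched as follows. This is a simplification of the algorithm in [40]; the rule layout and reliability handling here are our own:

```python
def fuse(fragments, reliabilities):
    """fragments: {source_name: [(consequent, antecedent, p), ...]}.
    Tags each rule's antecedent with a source variable S_V for its
    consequent variable V, so rules from different sources become
    mutually exclusive, then adds prior CPRs carrying the reliabilities."""
    fused, source_vars = [], set()
    for name, rules in fragments.items():
        for consequent, antecedent, p in rules:
            s_var = "S_" + consequent[0]
            source_vars.add(s_var)
            fused.append((consequent, frozenset(antecedent) | {(s_var, name)}, p))
    # Prior CPRs on each source variable carry the source reliabilities.
    for s_var in sorted(source_vars):
        for name, rel in reliabilities.items():
            fused.append(((s_var, name), frozenset(), rel))
    return fused

f1 = [(("A", "a"), {("B", "b")}, 0.80)]
f2 = [(("A", "a"), {("C", "c")}, 0.35)]
fused = fuse({"F1": f1, "F2": f2}, {"F1": 0.5, "F2": 0.5})
# The two rules for A = a now condition on different values of S_A,
# so they satisfy the mutual exclusivity requirement.
```

Because the fragments themselves are untouched, no assertion is deleted or rewritten; conflicting rules simply become conditioned on which source is believed.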
The algorithm to fuse a set of BKB fragments is found in [40]. From [40], we adopt the following theorem:
Theorem 3.5.1. The output K′ = (G′, w′) of Bayesian Knowledge Fusion is a valid BKB.
Perhaps the most useful feature of the fusion algorithm is its ability to discover new inferences which are present in the fused BKB, but not in the input BKBs. Consider the example in Fig 5. We have rvs for symptoms A, B, and C that can either be present in a patient or not. We have another rv representing the disease a patient might have. Assume a patient has symptom A. From the given fragments we can only conclude that the patient has disease d_1. Note that we cannot conclude that the patient has disease d_2 because it is not included in fragment 1. Fragment 2 does include d_2 but does not include symptom A. However, when the fragments are fused, we find that disease d_2 is most probable. In such ways, fusion can facilitate the discovery of new insights previously unknown to its sources.

Bayesian knowledge-driven ontologies: Principles and structure
An instantiation of a domain is an assignment of each known individual to known classes. An individual may be assigned to one or more than one class, and a class may be assigned any number of individuals. A BKO models a probability distribution over all of a domain's possible instantiations and uses if-then rules to restrict and reason about that distribution's probability mass function (pmf). This gives a user a formal way to reason in detail about relative likelihoods of the domain's possible states. BKO theory supports incompleteness, so it does not require a complete definition of the pmf. Therefore, a valid BKO may be compatible with more than one pmf. This allows the user to draw valid conclusions from knowledge that would be insufficient for other probabilistic reasoning methods. Furthermore, thanks to its grounding in BKB theory, reasoning can be performed whether the BKO is consistent or not. Checking for probabilistic consistency has historically been a challenge among uncertain semantic network formalisms, but is not a requirement for BKOs.
To formulate this theory, we first define the nature of this probability distribution in terms of its sample space and random variables. We then define the means of expressing knowledge in BKO theory, which is done by declaring probabilistic if-then relationships between variables. Finally, we define the structure of a BKO as a knowledge base, and its mapping to its close cousin the BKB, leading into Section 5 on the reasoning that can be performed with a BKO.

Model of a domain
Recall our first point from the introduction: uncertainty is the presence of multiple possible states of the world, such that we have insufficient knowledge to determine which state is true but can still define a probability distribution over its possible states. This is commonly referred to as "distribution semantics". The following definitions describe our implementation of distribution semantics for BKO theory.

.2. A set of assertions
Note that in practice, one will never generate a full instantiation of a domain, but it is a fundamental concept of the theory.

Asserting knowledge
In BKO theory, knowledge is asserted by declaring if-then conditional probability rules between variables. There are two types of rules used, probabilistic assertional axioms and probabilistic terminological axioms. Probabilistic assertional axioms are propositional: they characterize a single individual's conditional probability of membership in a class. Probabilistic terminological axioms are predicated, or first-order: they implicitly define conditional probabilities of class membership for unspecified individuals. In Section 5 we define how these implicit probabilities can be used to create probabilistic assertional axioms.
Proposition 4.2.1. Let C = {C_1, ..., C_n} be a set of classes that partition D and let a be an individual. Then there exists a random variable V such that r(V) = {a ∈ C_1, ..., a ∈ C_n}. Proposition 4.2.1 is crucial for the remaining sections. Later we discuss how to instantiate terminological knowledge; the insight that a random variable is induced for an individual that is a member of a set of disjoint classes allows us to do so. Definition 4.2.2. A probabilistic assertional axiom (PAA) is a conditional probability rule of the form: A PAA R's antecedent, denoted ant(R), is the conjunction of random variables to the right of the vertical bar. R's consequent, denoted con(R), is the random variable assignment to the left of the vertical bar. As with PAAs, a PTA's antecedent and consequent are the terms to the right and left of the vertical bar. Note that not all members of a PTA's antecedent must be variable assertions, but there must be at least one, due to the requirement that the individual in its consequent must be defined in the antecedent.
PTAs are a first-order generalization of the strictly propositional PAAs. They facilitate forming complex universal quantification statements, which lets BKO theory express advanced DL notions like property attributes. In fact, BKO theory can be used to express complex custom property attributes not available in DL. A more intuitive explanation is best communicated through some examples. Start with the simplest form of a PTA: T: P(x̂ ∈ C_2 | x̂ ∈ C_1) = p. This expresses that any member of C_1 has a probability p of also being a member of C_2. PTAs are also a mechanism for expressing complex probabilistic rules, extending some of the features of more advanced forms of DL. In the following example, let R be a specific relational property.
T_1: P(x̂ ∈ R(x̂, ŷ) | ŷ ∈ R(ŷ, x̂)) = p. This PTA can be read as "the probability that any x is related to any y by R, given that any y is related to any x by R, is p". Should p = 1, T_1 would declare R to be a symmetric property. Similarly, the PTA T_2: P(x̂ ∈ R(x̂, ẑ) | x̂ ∈ R(x̂, ŷ) ∧ ŷ ∈ R(ŷ, ẑ)) = p would declare R to be a transitive property, should p = 1. Note that p does not necessarily need to be equal to one. Consider the following PTA: T_3: P(x̂ ∈ R(x̂, x̂) | x̂ ∈ C) = p. With p = 1, R becomes a reflexive property on C. But if we set p = 0.7, T_3 states that for any individual x in class C, there is a 0.7 chance that it is related to itself by property R. It should be apparent that we can go beyond the offerings of DL to create much more sophisticated terminological expressions.
A PTA must eventually be instantiated, a process that assigns each of a PTA's variable individuals to a specific individual, resulting in a PAA. Definition 4.2.8. Let X = {x_1, x_2, ..., x_w} be a set of specific individuals and X̂ = {x̂_1, x̂_2, ..., x̂_w} be a set of variable individuals whose range is X. An instantiation function g : X̂ → X is a one-to-one mapping of each variable individual to a specific individual.
Note that the instantiation function defined here is the probabilistic counterpart of the interpretation function in classic DL.
Notation. For some expression E and an instantiation function g, E instantiated by g may be written as either g(E) or E|g. So the concept constructor Ĉ = f(L(Q), {x̂_1, ..., x̂_w}) evaluated by g could be written Ĉ|g = f(L(Q), {x_1, ..., x_w}). T|g may be read "T evaluated by g." For a simple PTA like T: P(x̂ ∈ C_2 | x̂ ∈ C_1) = p and instantiation function g(x̂) = a, T|g can be read "T evaluated with x̂ equal to a". Note that the probability value assigned to the instantiated PTA is the same as it was before being instantiated. This is what is meant when we say that PTAs describe a pmf. Unlike PAAs, PTAs on their own are not conditional probability rules. PTAs themselves do not have an effect on the pmf, but any PAA that is an instantiation of them does.
PTAs and PAAs are flexible enough to represent classical axioms. For example, a classical assertional axiom Z is equivalent to the unconditional PAA P(Z) = 1. A subsumption axiom C ⊑ D is equivalent to the PTA P(x̂ ∈ D | x̂ ∈ C) = 1, and a disjointness axiom C ∩ D = ∅ is equivalent to the PTAs P(x̂ ∈ D | x̂ ∈ C) = 0 and P(x̂ ∈ C | x̂ ∈ D) = 0.
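These equivalences can be sketched in code. The PTA record and helper names below are our own illustrative encoding, not the paper's implementation: a condition ("x", "C") stands for the variable assertion x̂ ∈ C.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PTA:
    """P(consequent | antecedent) = p, over variable individuals."""
    consequent: tuple      # ("x", "D") stands for x-hat in D
    antecedent: tuple      # conjunction of conditions like ("x", "C")
    p: float

def subsumption(c, d):
    """C ⊑ D  becomes  P(x ∈ D | x ∈ C) = 1."""
    return [PTA(("x", d), (("x", c),), 1.0)]

def disjointness(c, d):
    """C ∩ D = ∅  becomes  P(x ∈ D | x ∈ C) = 0 and P(x ∈ C | x ∈ D) = 0."""
    return [PTA(("x", d), (("x", c),), 0.0),
            PTA(("x", c), (("x", d),), 0.0)]

# Hypothetical classes used purely for illustration:
axioms = subsumption("Dog", "Mammal") + disjointness("Dog", "Cat")
```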

Logical and probabilistic consistency
We will now develop the constraints necessary to guarantee that a BKO induces a valid probability mass function. These definitions will parallel those of BKB theory. First we define mutual exclusivity and consequent boundedness for PAAs and PTAs. These definitions are analogous to their respective concepts from BKB theory, Definitions 3.4.2 and 3.4.3. We have already defined what it means for assertions and sets of assertions to be consistent. Since PAAs are CPRs and not sets of assertions, this definition is necessary before we can define mutual exclusivity and consequent boundedness for PTAs and PAAs. • PAAs R_1 and R_2 are mutually exclusive if disag(ant(R_1)) is inconsistent with disag(ant(R_2)).
• PTAs T_1 and T_2 are mutually exclusive if T_1|g_1 and T_2|g_2 are mutually exclusive for any instantiation functions g_1 and g_2. Recall that the instantiation of a PTA is a PAA.
• A PAA R and a PTA T are mutually exclusive if there exists some instantiation function g such that R and T|g are mutually exclusive.
• PTAs T_1 and T_2 are consequent bound if T_1|g_1 and T_2|g_2 are consequent bound for any instantiation functions g_1 and g_2.
• A PAA R and a PTA T are consequent bound if there exists some instantiation function g such that R and T|g are consequent bound.
Notation. The negation of an assertion a ∈ C is the assertion a ∈ ¬C. Definition 4.3.6. A Bayesian Knowledge-driven Ontology, B, is a finite set of PAAs and PTAs such that: • For any distinct PAAs R_1, R_2 ∈ B, either (1) R_1 and R_2 are mutually exclusive or (2) con(R_1) is consistent with the negation of con(R_2) and con(R_2) is consistent with the negation of con(R_1).
• For any distinct PTAs T_1, T_2 ∈ B and instantiation functions g_1 and g_2, either (1) T_1|g_1 and T_2|g_2 are mutually exclusive or (2) con(T_1|g_1) is consistent with the negation of con(T_2|g_2) and con(T_2|g_2) is consistent with the negation of con(T_1|g_1).
• For any PAA R and PTA T in B such that con(R) = a ∈ C and con(T) = x̂ ∈ D, and any instantiation function g, either (1) R and T|g are mutually exclusive, (2) con(R) is consistent with the negation of con(T|g) and con(T|g) is consistent with the negation of con(R), or (3) T|g = R. • For any subset S ⊆ B where the PAAs R ⊆ S and PTAs T ⊆ S are mutually consequent bound, Σ_{Q∈S} P(Q) ≤ 1. Proposition 4.3.1. Any subset of a BKO is also a BKO. Definition 4.3.6 has some seemingly odd conditions on a consequent's consistency with another consequent's negation. These conditions exist to prevent conflicts between CPRs which are not themselves mutually exclusive but would generate mutex violations in rules mandated by DL. For example, if con(R_1) said "a is in C", but con(R_2) said "a is in D", where D ⊑ C, the laws of any governing DL would require a PAA R_3 to be inferred saying "if a is in a subset of C, then a is in C". Without the conditions set in Definition 4.3.6, R_3 could violate mutex with R_1. The consequent consistency conditions will catch R_1 and R_2 before that inference is computed. Checking whether a set of PAAs and PTAs obeys Definition 4.3.6 requires O(|B|^2) pairwise comparisons, where |B| is the number of PAAs and PTAs in the set.
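The quadratic check can be sketched as follows. The rule representation and both predicates are stubs of our own devising (the real conditions also quantify over instantiation functions); the sketch only shows the pairwise structure of the validity test.

```python
from itertools import combinations

# A rule is a dict: "ant" maps an rv name to its assignment,
# and "con" is a (rv, assignment) pair.

def mutually_exclusive(r1, r2):
    """Antecedents share an rv but assign it different values."""
    shared = set(r1["ant"]) & set(r2["ant"])
    return any(r1["ant"][v] != r2["ant"][v] for v in shared)

def consequents_compatible(r1, r2):
    """Stub for the consequent-negation consistency condition."""
    (v1, a1), (v2, a2) = r1["con"], r2["con"]
    return v1 != v2 or a1 == a2

def obeys_definition(rules):
    """O(|B|^2) pairwise check over all distinct rules."""
    return all(mutually_exclusive(r1, r2) or consequents_compatible(r1, r2)
               for r1, r2 in combinations(rules, 2))

ok = obeys_definition([
    {"ant": {"V1": "a in C"},  "con": ("V2", "a in D")},
    {"ant": {"V1": "a in ~C"}, "con": ("V2", "a in ~D")},
])
bad = obeys_definition([
    {"ant": {"V1": "a in C"}, "con": ("V2", "a in D")},
    {"ant": {"V1": "a in C"}, "con": ("V2", "a in ~D")},
])
```

The second pair fails: the rules are not mutually exclusive (same antecedent) yet assign the same consequent variable conflicting values.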

BKO reasoning
Recall the purpose of BKO reasoning from the introduction: to determine the posterior probability of some event from the collection of prior and conditional probabilities that constitute our knowledge base. This section defines that process and provides an algorithm outline.

Logical reasoning under uncertainty
Before reasoning, a BKO contains both explicit restrictions on its pmf, in the form of PAAs, and implicit descriptions of its pmf, in the form of PTAs. The probabilistic rule of universal instantiation is used to convert PTAs to PAAs that restrict the BKO's pmf.
Definition 5.1.1. An assertional axiom A is said to be provable given a set of assertional and/or terminological axioms S iff (1) A and S are expressible in a governing DL and (2) that governing DL supports a sound algorithm by which A may be proven from S. The goal of a BKO is to express all knowledge as a set of PAAs. One way to guarantee this is by instantiating each PTA using every possible instantiation function, but this would be computationally impractical. Instead, we identify in advance the combinations of PTAs and instantiation functions that can be used in reasoning.
Intuitively, grounded PAAs are known pieces of the BKO's pmf, while ungrounded PAAs are unknown, since they have unknown antecedents. The marginal and posterior probabilities of an ungrounded PAA cannot be computed, so those of any descendant of that PAA also cannot be computed.
Proposition 5.1.1. Let B be a BKO and R ∈ B be an ungrounded PAA. Then (1) any marginal or posterior probabilities computed using the pmf induced by B are identical to those computed using the pmf induced by B − R, and (2) any marginal or posterior probabilities which are incalculable using the pmf induced by B are also incalculable using the pmf induced by B − R.
Since ungrounded PAAs do not contribute to a BKO's pmf, we develop the following notion. Definition 5.1.6. A BKO B is fully instantiated when, for any PTA T ∈ B and instantiation function g, either T|g ∈ B or T|g would not be grounded if added to B.
Note that we do not instantiate on infinite numbers of individuals or on unknown individuals. We only work with defined individuals but admit that more are possible per the open-world assumption. A fully instantiated BKO maximizes the number of its supported PAAs. Since every PTA that could be instantiated to form a supported PAA has been, the PTAs are considered redundant in a fully instantiated BKO. However, should new information be added to the BKO, the PTAs would no longer be redundant until the BKO was fully instantiated again with the new information.

Mapping a BKO to an equivalent BKB
Recall that PAAs are conditional probability rules, so a set of PAAs constitutes a BKB if it satisfies Definition 3.4.4. We will show that a BKO's A-box is a valid BKB. Furthermore, if a BKO is fully instantiated, no additional information can be inferred from its T-box. Combining these two insights allows us to conclude that a valid BKO can be converted to an equivalent, valid BKB. We will then be able to use previously developed methods for BKB reasoning. Proof. Let R_1 and R_2 be two mutually exclusive PAAs. Then disag(ant(R_1)) and disag(ant(R_2)) are inconsistent, so there exists some p, 1 ≤ p < n, and some q, 1 ≤ q < m, such that i_p = k_q and a_{i_p} = a′_{k_q} but C_{j_p} ∩ C′_{l_q} = ∅. Then C_{j_p} ≠ C′_{l_q}, so V_{h_p} = V_{u_q} but their assignments are not equal. So R_1 and R_2 are CPRs that contain the same random variable in their antecedents but with different assignments, and are therefore mutually exclusive CPRs.
Proof. Let R_1 and R_2 be two consequent bound PAAs. To show that they are consequent bound CPRs we must show that (1) for all p < n and all q < m, {a_{i_p} ∈ C_{j_p}} = {a′_{k_q} ∈ C′_{l_q}} whenever i_p = k_q, and (2) … Since R_1 and R_2 are consequent bound PAAs, disag(ant(R_1)) and disag(ant(R_2)) are consistent. So for all p < n and q < m, whenever i_p = k_q, C_{j_p} ∩ C′_{l_q} ≠ ∅. But since V_{i_p} = V_{k_q}, any classes involved in their assertions are either equal or disjoint. And since C_{j_p} ∩ C′_{l_q} ≠ ∅, C_{j_p} = C′_{l_q}. So whenever i_p = k_q, {a_{i_p} ∈ C_{j_p}} = {a′_{k_q} ∈ C′_{l_q}}. (1) Let R_1 and R_2 be distinct elements of Abox(B). Since R_1, R_2 ∈ B, either they are mutually exclusive or con(R_1) is consistent with the negation of con(R_2) and con(R_2) is consistent with the negation of con(R_1). If the PAAs R_1 and R_2 are mutually exclusive, then by Lemma 5.2.1 they are mutually exclusive CPRs. And if con(R_1) and the negation of con(R_2) are consistent (and vice versa), … Note that an equivalent version of Theorem 5.2.1 appears in [11] as Lemma 7.1. Proposition 5.2.1. (1) For a fully instantiated BKO B, any marginal or posterior probabilities which could be calculated using the pmf induced by B are identical to those calculated using the pmf induced by Abox(B). (2) Additionally, any marginal or posterior probabilities which are incalculable using the pmf induced by Abox(B) will also be incalculable using the pmf induced by B.
Having proven that a BKO has an equivalent BKB, we will turn our attention to the question of how to generate it.

A reasoning algorithm
The Full Instantiation Algorithm will fully instantiate a BKO. To achieve this, the algorithm begins with a set of PAAs, denoted H. This set is empty by default, but it is not required to be. First, PAAs with empty antecedents are appended to H, followed by PAAs supported by H. Then, any combination of a PTA and an instantiation function that results in a PAA supported by H is also added. This process is repeated until no additional PAAs are added to H.
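The loop just described can be sketched as a fixpoint computation. This is our own simplification, not the paper's listing: a PAA is an (antecedent set, consequent) pair, and each PTA is reduced to a callable that proposes candidate PAAs from the currently supported assertions.

```python
def fully_instantiate(paas, ptas, anchor=frozenset()):
    """Repeatedly add PAAs supported by H until nothing changes."""
    supported = set(anchor)
    h = []
    changed = True
    while changed:
        changed = False
        # explicit PAAs whose antecedents are already supported
        for ant, con in paas:
            if (ant, con) not in h and ant <= supported:
                h.append((ant, con))
                supported.add(con)
                changed = True
        # PTA/instantiation-function combinations yielding supported PAAs
        for pta in ptas:
            for ant, con in pta(supported):
                if (ant, con) not in h and ant <= supported:
                    h.append((ant, con))
                    supported.add(con)
                    changed = True
    return h

# A toy PTA: whenever an individual is asserted into C, assert it into D.
def c_subclass_d(supported):
    return [(frozenset({(ind, "C")}), (ind, "D"))
            for ind, cls in supported if cls == "C"]

h = fully_instantiate([(frozenset(), ("a", "C"))], [c_subclass_d])
```

Starting from the empty-antecedent PAA asserting a into C, the PTA fires once, and the loop then reaches a fixpoint.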

Complexity of the algorithm
The Full Instantiation Algorithm's complexity is driven by the instantiation of PTAs. Consider the general form of the PTA:

The product ∏_{k=1}^{n−1} |M_k| is an upper bound on |S_T|. So the worst-case time complexity is O(M^n), where M is the largest number of assertions that generalize to a single variable assertion of a PTA. The space complexity is also exponential, because the time complexity is driven by the number of new assertions being instantiated and is directly related to the size of the BKO. This is true for both probabilistic and non-probabilistic assertions, because it depends on how many PAAs already in the BKO can be combined to instantiate new PAAs, not on their probabilities. However, the case where |S_T| = ∏_{k=1}^{n−1} |M_k| occurs only when there are no shared variable individuals between variable assertions in ant(T). Consider the antecedent of T: Assume for some variable assertions x̂_{i_p} ∈ Ĉ_{j_p} and x̂_{i_l} ∈ Ĉ_{j_l} there exists some x̂_{i_q} such that x̂_{i_q} is included in both variable concepts Ĉ_{j_p} and Ĉ_{j_l}. Then the set of assertions that generalize to x̂_{i_p} ∈ Ĉ_{j_p}, denoted M*_p, may include fewer assertions than the original M_p. Similarly, we can denote M*_l as the set of assertions that generalize to x̂_{i_l} ∈ Ĉ_{j_l}. So the number of PAAs instantiated by the Full Instantiation Algorithm is at most ∏_{k=1}^{n−1} |M*_k|. Although in this case |S_T| is less than the upper bound, it still may grow exponentially with respect to the length of T's antecedent. We illustrate this with an example. Consider the PTA: Note that there is no overlap between the members of ant(T); no variable individuals are shared between variable assertions in T's antecedent. Now assume that for a given BKO, we have three PAAs whose generalization is x̂_1 ∈ R_2(x̂_1, x̂_3), four PAAs whose generalization is x̂_4 ∈ R_3(x̂_4, x̂_5), and three PAAs whose generalization is x̂_6 ∈ R_4(x̂_6, x̂_7). Then we can infer 3 × 4 × 3 = 36 PAAs from T.
Clearly, the number of times that a PTA may be instantiated is exponential with respect to the length of its antecedent. A similar problem can be seen regarding knowledge acquisition in Bayesian Networks. One advantage that BKO theory has, in addition to handling cycles and incompleteness, is that not all combinations of matching PAAs are possible. This is best communicated through an example. Consider the following PTA: There could be many PAAs whose consequents are generalizations of x̂_3 ∈ R_4(x̂_3, x̂_2), but an instantiation function will only be valid if it maps x̂_3 to the same specific individual as it does for x̂_1 ∈ R_2(x̂_1, x̂_3). So suppose we have three PAAs that are generalizations of x̂_1 ∈ R_2(x̂_1, x̂_3), four PAAs that are generalizations of x̂_4 ∈ R_3(x̂_4, x̂_2), and three PAAs that are generalizations of x̂_3 ∈ R_4(x̂_3, x̂_2). Then we cannot infer thirty-six PAAs as before. Some combinations would require x̂_3 to be mapped to multiple specific individuals by the same instantiation function, which is not valid. This can greatly reduce the number of PAAs that are instantiated.
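The counting argument can be made concrete. In this sketch (representation and names ours), each antecedent conjunct has a list of candidate bindings, and a combination counts only if shared variable individuals receive one consistent assignment:

```python
from itertools import product

def count_instantiations(match_sets):
    """Count binding combinations whose shared variables agree.
    match_sets: per conjunct, a list of {variable: individual} bindings."""
    total = 0
    for combo in product(*match_sets):
        merged, consistent = {}, True
        for binding in combo:
            for var, ind in binding.items():
                if merged.setdefault(var, ind) != ind:
                    consistent = False
        total += consistent
    return total

# No shared variables: 3 * 4 * 3 = 36, matching the example above.
disjoint = [
    [{"x1": i} for i in range(3)],
    [{"x4": i} for i in range(4)],
    [{"x6": i} for i in range(3)],
]
# Sharing x3 between the first and last conjuncts prunes combinations.
shared = [
    [{"x1": i, "x3": i % 2} for i in range(3)],
    [{"x4": i} for i in range(4)],
    [{"x3": i % 2, "x6": i} for i in range(3)],
]
n_disjoint = count_instantiations(disjoint)  # 36
n_shared = count_instantiations(shared)
```

With the (invented) shared x3 bindings, only 20 of the 36 combinations survive, illustrating how shared variable individuals shrink the instantiation count.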
There is one special case that represents many real-world applications and must be highlighted. Many ontologies, particularly in the biomedical domain, have terminological axioms that can be represented as PTAs of the form: In this case, the number of PAAs instantiated is equal to the number of assertions in the BKO that generalize to x̂_{i_1} ∈ Ĉ_{j_1}.

Answering the probabilistic membership query
BKOs can be used to answer probabilistic membership queries (PMQs), thereby performing the probabilistic analogs of the standard DL reasoning tasks of instance and relation checking. This can be done both for fully instantiated BKOs and for ones that are not yet fully instantiated. We rely on a BKB reasoning technique called partial belief revision.
Let B be a BKB. Let Q be a query of the form … We refer to con(Q) as the reasoning target and ant(Q) as the evidence. In order to solve this with BKB theory's belief updating techniques, we must define a query rv V_Q such that r(V_Q) = {True, False}, and a query CPR R_Q such that ant(R_Q) = con(Q) and con(R_Q) = {V_Q = True}. Then p is computable as the belief updating problem p = P(V_Q = True | ant(Q)). Intuitively, this process adds a CPR whose probability is equal to p and can be solved using belief updating. BKOs can be used to solve PMQs in a similar way. Let B be a BKO, and let Q be a probabilistic membership query of the form … with probability p, such that every clause a_x ∈ C_x is a consequent of at least one PAA in B. After B is fully instantiated, the PMQ can be solved using the same techniques just described for BKBs. This is because, as we have shown, a BKO's A-box is a valid BKB.
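The reduction to belief updating can be illustrated with a brute-force enumeration over a toy joint distribution. The pmf and assertion strings are invented, and real BKB belief updating does not enumerate worlds; this only shows what P(target | evidence) computes.

```python
def belief_update(pmf, target, evidence):
    """P(target | evidence) by enumerating the joint pmf.
    pmf maps a world (frozenset of assertions) to its probability."""
    num = sum(p for world, p in pmf.items()
              if target <= world and evidence <= world)
    den = sum(p for world, p in pmf.items() if evidence <= world)
    return num / den

# Invented toy joint over the assertions "a in C" and "a in D".
pmf = {
    frozenset({"a in C", "a in D"}): 0.4,
    frozenset({"a in C"}): 0.2,
    frozenset({"a in D"}): 0.1,
    frozenset(): 0.3,
}
# The PMQ "how likely is a in D, given a in C?" as belief updating:
p = belief_update(pmf, {"a in D"}, {"a in C"})  # 0.4 / 0.6
```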
Previously, we answered the PMQ by first fully instantiating the BKO to a BKB and then performing partial belief revision. Suppose we would like to set ungrounded belief conditions as evidence. To do so, let B be a BKO and Q be a probabilistic membership query of the form … with p the probability to compute, such that every clause a_{j_x} ∈ C_{j_x} is a consequent of at least one PAA in B. Note that unlike before, the members of the antecedent of Q do not have to be consequents of PAAs in B. Now, using Q's antecedent, create a set S of PAAs {S_1, ..., S_n} such that each S_k is the PAA … where p_k is an unspecified probability. Using S as an initial reasoning anchor, fully instantiate B using the Full Instantiation Algorithm. Then p can be computed using BKB theory's partial belief revision. Since the members of ant(Q) are not necessarily all in B, the algorithm will build the fully instantiated BKO starting with the set S. In partial belief revision, these antecedent conditions are considered evidence, so the unspecified probabilities p_k will not contribute to the result.

Knowledge fusion with BKOs
Current methods for merging ontologies require knowledge to be rejected or altered to prevent contradictions. This section introduces BKO fusion, in which reasoning can occur regardless of whether contradictions are present. BKO fusion eliminates the need to check for inconsistencies and remedy them through manual or automated means. Not only is all knowledge from the input ontologies retained in the fused one, but new inferences, not present in the individual ontologies, are generated. This section begins with the theoretical framework of BKO fusion, followed by the BKO Fusion Algorithm, and lastly a discussion of the role of ontology alignment.

Theoretical framework
BKOs leverage their close relationship to BKBs to apply Bayesian Knowledge Fusion to the problems in ontology alignment that arise when there is uncertain knowledge. The concept and formulation are both analogous to BKB fusion. Conflicting knowledge from different sources is modeled as knowledge fragments with associated relative reliability weightings. This approach allows for Bayesian inferencing about conflicting information. Note that, because BKOs are a generalization of classical ontologies, these methods apply equally to BKOs and classical ontologies. Definition 6.1.1. A source class, C_s, is a class representing that knowledge came from a source s. Definition 6.1.2. A source assertion a ∈ C_s is an assertion indicating membership in a source class. Definition 6.1.3. A source random variable V_s is a random variable such that r(V_s) is a set of source assertions. Definition 6.1.4. For a PAA R and source random variable V_s, R is referred to as a sourced PAA if V_s ∈ ant(R). A PTA T is referred to as a sourced PTA if a source assertion {a ∈ C_s} ∈ ant(T). Definition 6.1.5. A BKO Fragment is a triple (B, s, w) where B is a BKO, s is a term representing the source of the knowledge contained in B, and w > 0 is a real number representing the reliability of s in comparison to other sources.
Note that a single ontology can be represented by multiple BKO Fragments. Different sources can have different reliabilities on different subsets of their domain of discourse, and those subsets are represented as fragments. A source might provide multiple fragments to a fused model, each with a different reliability weighting.
The BKO Fusion Algorithm takes two arguments. The first is a set of BKO Fragments F = {F_1 = (B_1, s_1, w_1), ..., F_n = (B_n, s_n, w_n)} such that for any fragments F_i, F_j ∈ F, s_i ≠ s_j. The second argument is an initial reasoning anchor H_i, defaulting to the empty set. To model the source that a PAA or PTA from a BKO Fragment F came from, we include a source random variable in the antecedent of each PAA and a source assertion in the antecedent of each PTA.

BKO Fusion Algorithm
The algorithm iterates over all PAAs R_{i_j} ∈ B_i and all PTAs T_i ∈ B_i of each fragment. The result of BKO Fusion is a valid BKO, as we will show in the following theorem. The proof depends on a crucial assumption. The definition of a BKO depends on knowing whether classes are disjoint or not. If that information about classes from different ontologies is not known, we must assume that there are no classes C_i ∈ C(B_i) and C_j ∈ C(B_j) such that C_i ∩ C_j = ∅. Similarly, we must also assume that there are no classes such that C_i ∩ ¬C_j = ∅ or C_j ∩ ¬C_i = ∅ unless that information is provided. Such information would be included in an alignment ontology, which can be included as an input to the fusion algorithm. Theorem 6.1.1. For any two BKO fragments F_i, F_j ∈ F such that s_i ≠ s_j, the result of BKO Fusion will be a valid BKO.
Proof. Let F_i = (B_i, s_i, w_i) and F_j = (B_j, s_j, w_j) such that s_i ≠ s_j. (ii) Let T_i = {T_{i_1}, ..., T_{i_n}} and T_j = {T_{j_1}, ..., T_{j_m}} be the sets of PTAs in B_i and B_j, respectively. Each member of T_i has the source assertion a_{s_i} ∈ C_{s_i} in its antecedent. Similarly, every member of T_j has the source assertion a_{s_j} ∈ C_{s_j} in its antecedent. Since the source assertions are not variable assertions, for any instantiation functions g_1, g_2, and any T_{i_k} ∈ T_i and T_{j_l} ∈ T_j, the source random variables of T_{i_k}|g_1 and T_{j_l}|g_2 will be V_{s_i} = {a_{s_i} ∈ C_{s_i}} and V_{s_j} = {a_{s_j} ∈ C_{s_j}}, respectively. And since classes from different ontologies are not disjoint, for any T_{i_k} ∈ B_i and T_{j_l} ∈ B_j, con(T_{i_k}|g_1) is consistent with the negation of con(T_{j_l}|g_2) and con(T_{j_l}|g_2) is consistent with the negation of con(T_{i_k}|g_1).
(iii) Let R_i = {R_{i_1}, ..., R_{i_n}} be the set of PAAs in B_i and T_i = {T_{i_1}, ..., T_{i_m}} be the set of PTAs in B_i. Also, let R_j = {R_{j_1}, ..., R_{j_k}} be the set of PAAs in B_j and T_j = {T_{j_1}, ..., T_{j_l}} be the set of PTAs in B_j. The BKO Fusion Algorithm appends a source random variable V_{s_i} = {a_{s_i} ∈ C_{s_i}} to each member of R_i and a source assertion a_{s_i} ∈ C_{s_i} to each member of T_i. Similarly, the BKO Fusion Algorithm appends a source random variable V_{s_j} = {a_{s_j} ∈ C_{s_j}} to each member of R_j and a source assertion a_{s_j} ∈ C_{s_j} to each member of T_j. Because the source assertions are not variable assertions, they do not change between instantiation functions. And since no classes are disjoint across different BKOs, for any PTA or PAA Q_i ∈ R_i ∪ T_i and PTA or PAA Q_j ∈ R_j ∪ T_j, con(Q_i) will be consistent with the negation of con(Q_j) and con(Q_j) will be consistent with the negation of con(Q_i). And since source assertions are consistent with the negation of any assertion in B_i ∪ B_j, for any PAA R and PTA T in the fused BKO B, and for any instantiation function g, either R and T|g are mutually exclusive or con(R) is consistent with the negation of con(T|g) and con(T|g) is consistent with the negation of con(R).
(iv) Let S be a set of mutually consequent bound members of B. Then S cannot contain members from both B_i and B_j since, as shown above, the consequents of members of B_i and B_j are consistent. Nor can it contain R_{s_i} or R_{s_j} together with any members of B_i or B_j, since the source assertions in the consequents of R_{s_i} and R_{s_j} cannot be inconsistent with any consequents in B_i or B_j. So either S ⊆ B_i, S ⊆ B_j, or S ⊆ {R_{s_i}, R_{s_j}}. B_i and B_j are valid BKOs, and we normalize the weights of R_{s_i} and R_{s_j}, so for all sets S of mutually consequent bound members of B, Σ_{Q∈S} P(Q) ≤ 1. So for any two BKO fragments F_i, F_j ∈ F such that s_i ≠ s_j, the result of fusion by the BKO Fusion Algorithm is a valid BKO.
Since the BKO returned from this algorithm is valid, it can be used as input to the Full Instantiation Algorithm. Then, all previously established BKB reasoning techniques can be applied to it. Therefore, as described in the previous section, the fused BKO can be used to answer probabilistic membership queries. Once the BKO is fused and fully instantiated, the process is identical to the one described in the previous section.
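The source-tagging performed by the fusion algorithm can be sketched as follows. The rule representation and helper names are our own; the sketch shows only the tagging of antecedents and the normalized source priors, not validity checking or instantiation.

```python
def fuse_fragments(fragments):
    """fragments: list of (rules, source, weight) triples; each rule is a
    dict with an 'ant' list, a 'con' condition, and a probability 'p'."""
    total_weight = sum(w for _, _, w in fragments)
    fused = []
    for rules, source, weight in fragments:
        source_cond = ("source", source)       # the assertion a_s in C_s
        for rule in rules:
            tagged = dict(rule)
            # source random variable added to every rule's antecedent
            tagged["ant"] = list(rule["ant"]) + [source_cond]
            fused.append(tagged)
        # source PAA: prior over sources from normalized reliabilities
        fused.append({"ant": [], "con": source_cond,
                      "p": weight / total_weight})
    return fused

fused = fuse_fragments([
    ([{"ant": [], "con": ("a", "C"), "p": 1.0}], "MONDO", 1),
    ([{"ant": [], "con": ("a", "D"), "p": 1.0}], "DO", 1),
])
```

With equal weights, each source prior is 0.5, so conflicting rules from the two sources are weighed evenly during reasoning.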

Complexity of BKO fusion
Let F = {F_1, ..., F_n} be a set of BKO fragments. For some F_i ∈ F, we can write |F_i| = |R_i| + |T_i|, where R_i and T_i are the sets of PAAs and PTAs in F_i, respectively. For each BKO being fused, the algorithm iterates over the set of PAAs and PTAs, which is equal to the size of each BKO Fragment: Σ_{i=1}^{n} (|R_i| + |T_i|). So the complexity of the algorithm is O(nm), where n is the number of BKOs being fused and m is the number of PTAs and PAAs in the largest BKO Fragment. This is much faster than the Full Instantiation Algorithm. Although it may be necessary to run the two consecutively, first the BKO Fusion Algorithm and then the Full Instantiation Algorithm, this is not always required. The Full Instantiation Algorithm is only needed when reasoning over the result as a BKB. Other applications of the fused ontologies can avoid that time-consuming step.
It is important to note that it is not always necessary to fuse entire BKOs at once. Often, only subsets of certain BKOs are of interest. In such cases, applying BKO Fusion to those subsets is preferred to save time.

BKO fusion and ontology alignment
When ontologies use different interpretations, their lexica must be related through some sort of mapping. This generally takes the form of an ontology dedicated to the purpose, a bridge ontology (see [41] for recent work on this subject). Ontology alignment has a strong need for an uncertainty formalism, because ontology interpretations are often vague, uncertain, and contentious. Even when the name of a class from one ontology exactly matches a class name from another ontology, equating the two may still be incorrect if the classes are distinct or overlapping. A formal alignment ontology is necessary to avoid such issues. Ontology alignment methods exist, but they are often deterministic and require that ultimately fiat decisions be made by humans or an algorithm. BKO theory is well suited to alleviating this difficulty. It does not address the question of how to generate mappings, but it can model mappings containing uncertainty. Through fusion it permits the use of multiple dissonant mappings, each of which may itself contain uncertainty. In such situations, formulate the ontologies to be aligned and the proposed mapping(s) each as individual BKO fragments and apply the algorithm to all the ontologies being fused and all the alignment ontologies. Every mapping used may contribute to the solution and offer up its insights. This approach also simplifies the "meta-matching problem" of how to select a method for generating and evaluating mappings (see [42] for an example of recent work on this problem). Rather than being forced to select just one alignment strategy, many strategies may be selected simultaneously and their resultant mappings fused. This eases design requirements for automated alignment generators: they no longer need to eliminate or overrule uncertainty in a candidate alignment. Conflicting results become acceptable and even desirable if they accurately reflect real-world uncertainty and disagreement.

A detailed example
With an increase in the amount of data produced in the biological sciences, there has also been an increase in the use of biological ontologies, such as the Gene Ontology [43,44], the Human Phenotype Ontology [45], and the Infectious Disease Ontology [46]. They have applications in many areas of biomedicine [47] such as data integration [48,49] and identifying protein-protein interactions [50,51]. One problem that many biological researchers face is that although there are many available ontologies related to their domain, no single ontology adequately supports their research aims. As a result, many overlapping ontologies have been developed to suit specific domains [52]. For example, the Human Disease Ontology (DO) [53] covers many human diseases. However, researchers studying epilepsy needed a more detailed ontology and created the Epilepsy Ontology [54]. BKO fusion can be applied to take information from separate ontologies and combine it into one. When sufficient information is available but spread across different sources, creating an entirely new ontology is no longer necessary. This section presents a detailed example of the BKO fusion process, designed to highlight some of the unique and powerful characteristics of BKOs. We will show both how BKOs can be reasoned over despite contradictions and how new inferences can be formed as a result of fusion.
We fuse subsets of two ontologies, the Mondo Disease Ontology (MONDO) [55] and DO [53]. They cover a similar domain and are both OBO Foundry [56] ontologies, but fusing them is not trivial. These are not probabilistic ontologies but can be modeled as such by assigning each statement a probability of one. Our example will be centered around the sciatic nerve, the largest nerve in the body, which runs from the lower back to the lower legs. The sciatic model is a popular model for studying nerve injury, due at least in part to its accessibility during surgery [57]. Although we can model any relation in either of these ontologies, we only use the "is a" relation in this example for clarity. Note that each class has a unique identifier, but we will instead use the common names to make the example easier to follow. If we need to specify which ontology a class comes from, we will add the ontology name in parentheses after the class name. For reference, Table 1 displays the common terms with their unique identifiers. We will start with the PTAs from each ontology and the bridge ontology between them. Then we will fuse them together, and finally we will reason over the resulting BKB.

Fusing two BKO Fragments
The following PTAs form a subset of MONDO: The following PTAs form a subset of DO: These can be visualized in Fig 6. We follow the graph model for BKBs that was described in Section 3.4. Recall that the black nodes, called "S-nodes", represent conditional probabilities. The other nodes, called "I-nodes", represent random variable instantiations. The conditional probability modeled by some S-node q is the probability of the I-node that q points to, given the I-node(s) that point to q.
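The graph model above can be made concrete with a small sketch. This is a hypothetical encoding, not the paper's implementation: an I-node is a (random variable, instantiation) pair, and an S-node attaches a probability to a consequent I-node given a set of antecedent I-nodes. A crisp "is a" assertion from a non-probabilistic ontology is modeled with probability 1.

```python
# Illustrative encoding of a BKB fragment: I-nodes are (variable, state)
# pairs; an S-node carries P(consequent | antecedents).
from dataclasses import dataclass
from typing import FrozenSet, Tuple

INode = Tuple[str, str]  # (random variable, instantiation)

@dataclass(frozen=True)
class SNode:
    antecedents: FrozenSet[INode]
    consequent: INode
    prob: float

# A crisp MONDO assertion, modeled with probability 1:
# "a ∈ Lesion of Sciatic Nerve" -> "a ∈ Sciatic Neuropathy".
s = SNode(
    antecedents=frozenset({("a ∈ Lesion of Sciatic Nerve (MONDO)", "true")}),
    consequent=("a ∈ Sciatic Neuropathy (MONDO)", "true"),
    prob=1.0,
)
```

Because the dataclass is frozen and its fields hashable, S-nodes can be stored in sets, which is convenient when fusing fragments that may contribute duplicate rules.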
Based on the figure, it looks as though "Lesion of Sciatic Nerve" has no antecedent in MONDO and "Sciatic Neuropathy" has no antecedent in DO. This is not the case; we are only displaying a subset of each ontology. We can still start reasoning without including more information from MONDO or DO by using an initial reasoning anchor. We let {V_{a_1} = a ∈ Lesion of Sciatic Nerve (MONDO), V_{a_2} = a ∈ Sciatic Neuropathy (DO)} be our initial reasoning anchor using some individual a, and consider three BKO fragments: F_M = (B_M, MONDO, 1), F_D = (B_D, DO, 1), F_B = (B_B, BRIDGE, 1). Here, we chose to set each weight to 1. Since the algorithm normalizes the weights, their values only matter relative to each other; we could have set each weight to 2 and gotten the same result. They do not need to be equal either, but for this example we chose that they would be. Additionally, although not displayed in this example, multiple fragments from the same ontology could be included with different weights if desired. The fusion algorithm first adds source PAAs to the BKO and source random variables to the antecedents of each PAA or PTA in the input fragments. Graphically, this is shown in Fig 7. Here and in the remaining figures, we represent a compressed version of the edges and nodes that come from the bridge ontology in blue. This is only for clarity; an example of what these blue nodes and edges represent is shown in Fig 8. This BKO is used as an input to the Full Instantiation Algorithm. At first sight, perhaps the most noticeable aspect of the BKB is the presence of cycles. However, BKBs are uniquely equipped to handle these cycles. With a closer look, one will notice a contradiction as well. According to MONDO, a "Lesion of Sciatic Nerve" is a "Sciatic Neuropathy". But according to DO, "Sciatic Neuropathy" is a "Lesion of Sciatic Nerve". In many ontology merging approaches, either MONDO or DO would need to be prioritized in this situation, and the other's knowledge discarded. With BKOs, all knowledge from MONDO and DO can be included and reasoned about. But before we show that reasoning, we complete fusion by fully instantiating the BKO. The result is both a BKO and a BKB. Should the PTAs be returned along with a set of PAAs, it would no longer be a BKB but exclusively a BKO. But in order to make use of BKB reasoning we need the output to be a BKB.
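The source-weighting step described above can be sketched as follows. This is a simplified, hypothetical illustration (the function name `fuse` and the rule tuples are assumptions, not the paper's code): each fragment's rules gain a source random variable in their antecedents, and the fragment weights are normalized into source priors.

```python
def fuse(fragments):
    """fragments: list of (rules, source_name, weight) triples; each rule
    is an (antecedents, consequent, prob) triple."""
    total = sum(weight for _, _, weight in fragments)
    fused, priors = [], {}
    for rules, source, weight in fragments:
        priors[source] = weight / total  # normalized source prior
        for antecedents, consequent, prob in rules:
            # condition every rule on the source it came from
            fused.append((antecedents | {("source", source)}, consequent, prob))
    return fused, priors

# Toy fragments mirroring the example: each crisp assertion has prob 1.
rules_m = [(frozenset(), "Lesion of Sciatic Nerve is a Sciatic Neuropathy", 1.0)]
rules_d = [(frozenset(), "Sciatic Neuropathy is a Lesion of Sciatic Nerve", 1.0)]
fused, priors = fuse([(rules_m, "MONDO", 1), (rules_d, "DO", 1), ([], "BRIDGE", 1)])
```

Because only the ratios of the weights matter after normalization, calling `fuse` with all weights set to 2 would produce the same priors, matching the remark in the text.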

BKO reasoning
Recall the definition of an inference over a BKB. There are many such inferences in our example; we will only focus on a few. However, one could consider all of them. This would result in a list of inferences with the probability of each, allowing for comparison between them. When there is a contradiction within the ontology, this ranking can be used to determine which assertion, if any, is more probable. Consider the subset of the BKB in Fig 12: the probability of each inference is the product of the probabilities of the S-nodes in it. So here, P(a) = P(b) = 0.33. This result should be expected because we assign the same weight to each source. If we trusted one source more than another, that would be reflected in their final probability values. Rather than taking one assertion and discarding the other, we handle contradictions by returning both assertions with information on which one is more probable.
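The ranking of the two contradictory inferences can be reproduced with a minimal sketch, assuming the decomposition of an inference's probability into a source prior times the remaining S-node probabilities (the helper name `inference_prob` is illustrative):

```python
from math import prod

def inference_prob(source_prior, snode_probs):
    # an inference's probability is the product of its S-node probabilities;
    # here the source prior is split out from the remaining rule probabilities
    return source_prior * prod(snode_probs)

weights = {"MONDO": 1, "DO": 1, "BRIDGE": 1}
priors = {s: w / sum(weights.values()) for s, w in weights.items()}

# (a) via MONDO: "Lesion of Sciatic Nerve" is a "Sciatic Neuropathy" (prob 1)
# (b) via DO:    "Sciatic Neuropathy" is a "Lesion of Sciatic Nerve" (prob 1)
p_a = inference_prob(priors["MONDO"], [1.0])
p_b = inference_prob(priors["DO"], [1.0])
# equal source weights yield P(a) = P(b) ≈ 0.33; unequal trust breaks the tie
```

Raising MONDO's weight relative to DO's would raise `p_a` above `p_b`, which is exactly how differing trust in sources surfaces in the final probabilities.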
Besides handling contradictions, this example displays another strength of BKO theory. Consider the inference in Fig 13: here we start with "Sciatic Neuropathy" and, through a string of "is a" relations, end at "Inflammatory Disease". What makes this inference special is that it cannot be found in either MONDO or DO. Only by combining them can we draw the connection between sciatic neuropathy and inflammatory disease. Although sciatic neuropathy is not always described as an inflammatory disease, the literature shows both that sciatic neuropathy is described as disease of or damage to the sciatic nerve [59] and that sciatic nerve injury triggers an inflammatory response [60]. Such insights are made possible by BKO fusion.
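The emergent-inference idea can be illustrated with a toy reachability check over directed "is a" edges. The edge lists below are hypothetical stand-ins for the MONDO and DO subsets (including the intermediate class name), not actual ontology content; the point is only that the chain exists in the union of the graphs but in neither graph alone.

```python
from collections import deque

def reachable(edges, start, goal):
    """BFS over directed 'is a' edges: is goal a (transitive) superclass?"""
    frontier, seen = deque([start]), {start}
    while frontier:
        node = frontier.popleft()
        if node == goal:
            return True
        for a, b in edges:
            if a == node and b not in seen:
                seen.add(b)
                frontier.append(b)
    return False

# Hypothetical subsets: each ontology holds only one link of the chain.
mondo = {("Sciatic Neuropathy", "Peripheral Neuropathy")}
do = {("Peripheral Neuropathy", "Inflammatory Disease")}

start, goal = "Sciatic Neuropathy", "Inflammatory Disease"
# Neither subset alone connects start to goal; the fused graph does.
```

In the BKO setting the fused chain is not merely a graph path: it is an inference with a probability attached, so emergent conclusions can be ranked like any others.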

Conclusion
We presented a theory for representing and fusing probabilistic ontologies. This theory synthesizes the semantic expressivity and reasoning capabilities of both ontologies and BKBs without sacrificing the features, flexibility, or granularity of either. It depends on three key insights: (1) that disjoint classes can be mapped to a discrete random variable, (2) that generalizing DL reasoning principles to their probabilistic analogs naturally facilitates formal propagation and inheritance of probabilistic knowledge, and (3) that BKB theory and DL are matched in expressive granularity, enabling a natural synthesis founded on insights (1) and (2). Current methods for ontology merging require the resulting merged ontology to be consistent. Checking for and correcting inconsistencies is a costly process and may result in the rejection of true and useful information. BKO fusion overcomes this limitation by leveraging a BKB's reasoning capabilities. As a result, all knowledge from the input ontologies is included in the final fused ontology, and reasoning can occur despite conflicting information. Additionally, the fused ontology will contain emergent information not present in any input ontology individually, a powerful feature that means the fused BKO contains more knowledge than the union of its inputs.
Having completed the fundamentals of the theory, along with an outline of the reasoning process, our next steps will focus on deepening the theory. One track involves the information gained from fusing ontologies. Using BKO fusion, any practical number of ontologies can be fused together. However, at some point little information will be added when many ontologies with overlapping domains are fused. We will describe a method to quantify how much information is added by each additional ontology. We will also focus on ontology alignment and its application to BKO fusion. One current limitation of our approach is its dependence on the availability of accurate ontology mappings. Recent work has focused on automatically generated bridge ontologies, which would be well suited to our probabilistic framework and could be used to overcome the lack of a mapping between the ontologies used in fusion.

Definition 3.2.2.
A role assertion is a statement bRc for a role expression R and individuals b and c. bRc states that c is a filler of the role R for owner b. DL commonly uses the notations R(b, c) or (b, c):R. A terminological axiom is a statement asserting a relation between two classes. Some standard forms of terminological axioms in DL are subsumption, equivalence, and disjointness axioms. For classes C and D,
• A subsumption axiom is of the form C ⊑ D
• An equivalence axiom is of the form C ≡ D
• A disjointness axiom is of the form C ⊓ D = ⊥
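The three axiom forms have a simple extensional reading that can be sketched in code. This is a toy interpretation-level illustration only (real DL reasoners work on axioms, not enumerated member sets), and the class names are hypothetical:

```python
# Extensional reading of the axiom forms over finite member sets.
def subsumed(C, D):    # C ⊑ D : every member of C is a member of D
    return C <= D

def equivalent(C, D):  # C ≡ D : same members
    return C == D

def disjoint(C, D):    # C ⊓ D = ⊥ : no shared members
    return not (C & D)

Neuropathy = frozenset({"a", "b", "c"})
SciaticNeuropathy = frozenset({"a"})
Fracture = frozenset({"d"})
```

Under this reading, subsumption is just set inclusion, which is the intuition the probabilistic generalization in Section 4 builds on.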

Definition 4.1.1.
For a domain Q, a finite set of individuals I, and a finite set of classes C, a lexicon L(Q) = I × C. Notation. Use the notation I(Q) and C(Q) as functions to access I and C independently.

Definition 4.1.3.
Let f : O(Q) → [0, 1] be a probability distribution for domain Q. This is known as the domain's state distribution.

Fig 7. Combined BKO. The three BKO fragments combined into the same graph. At this stage the fusion algorithm is not yet complete because the BKO has not been fully instantiated. The dotted nodes represent terminological knowledge. https://doi.org/10.1371/journal.pone.0296864.g007

Fig 12. Handling contradictions. The two BKO fragments contradict each other. This contradiction does not present a problem when reasoning about a fused BKO. We can build two inferences, (a) from MONDO and (b) from DO, from the larger BKO fragment (c). https://doi.org/10.1371/journal.pone.0296864.g012

A set of assertions {a_{i_1} ∈ C_{j_1}, a_{i_2} ∈ C_{j_2}, ..., a_{i_n} ∈ C_{j_n}} is consistent if for all k, l ∈ {1, 2, ..., n}, a_{i_k} ∈ C_{j_k} and a_{i_l} ∈ C_{j_l} are consistent. Individually consistent sets A_1 and A_2 are consistent with each other if A_1 ∪ A_2 is consistent.
The notation used to represent PAAs, R, was chosen to reflect that PAAs are CPRs, which are denoted R in BKBs. The other rule used in BKO theory is the probabilistic terminological axiom. However, its definition relies on variable individuals and variable concept constructors, so we must define those first.

Definition 4.2.3. A variable individual is a variable x̂ which represents an unspecified a ∈ I(Q). We will use the term specific individual to distinguish a normal individual from a variable individual.

Definition 4.2.4. A variable concept is a concept Ĉ whose members include one or more variable individuals. We will use the term specific concept to distinguish a concept from a variable concept.

Definition 4.2.5. Let X̂ = {x̂_1, x̂_2, ..., x̂_n} be a set of variable individuals and L(Q) be a lexicon describing domain Q. A variable concept constructor is a function f(L(Q), X̂), the output of which is a variable concept.

Notation. Letters with a hat (^) represent variable individuals or concepts, while letters without a hat represent specific individuals, classes, roles, etc. For example, the variable concept Ĉ = R_1(ŷ, x̂) ⊓ R_2(ŷ, x̂) represents some variable individual ŷ being related to some yet-unspecified individual x̂ by both properties R_1 and R_2. Note that variable concepts are permitted to contain some specific individuals too. Ĉ = R_1(ŷ, x̂) ⊓ R_2(ŷ, x̂) ⊓ R_3(ŷ, b) represents being related to yet-unspecified individual x̂ by R_1 and R_2, and to specific individual b by R_3.

Definition 4.2.6. A variable assertion is an assertion of the form ŷ ∈ Ĉ, where ŷ is a variable individual and Ĉ is either a variable concept or a specific concept.

Definition 4.2.7. For a set of variable individuals {x̂_1, ..., x̂_n} and a set of variable concepts {Ĉ_1, ..., Ĉ_n}, a probabilistic terminological axiom (PTA) is a statement of the form
Definition 4.3.2. Let V_1 = {V^1_1, ..., V^1_n} and V_2 = {V^2_1, ..., V^2_m} be sets of random variables whose sample space is a set of

Definition 5.1.2. For some provable rule R in the context of some BKO B, to infer R is

For a BKO B, a PTA T ∈ B, and an instantiation function g, infer T|_g. We call this the probabilistic rule of universal instantiation.

Let B be a BKO, T ∈ B be a PTA, and g be an instantiation function. We will show that the finite set of PAAs and PTAs B ∪ T|_g satisfies the four conditions set in the definition of a BKO.

(i) Since T ∈ B, condition (iii) holds for T and all PAAs R_B ∈ B. So, for PAA T|_g and any R_B ∈ B such that T|_g ≠ R_B, either (1) T|_g and R_B are mutually exclusive or (2) con(T|_g) is consistent with the negation of con(R_B) and con(R_B) is consistent with the negation of con(T|_g). Since B is a BKO, condition (i) holds for all other PAAs R_{B_1}, R_{B_2} ∈ B. So condition (i) holds for B ∪ T|_g.

(ii) Since T|_g is a PAA, no PTAs were added to B. Since B is a BKO, all PTAs in B satisfy condition (ii).

(iii) Since T ∈ B, condition (ii) holds for T and all PTAs T_B ∈ B. So, for PAA T|_g and any PTA T_B ∈ B, either (1) T|_g and T_B|_{g_B} are mutually exclusive, or (2) con(T|_g) is consistent with the negation of con(T_B|_{g_B}) and con(T_B|_{g_B}) is consistent with the negation of con(T|_g), or (3) T_B|_{g_B} = T|_g. Since B is a BKO, condition (iii) holds for any other PAA R_B and PTA T_B in B. So condition (iii) holds for B ∪ T|_g.

(iv) Let S ⊆ B ∪ T|_g be a subset of mutually consequent bound PAAs and PTAs. Case 1: If T|_g ∉ S then S ⊆ B, and since B is a BKO, ∑_{Q∈S} P(Q) ≤ 1. Case 2: If T|_g ∈ S then S − {T|_g} ⊆ B, and ∑_{Q∈S−{T|_g}} P(Q) ≤ 1. But since T|_g is consequent bound with all Q ∈ S − {T|_g}, T is also consequent bound with all Q ∈ S − {T|_g}. So there exists a set S − {T|_g} ∪ {T} ⊆ B of mutually consequent bound PAAs and PTAs, and since B is a BKO, ∑_{Q∈S−{T|_g}∪{T}} P(Q) ≤ 1. And since PTA T and PAA T|_g have the same probability, ∑_{Q∈S−{T|_g}∪{T}} P(Q) = ∑_{Q∈S} P(Q) ≤ 1.

So B ∪ T|_g is a finite set of PAAs and PTAs that satisfies the conditions set in Definition 4.3.6. So B ∪ T|_g is a BKO.
If R_1 and R_2 are mutually exclusive PAAs, then they are mutually exclusive CPRs.

Lemma 5.2.2. If R_1 and R_2 are consequent bound PAAs, then R_1 and R_2 are consequent bound CPRs.

Notation. For a BKO B, Abox(B) represents B's A-box. Similarly, Tbox(B) represents B's T-box.

Theorem 5.2.1. Let B be a BKO. Abox(B) is a BKB.

Proof. Let B be a BKO and let Abox(B) be the set of all PAAs in B. We will show that (1) for any distinct PAAs R_1, R_2 ∈ Abox(B), either R_1 is mutually exclusive with R_2 or con(R_1) ≠ con(R_2); and (2) for any subset S of mutually consequent bound CPRs of B, ∑_{Q∈S} P(Q) ≤ 1. Since B is a BKO, for any subset S of mutually consequent bound PAAs of B, ∑_{Q∈S} P(Q) ≤ 1. And by Lemma 5.2.2, if R_1 and R_2 are consequent bound PAAs, they are consequent bound CPRs. So S remains unchanged and ∑_{Q∈S} P(Q) ≤ 1.
Definition 5.3.1. The generalization of assertion a ∈ C, denoted gen(a ∈ C), is x̂ ∈ Ĉ, where x̂ is a variable individual and Ĉ is a variable concept in which each specific individual in C is replaced with a variable individual.

Definition 5.3.2. Two variable assertions x̂_1 ∈ f_1(L(Q), {x̂_1, ..., x̂_w}) and ŷ_1 ∈ f_2(L(Q), {ŷ_1, ..., ŷ_w}) are equivalent if f_1(L(Q), {ẑ_1, ..., ẑ_w}) = f_2(L(Q), {ẑ_1, ..., ẑ_w}) for any {ẑ_1, ..., ẑ_w}.

Definition 5.3.3. An instantiation function g is compatible with PTA T if g is a one-to-one mapping from I(T) to a set of specific individuals.

The Full Instantiation Algorithm takes two arguments. The first is a BKO B. The second is an initial reasoning anchor H_i, defaulting to the empty set. The Full Instantiation Algorithm returns a BKO.

Proposition 5.3.1. The output of the Full Instantiation Algorithm is a BKO. Note that this proposition follows from Theorem 5.1.1, which states that the union of a BKO B and the instantiation of any PTA in B is still a valid BKO.
the set of variable assertions that generalize to x̂_{i_k} ∈ Ĉ_{j_k}. Let S_T be the set of PAAs instantiated from PTA T. Let B = B_i ∪ B_j ∪ {R_{s_i}} ∪ {R_{s_j}}. Since B_i and B_j are sets of PAAs and PTAs, and R_{s_i} and R_{s_j} are themselves PAAs, B is a set of PAAs and PTAs. Now, we show that it satisfies the four conditions set in Definition 4.3.6:

(i) Let R_i = {R_{i_1}, ..., R_{i_n}} and R_j = {R_{j_1}, ..., R_{j_m}} be the sets of PAAs in B_i and B_j, respectively. Since we assume all classes C_i and C_j are disjoint, for any R_{i_k} ∈ R_i and R_{j_l} ∈ R_j, we can say that con(R_{i_k}) is consistent with the negation of con(R_{j_l}) and con(R_{j_l}) is consistent with the negation of con(R_{i_k}). Additionally, source PAAs R_{s_i} and R_{s_j} have different individuals in their consequents that are unique to each source PAA. So (1) con(R_{s_i}) is consistent with the negation of con(R_{s_j}) and con(R_{s_j}) is consistent with the negation of con(R_{s_i}), and (2) for any R_m ∈ R_i ∪ R_j, both con(R_{s_i}) and con(R_{s_j}) are consistent with the negation of con(R_m), and con(R_m) is consistent with the negation of both con(R_{s_i}) and con(R_{s_j}). So, for any R_1, R_2 ∈ B, either R_1 is mutually exclusive with R_2 or con(R_1) is consistent with the negation of con(R_2) and con(R_2) is consistent with the negation of con(R_1).