The authors have declared that no competing interests exist.
Biomedical knowledge claims are often expressed as hypotheses, speculations, or opinions, rather than explicit facts (propositions). Much biomedical text mining has focused on extracting propositions from biomedical literature. One such system is SemRep, which extracts propositional content in the form of subject-predicate-object triples called predications. In this study, we investigated the feasibility of assessing the factuality level of SemRep predications to provide more nuanced distinctions between predications for downstream applications. We annotated semantic predications extracted from 500 PubMed abstracts with seven factuality values (
With the exponential increase in the number of biomedical publications, managing the literature efficiently to support hypothesis generation and discovery has become a daunting task. Text mining from the literature has been proposed to address this challenge [
SemRep [
(a)
(b) C0078056:Vascular Cell Adhesion Molecule-1-DISRUPTS-C0730345:Microalbuminuria
SemRep relies on the UMLS SPECIALIST Lexicon [
As Example (1) above illustrates, SemRep predications may not fully capture the meaning of the source sentence. It focuses on propositional meaning (the claim that
While not as widely studied as more foundational tasks such as named entity recognition and relation extraction, extra-propositional meaning in the biomedical research literature has attracted some attention over the last decade. Extra-propositional phenomena have been annotated in various corpora. For example, the GENIA event corpus [
Shared task competitions have provided stimulus in automatic recognition of certain extra-propositional phenomena. BioNLP shared tasks on event extraction [
The studies that focus on extra-propositional meaning mentioned so far assign discrete values to propositional content (e.g., certainty level or polarity of an event). While these values can be useful for downstream applications, a potentially more useful notion is
In this study, we focus on the factuality of SemRep predications. We present an annotated corpus in which we annotated SemRep predications extracted from 500 PubMed abstracts with their factuality values. We extend our earlier work [
In this section, we first discuss our corpus and the annotation study. We then describe two approaches to factuality assessment of SemRep predications, the first a rule-based, compositional approach and the second a machine learning-based method. We conclude this section by briefly discussing the evaluation methodology.
For training and testing, we used a corpus of 500 PubMed abstracts, randomly selected from a larger corpus of approximately 45,000 abstracts used in another, as yet unpublished, study. The selection criterion for the abstracts in the larger corpus was that SemRep extracted at least one predication from them in which the object argument was one of a small number of disorders (including Alzheimer’s disease, asthma, myocardial infarction, obesity, and Parkinson disease) and the predicate type was one of CAUSES, PREVENTS, PREDISPOSES, and TREATS.
We annotated SemRep predications extracted from these abstracts with seven discrete factuality levels:
Sentence | Predication | Factuality
---|---|---
 | Nifedipine-AUGMENTS-Renal Blood Flow |
 | Tamoxifen-PREDISPOSES-Endometrial Carcinoma |
 | Estrogen Antagonists-TREATS-Malignant neoplasm of cancer |
 | Leukotriene Antagonists-TREATS-Hay fever |
 | Losartan-CAUSES-Coughing |
 | Plasmapheresis-TREATS-Collagen Diseases |
 | Cyclic AMP-STIMULATES-CD40 Ligand |
For each predication extracted by SemRep, the annotators performed the following two steps:
If the extracted predication is a false positive, mark it as such (i.e., no further factuality annotation is needed).
Otherwise (true positive), mark the factuality level of the predication using one of the seven factuality values.
Initially, all predications were automatically marked as
We experimented with two approaches to assess factuality of SemRep predications. The first approach is an enhancement of the compositional approach reported in earlier work [
The compositional approach is based on the Embedding Framework [
An enhanced predication representation
A domain-independent categorization of embedding phenomena
A dictionary of embedding predicates
A predication composition procedure that uses syntactic dependency relations and the components above to extract enhanced predications
The framework can incorporate conceptual and propositional information in the form of named entities and relations extracted by third-party tools, as long as the textual mentions of relevant terms and predicates are provided. SemRep, with its explicit marking of entities and predicates, therefore, readily provides the semantic information that can be integrated into the framework. Enhanced predications generated by the predication composition procedure can be used to address various practical tasks. Factuality assessment is one such task and is achieved by rules that map each enhanced predication to a distinct factuality level.
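As a rough illustration, an enhanced predication can be modeled as a SemRep triple augmented with the extra-propositional attributes that the composition procedure accumulates. The field names below are our own sketch, not the framework's exact schema; the default modality value of 1 follows the epistemic scale described later in this section.

```python
from dataclasses import dataclass

@dataclass
class EnhancedPredication:
    """A SemRep triple augmented with extra-propositional attributes.
    Field names are illustrative, not the framework's exact schema."""
    subject: str                  # e.g. "C0020740:Ibuprofen"
    predicate: str                # e.g. "TREATS"
    object_: str                  # e.g. "C0006142:Malignant neoplasm of breast"
    scalar_modality: float = 1.0  # top of the epistemic scale by default
    negated: bool = False

# A predication extracted by SemRep, before any embedding context is applied
p = EnhancedPredication("C0020740:Ibuprofen", "TREATS",
                        "C0006142:Malignant neoplasm of breast")
```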
We illustrate the components of the framework on an example, shown in
(1) | |
(2) | |
(3) | C0020740:Ibuprofen-TREATS-C0006142:Malignant neoplasm of breast |
(4) |
In row (2), UMLS Metathesaurus concepts corresponding to entities are represented as
In the example in
A domain-independent embedding categorization and a dictionary in which embedding predicates, such as
Predication composition also relies on lexical information (tokens, lemmas, part-of-speech) and syntactic dependencies provided by the Stanford CoreNLP toolkit [
The generation of the semantic graph is followed by bottom-up traversal of the graph for predication composition. Two procedures in predication composition that play roles in factuality assessment are:
Each predication is initially placed on the epistemic scale and assigned the modality value of 1 (i.e.,
Lemma |  |
---|---|---
Sense.1 | Category |
 | Prior scalar modality value | 0.5
 | Semantic dependency types |
Sense.2 | Category |
 | Prior scalar modality value | 0.6
 | Semantic dependency types |
To use the compositional approach with SemRep predications, we make several enhancements to the composition procedure. First, we prune embedding predicate annotations that are subsumed by or overlap with named entities or predicates identified by SemRep, so that they are not used for factuality assessment and the semantics generated by SemRep takes precedence. For example,
Second, the handling of negated predication arguments is enhanced. Negation triggers, such as
Thalidomide-INHIBITS-TNF-alpha
Third, we add a semantic graph path constraint between the embedding predicate and the predicate indicating the SemRep predication. This constraint applies when the embedding predicate is an adverb or a modal auxiliary, and stipulates that for the SemRep predication to be in the scope of the embedding predicate, no verbal node can be on the path between the embedding predicate and the SemRep predicate, unless the verbal node corresponds to a light verb, such as
Pramipexole-AFFECTS-Tremor
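A sketch of this path constraint check, under the assumption that the semantic graph is available as an adjacency map; the node names, POS tags, and light-verb list below are illustrative, not the system's actual inventory:

```python
from collections import deque

LIGHT_VERBS = {"be", "have", "do"}  # illustrative light-verb list

def in_scope(graph, pos, lemma, embedding_node, semrep_node):
    """Return True if the SemRep predicate is in the scope of an
    adverbial/modal embedding predicate: no non-light verbal node may
    lie on the path between them. Illustrative sketch of the constraint."""
    queue = deque([[embedding_node]])
    seen = {embedding_node}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == semrep_node:
            # check intermediate nodes only (endpoints excluded)
            return all(not pos[n].startswith("VB") or lemma[n] in LIGHT_VERBS
                       for n in path[1:-1])
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return False

# The verbal node "suggest" intervenes, so the modal does not scope
# over the SemRep predicate:
graph = {"possibly": ["suggest"], "suggest": ["affects"]}
pos = {"possibly": "RB", "suggest": "VBZ", "affects": "VBZ"}
lemma = {"possibly": "possibly", "suggest": "suggest", "affects": "affect"}
blocked = in_scope(graph, pos, lemma, "possibly", "affects")  # False
```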
Once the enhanced predications are composed, we use simple rules to assign factuality levels to them. In our previous work, these rules were essentially based on scalar modality values associated with predications. In the current study, based on the analysis of the training examples, these were slightly modified, with new rules added for
All ISA predications are assigned the factuality value of
All predications indicated by modifier-head constructions or prepositional predicates are assigned the value of
All inferred predications (INFER) inherit the factuality value of their non-inferred counterpart.
The first two rules aim to address predominantly factual static relations (rather than events).
The modified, scalar modality value-based rules are shown in
Condition | Factuality value
---|---
MV |
MV |
MV |
MV |
MV |
MV |
MV |
MV |
MV |
MV |
MV |
MV |
Note that only the enhanced predications corresponding to SemRep predications are mapped. All other predications are pruned at this step, since their factuality values are not of interest for this study.
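Pulling the rules together, the final mapping step can be sketched as follows. The ISA rule mirrors the text above (we assume the fixed value it assigns is the factual level); the scalar-modality thresholds and the exact level names below are illustrative placeholders, not the paper's actual cutoffs:

```python
def map_to_factuality(mv, negated=False, is_isa=False):
    """Map an enhanced predication's scalar modality value (mv) to a
    discrete factuality level. Thresholds and level names are
    illustrative placeholders, not the paper's exact mapping rules."""
    if is_isa:
        return "Fact"  # assumed: the text assigns all ISA predications one fixed value
    if negated:
        # negated predications sit at the bottom of the epistemic scale
        return "Counterfact" if mv >= 0.9 else "Doubtful"
    if mv >= 0.9:
        return "Fact"
    if mv >= 0.6:
        return "Probable"
    if mv >= 0.3:
        return "Possible"
    return "Uncommitted"
```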
As an alternative to the compositional approach, we experimented with a machine learning method. We cast factuality prediction as a multi-class classification task. The LIBLINEAR implementation of linear SVM [
We experimented with two sets of features. The first set consists of features used by Miwa et al. [
While we aimed to replicate the experimental setting of Miwa et al. [
The second set of features includes a subset of the features above, as well as several additional features, some incorporating SemRep-specific information and others expected to be helpful for the classification task. The features shared with those of Miwa et al. [
Trigger-argument pair unigram and bigram features
Sentence relative position feature
The additional features used in the second set are the following:
Feature indicating whether the predication sentence is in the title of the abstract.
Predicate type feature (TREATS, ISA, PROCESS_OF, etc.)
Indicator type feature (whether it is a verb, noun, preposition, etc.)
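As a rough sketch of the trigger-argument n-gram features, consider the function below. The windowing and feature naming are our own simplification; the actual system feeds a much richer feature set to the LIBLINEAR SVM:

```python
def ngram_features(tokens, trigger_idx, arg_idx):
    """Unigram and bigram features over the token span between a
    factuality trigger and a predication argument (illustrative sketch)."""
    lo, hi = sorted((trigger_idx, arg_idx))
    span = tokens[lo:hi + 1]
    feats = {}
    for tok in span:
        feats["uni=" + tok] = 1
    for a, b in zip(span, span[1:]):
        feats["bi=" + a + "_" + b] = 1
    return feats

# Hypothetical sentence: trigger "may", argument "ibuprofen"
tokens = "ibuprofen may reduce breast cancer risk".split()
feats = ngram_features(tokens, trigger_idx=1, arg_idx=0)
```

Each extracted predication yields one such sparse feature vector, which is then consumed by the linear classifier.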
We assessed our methodology on the annotated factuality corpus. We used a simple majority baseline, which indicates that all predications have the factuality level of
Certainty Level | Polarity | Factuality
---|---|---
L3 | Positive |
L2 | Positive |
L1 | Positive |
L1 OR L2 | Negative |
L3 | Negative |
We also evaluated the enhanced compositional factuality assessment method, as well as the supervised machine learning method with two sets of features. We used precision, recall, and F1 score as evaluation metrics for individual factuality levels, and accuracy as the metric for overall factuality assessment.
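These metrics can be computed per factuality level in the standard one-vs-rest way; a self-contained sketch:

```python
from collections import Counter

def evaluate(gold, pred):
    """Per-level precision/recall/F1 and overall accuracy for
    factuality assignments (standard definitions)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    scores = {}
    for lvl in set(gold) | set(pred):
        prec = tp[lvl] / (tp[lvl] + fp[lvl]) if tp[lvl] + fp[lvl] else 0.0
        rec = tp[lvl] / (tp[lvl] + fn[lvl]) if tp[lvl] + fn[lvl] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[lvl] = (prec, rec, f1)
    accuracy = sum(tp.values()) / len(gold)
    return scores, accuracy
```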
In this section, we present the results of the annotation study as well as those of rule-based and machine-learning approaches to factuality prediction. We conclude the section by providing an error analysis and discussing some negative results.
The statistics regarding the corpus are given in
 | # Training (%) | # Testing (%) | # Total (%)
---|---|---|---
Abstracts | 300 | 200 | 500
SemRep predications | 4,431 | 2,960 | 7,391
True positive SemRep predications | 3,149 (71.1) | 2,179 (73.6) | 5,328 (72.1)
 | 2,754 (87.5) | 1,958 (89.9) | 4,713 (88.4)
 | 143 (4.5) | 67 (3.0) | 210 (4.0)
 | 66 (2.1) | 61 (2.8) | 127 (2.4)
 | 8 (0.3) | 6 (0.3) | 14 (0.3)
 | 57 (1.8) | 35 (1.6) | 92 (1.7)
 | 120 (3.8) | 52 (2.4) | 172 (3.2)
 | 1 (0.0) | 0 (0.0) | 1 (0.0)
As the statistics in
Evaluation results for factuality assessment are presented in
Precision (%) | Recall (%) | F1 (%) | Accuracy (%) | |
---|---|---|---|---|
89.9 | ||||
|
89.9 | 100.0 | 94.7 | |
86.7 | ||||
95.6 | 91.2 | 93.4 | ||
29.6 | 79.1 | 43.1 | ||
37.6 | 67.2 | 48.2 | ||
0.0 | 0.0 | 0.0 | ||
34.8 | 22.9 | 27.6 | ||
0.0 | 0.0 | 0.0 | ||
95.6 | 98.8 | 97.2 | ||
66.7 | 71.6 | 69.1 | ||
86.8 | 54.1 | 66.7 | ||
100.0 | 33.3 | 50.0 | ||
100.0 | 57.1 | 72.7 | ||
95.5 | 40.4 | 56.8 | ||
89.8 | ||||
90.4 | 99.5 | 94.7 | ||
50.0 | 5.9 | 10.5 | ||
40.0 | 3.3 | 6.1 | ||
0.0 | 0.0 | 0.0 | ||
100.0 | 2.9 | 5.6 | ||
20.0 | 3.9 | 6.6 | ||
92.9 | ||||
94.5 | 99.0 | 96.7 | ||
57.4 | 51.5 | 54.3 | ||
80.0 | 45.9 | 58.3 | ||
0.0 | 0.0 | 0.0 | ||
88.9 | 45.7 | 60.4 | ||
46.7 | 13.7 | 21.2 |
The system performance based on supervised machine learning is also shown in
We performed an ablation study for the enhanced compositional approach by removing each of the three enhancements discussed above and measuring the system performance. These results, presented in
Precision (%) | Recall (%) | F1 (%) | Accuracy (%) | |
---|---|---|---|---|
95.6 | 98.8 | 97.2 | ||
66.7 | 71.6 | 69.1 | ||
86.8 | 54.1 | 66.7 | ||
100.0 | 33.3 | 50.0 | ||
100.0 | 57.1 | 72.7 | ||
95.5 | 40.4 | 56.8 | ||
94.3 | ||||
95.6 | 98.6 | 97.1 | ||
66.2 | 70.2 | 68.1 | ||
86.8 | 54.1 | 66.7 | ||
66.7 | 33.3 | 44.4 | ||
90.9 | 57.1 | 70.2 | ||
87.5 | 40.4 | 55.3 | ||
94.2 | ||||
95.4 | 98.8 | 97.1 | ||
66.7 | 71.6 | 69.1 | ||
80.5 | 54.1 | 64.7 | ||
100.0 | 33.3 | 50.0 | ||
100.0 | 40.0 | 57.2 | ||
95.5 | 40.4 | 56.8 | ||
93.5 | ||||
95.9 | 97.7 | 96.8 | ||
52.8 | 71.6 | 60.8 | ||
72.9 | 57.4 | 64.2 | ||
100.0 | 33.3 | 50.0 | ||
87.5 | 60.0 | 71.2 | ||
95.3 | 38.5 | 54.8 |
We attempted to improve the machine learning approach in various ways. For example, to address the imbalance in the dataset, we oversampled the instances labeled with non-
In summary, for assigning factuality values to SemRep predications, improving over a trivial baseline is quite challenging, as indicated by the results obtained with the classifier that incorporates features from Miwa et al. [
We analyzed and categorized the errors made by the enhanced compositional method. The distribution of these errors is shown in
Category | % |
---|---|
Factuality triggers | 29.9 |
Mapping rules | 27.7 |
Argument identification | 16.8 |
Scalar modality value composition | 10.9 |
Preprocessing | 5.8 |
Graph transformation | 5.1 |
Comparative structures | 2.2 |
Syntactic parsing | 1.5 |
The most frequent type of error involves presence/absence of factuality triggers. A typical example is given in Example (4), in which no factuality trigger is present, leading the system to assign it the value
Kinin-ASSOCIATED_WITH-Chronic heart failure
Errors involving the mapping rules constitute the second largest class of errors. One mapping rule simply assigns the factuality value
Antihypertensive therapy-TREATS-Cerebrovascular accident
Argument identification errors were often caused by the lack of an appropriate semantic dependency type for the factuality trigger in the embedding dictionary. In Example (6), the semantic dependency that exists between the counterfactual trigger
Antidiabetics-CAUSES-Liver diseases
We presented an annotated corpus that focuses on the factuality of semantic relations extracted by SemRep, and experimented with two approaches to assigning factuality values to SemRep predications: one an extension of an earlier rule-based approach and the other a machine learning approach. The compositional, rule-based approach yielded better performance, although there is room for improvement. While only the factuality of SemRep predications was considered in this study, the system can accommodate any semantic relation extraction system that marks relation triggers and term mentions.
There are several limitations to the rule-based approach, which we plan to address in future work. First, the transformation rules that convert syntactic dependencies to a semantic graph are not exhaustive. It may be possible to automatically learn such transformations more generally using the recently available linguistic graphbanks for semantic parsing [
The resulting annotated corpus is publicly available in standoff annotation format at
The authors thank Dongwook Shin for his assistance with