Table 1.
Features used to encode pairwise sentence similarity as a basis for the learning model.
Fig 1.
Example of parse tree and its reduced version for a sample sentence.
The parse tree represents the syntactic structure of a sentence as a rooted tree. The reduced form retains only the major groups of part-of-speech tags, i.e., NPs and VPs.
Fig 2.
Reduced parse trees of the two sample sentences (i.e., Outcomes A and B) listed in the Introduction.
Fig 3.
Example of role-based semantic similarity measure for two sample sentences.
Both measures are computed using Eq 7, with the actual similarity being specific to the pre-verb component (as defined in Eq 8) and to the predicates (as defined in Eq 9).
Table 2.
Statistics on the SICK corpus [9].
Table 3.
Statistics on the NICTA-PIBOSO corpus.
Table 4.
Evaluation of regression algorithms using 10-fold cross-validation on the SICK training corpus.
Table 5.
Analysis of the effects of different similarity measures: Pearson correlation results for 10-fold cross-validation using a leave-one-out feature strategy (i.e., the model is trained on all features except the one named in each row), and results for each measure individually (i.e., the model is trained only on the named feature).
Table 6.
Inter-annotator agreement.
Table 7.
Evaluation of the semantic similarity approach over EBM scientific artefacts.
Table 8.
Analysis of the effects of different similarity measures when the model is trained only on the listed features.
Fig 4.
The error distribution of the ensemble predictions on SICK data.
Table 9.
Prediction errors from the ensemble model.
Table 10.
Evaluation of the ensemble model on the test set, split into four score ranges.
Table 11.
Comparative overview of the features used by existing systems.
Table 12.
Experimental results achieved by our approach in comparison to the state of the art.