Improving the state-of-the-art in Thai semantic similarity using distributional semantics and ontological information

doi:10.1371/journal.pone.0246751

Fig 1.

System overview.

Main steps in Thai semantic similarity conducted in previous work (yellow background) and in this work (blue background).

Fig 2.

First four lines of TH-SemEval-500.

Table 1.

Overview of Thai semantic similarity datasets, including number of word pairs, human inter-annotator agreement, and rating interval.

Table 2.

Evaluation metrics for the self-trained models and the pretrained baselines: Spearman ρ (S), Pearson ρ (P), and the harmonic mean (HM) of the two.

The ratio of out-of-vocabulary (OOV) words (%OOV) is also reported.
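
The metrics named in this caption can be illustrated with a minimal sketch in plain Python. It is not the authors' evaluation code. The function names, the use of cosine similarity, the choice to skip word pairs that contain an OOV word, and the choice to count OOV words over unique dataset terms are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def harmonic_mean(s, p):
    """Harmonic mean of two scores; returns 0.0 if they sum to zero."""
    return 0.0 if s + p == 0 else 2 * s * p / (s + p)


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def evaluate(pairs, gold, vectors):
    """Score word pairs against human ratings.

    Assumptions (hypothetical, not from the paper): pairs with an OOV
    word are skipped, and %OOV is taken over unique dataset terms.
    """
    vocab = {w for pair in pairs for w in pair}
    oov = {w for w in vocab if w not in vectors}
    kept = [(cosine(vectors[a], vectors[b]), g)
            for (a, b), g in zip(pairs, gold)
            if a not in oov and b not in oov]
    model = [m for m, _ in kept]
    human = [g for _, g in kept]
    s, p = spearman(model, human), pearson(model, human)
    return {"S": s, "P": p, "HM": harmonic_mean(s, p),
            "%OOV": 100.0 * len(oov) / len(vocab)}
```

Calling `evaluate(pairs, gold, vectors)` with a list of word pairs, the matching human ratings, and a word-to-vector dictionary returns the four values in one dictionary. In practice, library routines such as those in SciPy would normally replace the hand-written correlation functions.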

Table 3.

Evaluation metrics for the self-trained models and the pretrained baselines, with deepcut applied to the dataset terms: Spearman ρ (S), Pearson ρ (P), and the harmonic mean (HM) of the two.

The ratio of out-of-vocabulary (OOV) words (%OOV) is also reported.

Table 4.

Overview of results for BPEmb (various settings), fastText embeddings, and stacked embeddings, with comparison to the baselines.

Table 5.

Overview of results for combining subword embeddings with structured and hybrid sources (WordNet and ConceptNet Numberbatch).

M1 refers to Method 1 from the Implementation section, and M2 to Method 2.

Fig 3.

Overview of results.

Comparison of the baseline from previous work (Baseline: thai2vec) with the various approaches implemented in this work.
