Fig 1.
Main steps in Thai semantic similarity as conducted in previous work (yellow background) and in this work (blue background).
Fig 2.
First four lines of TH-SemEval-500.
Table 1.
Overview of Thai semantic similarity datasets, including number of word pairs, human inter-annotator agreement, and rating interval.
Table 2.
Evaluation metrics for the self-trained models and the pretrained baselines: Spearman ρ (S), Pearson ρ (P), and the harmonic mean (HM) of the two.
The ratio of out-of-vocabulary (OOV) words (%OOV) is also reported.
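The metrics named in this caption can be sketched as follows. This is a minimal illustration in plain Python, not the evaluation code used for the tables. The function names are hypothetical, and computing %OOV over all word occurrences in the pairs is an assumption; the exact counting convention is not stated here.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def harmonic_mean(s, p):
    """Harmonic mean (HM) of the Spearman (S) and Pearson (P) scores."""
    return 2 * s * p / (s + p) if (s + p) != 0 else 0.0


def oov_ratio(pairs, vocab):
    """Percentage of words in the pairs missing from vocab.

    Assumption: every word occurrence in every pair is counted once.
    """
    words = [w for pair in pairs for w in pair]
    return 100 * sum(w not in vocab for w in words) / len(words)
```

Given the human ratings and the model similarities for the same word pairs, `harmonic_mean(spearman(gold, pred), pearson(gold, pred))` would give the HM value for one model under these assumptions.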
Table 3.
Evaluation metrics for the self-trained models and the pretrained baselines, with deepcut applied to the dataset terms: Spearman ρ (S), Pearson ρ (P), and the harmonic mean (HM) of the two.
The ratio of out-of-vocabulary (OOV) words (%OOV) is also reported.
Table 4.
Overview of results for BPEmb (various settings), fastText embeddings, and stacked embeddings, compared with the baselines.
Table 5.
Overview of results for combining subword embeddings with structured and hybrid sources (WordNet and ConceptNet Numberbatch).
M1 and M2 refer to Method 1 and Method 2 from the Implementation section.
Fig 3.
Comparison of the baseline from previous work (Baseline: thai2vec) with the various approaches implemented in this work.