Table 1.
List of distributional vector space models used in the present paper.
Fig 1.
Four-term analogy performance of distributional models.
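As a minimal sketch of what a four-term analogy test ("a is to b as c is to ?") involves, the snippet below uses the common vector-offset rule: it picks the word whose vector is closest in cosine similarity to v_b - v_a + v_c. This scoring rule is an assumption here and may differ from the one used for the figure. The 2-d vectors and the function name `solve_analogy` are hypothetical and serve only as illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(vectors, a, b, c):
    """Return the word d maximizing cos(v_b - v_a + v_c, v_d).

    The three query words are excluded from the candidates, as is
    customary for this kind of test (an assumption, not taken from the paper).
    """
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


# Hypothetical toy vectors, chosen by hand so that the offsets are parallel.
toy_vectors = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [2.0, 1.0],
    "woman": [2.0, -1.0],
    "castle": [-1.0, 0.5],
}

answer = solve_analogy(toy_vectors, "man", "woman", "king")
```

With these hand-picked vectors the offset woman - man equals queen - king, so the four words form a parallelogram and the query resolves to "queen".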
Fig 2.
Analogy performance of the vocabulary-restricted word2vec models with negative sampling.
To manipulate vocabulary size, artificial corpora were generated from the co-occurrence statistics and used for training, instead of the original corpus.
Fig 3.
Learning curve and analogy performance of a word2vec model trained with back-propagation.
Fig 4.
Analogy performance of a word2vec model trained with back-propagation.
Fig 5.
A hidden Markov model generating the 24 sentences in the toy corpus.
Any hidden state other than the verb states generates its corresponding word with probability 1. For example, the state “king” generates the word “king”. In contrast, the two hidden states for each verb generate the same word. For example, both states “live (R)” and “live (C)” generate the word “live”. “BOS”: Beginning of Sentence, “EOS”: End of Sentence.
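The emission rule described in this caption can be sketched as a deterministic mapping from hidden states to words. The snippet is only an illustration: the helper names `emit` and `emit_sentence` are hypothetical, the example state path is invented, and treating "BOS" and "EOS" as silent boundary markers is an assumption. The transition probabilities of the model are not reproduced here.

```python
def emit(state):
    """Map a hidden state to the word it generates with probability 1.

    Verb states carry a parenthesized tag, e.g. "live (R)" and "live (C)";
    both tagged variants emit the same surface word.
    """
    if state.endswith(")") and " (" in state:
        return state[: state.rindex(" (")]
    return state


def emit_sentence(state_path):
    """Emit the words for a path of hidden states.

    Assumption: "BOS" and "EOS" only mark sentence boundaries and
    produce no word of their own.
    """
    return [emit(s) for s in state_path if s not in ("BOS", "EOS")]


# Hypothetical state path, for illustration only.
words = emit_sentence(["BOS", "king", "live (R)", "EOS"])
```

Because the two verb states collapse onto one word, the hidden state cannot be read off the surface sentence alone, which is what makes the model hidden rather than a plain Markov chain over words.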
Fig 6.
The co-occurrence matrix generated with the uniform sentence probability distribution.
The numbers in the cells represent relative frequencies.
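To make the notion of relative frequency concrete, the following sketch builds co-occurrence relative frequencies from a list of sentences weighted by a sentence probability distribution, uniform by default. Several points are assumptions rather than details taken from the paper: co-occurrence is counted for every ordered pair of distinct positions within a sentence, the function name `cooccurrence_relative_freq` is hypothetical, and the two sentences are invented stand-ins for the toy corpus.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_relative_freq(sentences, probs=None):
    """Return {(word, context): relative frequency} over all sentences.

    Each sentence contributes its probability to every ordered pair of
    words at distinct positions; the totals are normalized to sum to 1.
    """
    if probs is None:
        # Uniform sentence probability distribution.
        probs = [1.0 / len(sentences)] * len(sentences)
    weights = Counter()
    for sentence, p in zip(sentences, probs):
        for word, context in permutations(sentence, 2):
            weights[(word, context)] += p
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


# Hypothetical two-sentence corpus, for illustration only.
toy_sentences = [
    ["king", "live", "castle"],
    ["queen", "live", "castle"],
]
freq = cooccurrence_relative_freq(toy_sentences)
```

Passing an explicit `probs` list instead of relying on the default yields the non-uniform case with the same code.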
Fig 7.
(Non-)parallelepipeds embedded in the co-occurrence matrix of (a) a uniform toy corpus, (b) with an orthogonal basis, and (c) a non-uniform toy corpus. (d) A parallelepiped embedded in the co-occurrence matrix of a natural corpus.
Fig 8.
The co-occurrence matrix generated with an arbitrary sentence probability distribution.
The numbers in the cells represent probabilities. The row vectors are illustrated in a low-dimensional subspace.