Fig 1.
Absolute positional encoding method.
Fig 2.
Relative positional embedding matrix.
Fig 3.
An example of the second positional encoding with k = 1. … and … are divided into … parts respectively and spliced into d parts as the second positional encoding.
Fig 4.
Visualization of two absolute positional encoding methods.
Fig 5.
Visualization of position-wise cosine similarity of different position embeddings.
Lighter denotes higher similarity.
Fig 6.
The similarity between the 128th positional encoding and others.
Table 1.
Statistics of the datasets for the different tasks.
Table 2.
The parameter configuration of different models.
Table 3.
Results on the WMT’14 En-De and WMT’16 En-Ro tasks with the Base model, and on the IWSLT’14 En-De and IWSLT’17 En-Fr tasks with the Small model.
Table 4.
Results on the WMT’14 En-De task with the Deep and Big models.
Table 5.
Results of OPR on the WMT’14 En-De task with different values of the hyperparameter k.
Table 6.
BLEU scores over 5 independent runs and statistical test results for APE vs. OPR on the WMT’14 En-De task with the Big model.
Table 7.
Experimental results on sentences of different lengths.
Fig 7.
Perplexity comparison during training.