Fig 1.
This is a graphical representation of a standard feedforward DNN architecture. The DNN is fed with an input vector x of dimension D, which is transformed by the hidden layers h_j (each composed of N_j hidden units) according to an activation function g and the parameters of the DNN (weight matrices W and bias vectors b). Finally, the output layer O provides the output of the DNN for the target task (in the case of classification, the probability that the input vector belongs to each class C).
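The forward pass described in Fig 1 can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the sigmoid used for g, the softmax output layer, the function names and the layer sizes (D = 39, two hidden layers of 512 units, C = 10 classes) are all assumptions chosen for the example.

```python
import numpy as np

def sigmoid(z):
    # Assumed choice of the hidden activation function g
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtract the max for numerical stability before exponentiating
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def dnn_forward(x, weights, biases, g=sigmoid):
    """Return class posteriors P(C | x) for a D-dimensional input x (hypothetical helper)."""
    h = x
    # Hidden layers: h_j = g(W_j h_{j-1} + b_j)
    for W, b in zip(weights[:-1], biases[:-1]):
        h = g(W @ h + b)
    # Output layer O: softmax over the C classes
    return softmax(weights[-1] @ h + biases[-1])

# Hypothetical sizes: D = 39 inputs, two hidden layers of 512 units, C = 10 classes
rng = np.random.default_rng(0)
sizes = [39, 512, 512, 10]
weights = [rng.normal(0, 0.1, (n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]
p = dnn_forward(rng.normal(size=39), weights, biases)
```

Each W_j has shape (N_j, N_{j-1}), so the same loop works for any number of hidden layers, which is the quantity varied in Fig 4.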
Fig 2.
Representation of the language recognition system structure.
This is a graphical representation of the language recognition systems: both the reference system (based on cepstral features) and the bottleneck feature based system.
Fig 3.
Example of DNN architecture with bottleneck layer.
This is a graphical representation of the topology of a DNN with a bottleneck (BN) layer, whose outputs (activation values) are used as input feature vectors for the language recognition system.
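Extracting bottleneck features amounts to running each frame through the network only up to the BN layer and keeping that layer's activations. The sketch below assumes a linear BN layer (activations taken before g), a common practice that these captions do not confirm; the helper name, the `bn_index` parameter and all layer sizes are hypothetical. The example mirrors the best configuration reported in Fig 8 (80 BN units at the third of four hidden layers), with invented input and output dimensions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def extract_bottleneck(frames, weights, biases, bn_index, linear_bn=True):
    """Return BN-layer activations for a (T, D) matrix of frames (hypothetical helper).

    bn_index: 0-based index of the hidden layer used as the bottleneck.
    linear_bn: assumption; if True, the BN output is taken before the nonlinearity g.
    """
    # The last (W, b) pair is the output layer, so it cannot be the bottleneck
    if not 0 <= bn_index < len(weights) - 1:
        raise ValueError("bn_index must refer to a hidden layer")
    h = frames
    for j, (W, b) in enumerate(zip(weights, biases)):
        z = h @ W.T + b
        if j == bn_index:
            return z if linear_bn else sigmoid(z)
        h = sigmoid(z)

# Hypothetical topology: 39-dim input, four hidden layers (BN of 80 units third), 40 outputs
rng = np.random.default_rng(0)
sizes = [39, 1024, 1024, 80, 1024, 40]
weights = [rng.normal(0, 0.05, (o, i)) for i, o in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(o) for o in sizes[1:]]
frames = rng.normal(size=(300, 39))  # e.g. 3 s of speech at an assumed 100 frames/s
bn_feats = extract_bottleneck(frames, weights, biases, bn_index=2)
```

Moving the BN layer (Fig 5) or changing its size (Fig 6) only changes `bn_index` and the corresponding entry in `sizes`; layers after the BN layer are still needed for training but not for feature extraction.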
Table 1.
Clusters of target languages and approximate amount of data per language in the NIST LRE 2015 training dataset.
Table 2.
Datasets used for training and testing our systems.
Table 3.
Performance of the cepstral i-vector reference system (i-vectors based on MFCC-SDC features): average EER over all language clusters.
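The reference system relies on MFCC-SDC features. Shifted delta cepstra stack k delta vectors, each computed with spread d at shifts of P frames, over the first N cepstral coefficients. The sketch below uses the 7-1-3-7 (N-d-P-k) configuration that is common in language recognition; the captions do not state the configuration actually used, so the defaults, the edge-padding strategy and the function name are assumptions.

```python
import numpy as np

def sdc(cep, N=7, d=1, P=3, k=7):
    """Shifted delta cepstra (N-d-P-k) for a (T, >=N) matrix; returns (T, N*k)."""
    c = cep[:, :N]
    T = c.shape[0]
    # Replicate edge frames so every shifted index exists (an assumption;
    # other toolkits may handle segment edges differently).
    cp = np.pad(c, ((d, (k - 1) * P + d), (0, 0)), mode="edge")
    blocks = []
    for i in range(k):
        # Delta at shift i*P: c(t + iP + d) - c(t + iP - d), in padded indices
        plus = cp[i * P + 2 * d : i * P + 2 * d + T]
        minus = cp[i * P : i * P + T]
        blocks.append(plus - minus)
    return np.hstack(blocks)

# Hypothetical usage: 200 frames of 7 MFCCs plus their SDCs -> 56-dim vectors
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(200, 7))
feats = np.hstack([mfcc, sdc(mfcc)])
```

With 7-1-3-7, each frame gets 7 x 7 = 49 SDC coefficients, and appending the 7 static MFCCs yields 56-dimensional vectors.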
Table 4.
DNN performance (phoneme classification frame accuracy) and language recognition performance (average EER over all language clusters).
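The tables summarize language recognition with an EER averaged over language clusters. A minimal sketch of one way to compute the EER from target and non-target scores and then average it over clusters follows. The function names and the per-cluster score layout are hypothetical, the exact per-cluster scoring protocol is not given in these captions, and the threshold sweep returns the closest discrete operating point rather than an interpolated crossing.

```python
import numpy as np

def compute_eer(target_scores, nontarget_scores):
    """Approximate EER: error rate where miss and false-alarm rates are closest."""
    tgt = np.sort(np.asarray(target_scores, dtype=float))
    non = np.sort(np.asarray(nontarget_scores, dtype=float))
    thresholds = np.sort(np.concatenate([tgt, non]))
    # Decision rule: accept a trial if score >= threshold.
    # searchsorted(side="left") counts the scores strictly below each threshold.
    p_miss = np.searchsorted(tgt, thresholds, side="left") / tgt.size
    p_fa = 1.0 - np.searchsorted(non, thresholds, side="left") / non.size
    i = np.argmin(np.abs(p_miss - p_fa))
    return (p_miss[i] + p_fa[i]) / 2.0

def average_eer(cluster_scores):
    """Mean EER over clusters; cluster_scores maps name -> (target, non-target) scores."""
    return float(np.mean([compute_eer(t, n) for t, n in cluster_scores.values()]))

# Hypothetical toy scores for two clusters
clusters = {"A": ([2, 3, 4], [-1, 0, 1]), "B": ([1, 3], [0, 2])}
avg = average_eer(clusters)
```

Averaging per-cluster EERs weights every cluster equally, regardless of how many trials each one contains.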
Fig 4.
Phoneme frame accuracy of the DNN (upper part of the figure) and performance of the language recognition systems (lower part) for different test durations (3, 10 and 30 s) as a function of the number of hidden layers in the DNN.
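Phoneme frame accuracy is the fraction of frames for which the DNN's most probable phoneme matches the reference label. A minimal sketch, assuming per-frame posteriors as a (T, C) array and integer frame labels; the function name and the toy values are illustrative only.

```python
import numpy as np

def frame_accuracy(posteriors, labels):
    """Fraction of frames whose most probable phoneme equals the reference label."""
    predicted = np.argmax(posteriors, axis=1)
    return float(np.mean(predicted == np.asarray(labels)))

# Toy example: 4 frames, 3 phoneme classes; the third frame is misclassified
posteriors = np.array([[0.7, 0.2, 0.1],
                       [0.1, 0.8, 0.1],
                       [0.3, 0.3, 0.4],
                       [0.6, 0.3, 0.1]])
acc = frame_accuracy(posteriors, [0, 1, 1, 0])  # 3 of 4 frames correct -> 0.75
```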
Table 5.
DNN performance (phoneme classification frame accuracy) and language recognition performance (average EER over all language clusters).
Fig 5.
Phoneme frame accuracy of the DNN (upper part of the figure) and performance of the language recognition systems (lower part) for different test durations (3, 10 and 30 s) as the bottleneck layer is moved from the first to the fourth hidden layer in a four-hidden-layer topology.
Table 6.
DNN performance (phoneme classification frame accuracy) and language recognition performance (average EER over all language clusters).
Fig 6.
Phoneme frame accuracy of the DNN (upper part of the figure) and performance of the language recognition systems (lower part) for different test durations (3, 10 and 30 s) as a function of the bottleneck layer size (number of hidden units).
Fig 7.
Histogram of test segment durations in the mismatched test dataset (the NIST LRE 2015 evaluation data).
Table 7.
Language recognition performance (average EER over all clusters) on the evaluation data of NIST LRE 2015.
Fig 8.
This figure shows the per-cluster and average performance of the cepstral i-vector reference system and of the bottleneck feature based language recognition system on the actual evaluation data of LRE’15, using the best configuration found on the development data: 80-dimensional bottleneck features from the third hidden layer of a four-hidden-layer DNN.