Fig 1.

Deep neural network (DNN).

This is a graphical representation of a standard feedforward DNN architecture. The DNN is fed an input vector x of dimension D, which is transformed by the hidden layers h_j (each composed of N_j hidden units) according to a function g and the parameters of the DNN (weight matrices W and bias vectors b). Finally, the output layer O provides the output of the DNN for the target task (for classification, the probability that the input vector belongs to each class C).
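The forward pass described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the sigmoid choice for g, the softmax output layer, the random initialisation, and all layer sizes are assumptions made only for the example.

```python
import math
import random

def affine(W, b, x):
    """Return W x + b for a weight matrix W (list of rows) and a bias vector b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def sigmoid(v):
    """Element-wise logistic function, used here as the hidden-layer function g."""
    return [1.0 / (1.0 + math.exp(-z)) for z in v]

def softmax(v):
    """Turn output-layer scores into class probabilities that sum to 1."""
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in v]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_in, n_out, rng):
    """Create one (W, b) pair with small random weights and zero biases."""
    W = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b

def forward(x, layers, g=sigmoid):
    """Propagate x through the hidden layers, then through a softmax output layer."""
    h = x
    for W, b in layers[:-1]:
        h = g(affine(W, b, h))
    W_out, b_out = layers[-1]
    return softmax(affine(W_out, b_out, h))

rng = random.Random(0)
D, hidden_sizes, C = 4, [8, 8], 3  # illustrative sizes only
sizes = [D] + hidden_sizes + [C]
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

x = [0.5, -1.0, 0.25, 2.0]  # one input vector of dimension D
posteriors = forward(x, layers)
print(posteriors)  # one probability per class
```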


Fig 2.

Representation of language recognition system structure.

This is a graphical representation of the language recognition systems: both the reference (cepstral-feature-based) system and the bottleneck-feature-based system.


Fig 3.

Example of DNN architecture with bottleneck layer.

This is a graphical representation of the topology of a DNN with a bottleneck (BN) layer, whose outputs (activation values) are used as input feature vectors for the language recognition system.
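As a rough sketch of the idea in this caption, bottleneck features can be obtained by running each input frame through the network only up to the BN layer and keeping that layer's activations. The layer sizes, the sigmoid nonlinearity, the random weights, and the choice to read the activations after the nonlinearity are all assumptions for illustration; the names `bottleneck_features` and `bn_index` are hypothetical.

```python
import math
import random

def affine(W, b, x):
    """Return W x + b for a weight matrix W (list of rows) and a bias vector b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def sigmoid(v):
    """Element-wise logistic function (assumed hidden-layer nonlinearity)."""
    return [1.0 / (1.0 + math.exp(-z)) for z in v]

def init_layer(n_in, n_out, rng):
    """Create one (W, b) pair with small random weights and zero biases."""
    W = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b

def bottleneck_features(frames, hidden_layers, bn_index):
    """Return the BN-layer activations for each frame.

    Layers after the BN layer are needed only while training the DNN,
    so they are skipped here.
    """
    features = []
    for x in frames:
        h = x
        for W, b in hidden_layers[:bn_index + 1]:
            h = sigmoid(affine(W, b, h))
        features.append(h)
    return features

rng = random.Random(0)
D, BN_SIZE = 6, 3                        # illustrative sizes only
hidden_sizes = [10, 10, BN_SIZE, 10]     # narrow BN layer as third hidden layer
sizes = [D] + hidden_sizes
hidden_layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

frames = [[rng.uniform(-1.0, 1.0) for _ in range(D)] for _ in range(5)]
bn_feats = bottleneck_features(frames, hidden_layers, bn_index=2)
print(len(bn_feats), len(bn_feats[0]))  # number of frames, feature dimension
```

The resulting vectors would then take the place of the acoustic features fed to the language recognition back end.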


Table 1.

Clusters of target languages and approximate amount of data per language in the NIST LRE 2015 training dataset.


Table 2.

Datasets used for training and testing our systems.


Table 3.

Performance of the cepstral-based i-vector reference system (i-vectors based on MFCC-SDC features): average EER over all language clusters.


Table 4.

DNN (phoneme classification, frame accuracy) and language recognition performance (average EER of all language clusters).


Fig 4.

Phoneme frame accuracy of the DNN (upper part of the figure) and performance of the language recognition systems (lower part) for different test durations (3, 10 and 30 s) with different numbers of hidden layers in the DNN.


Table 5.

DNN (phoneme classification, frame accuracy) and language recognition performance (average EER of all language clusters).


Fig 5.

Phoneme frame accuracy of the DNN (upper part of the figure) and performance of the language recognition systems (lower part) for different test durations (3, 10 and 30 s) when the bottleneck layer moves from the first to the fourth layer in a four-hidden-layer topology.


Table 6.

DNN (phoneme classification, frame accuracy) and language recognition performance (average EER of all language clusters).


Fig 6.

Phoneme frame accuracy of the DNN (upper part of the figure) and performance of the language recognition systems (lower part) for different test durations (3, 10 and 30 s) when the bottleneck layer size (number of hidden units) varies.


Fig 7.

Histogram of test segment durations in the mismatched test dataset (the evaluation data of LRE 2015).


Table 7.

Language recognition performance (average EER of all clusters) for the evaluation data of NIST LRE 2015.


Fig 8.

Evaluation data results.

This figure shows the performance, per cluster and on average, of the cepstral-based i-vector reference system and the bottleneck-feature-based language recognition system on the actual evaluation data of LRE’15, using the best configuration found on the development data. This configuration used 80-dimensional bottleneck features taken from the third hidden layer of a four-hidden-layer DNN architecture.
