Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit

doi:10.1371/journal.pone.0174623

Fig 1.

Oscillogram with Hilbert amplitude envelope for the German word Geschichte ‘history’ (top panel) and corresponding mel-scaled spectrogram (lower panel).

Vertical lines represent the boundaries calculated from the minima in the Hilbert amplitude envelope. For this example, 21 FBS features are extracted for each of the three chunks of speech, resulting in a total of 63 FBS features for this word.
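The chunking described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`hilbert_envelope`, `local_minima`, `split_at_minima`) and the synthetic three-burst signal are assumptions made for the example, and the envelope is computed with a naive DFT so the block needs only the standard library. Only the figure of 21 FBS features per chunk comes from the caption; how those features are computed is not shown here.

```python
import cmath
import math

FEATURES_PER_CHUNK = 21  # taken from the caption; feature extraction itself is not sketched


def hilbert_envelope(x):
    """Amplitude envelope: magnitude of the analytic signal (naive O(n^2) DFT)."""
    n = len(x)
    spec = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]
    half = (n + 1) // 2
    for k in range(1, n):
        if k < half:
            spec[k] *= 2          # double the positive frequencies
        elif not (n % 2 == 0 and k == n // 2):
            spec[k] = 0           # zero the negative frequencies, keep Nyquist
    analytic = [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
                for t in range(n)]
    return [abs(z) for z in analytic]


def local_minima(env):
    """Indices of interior samples that are lower than both neighbours."""
    return [i for i in range(1, len(env) - 1)
            if env[i] < env[i - 1] and env[i] <= env[i + 1]]


def split_at_minima(n, minima):
    """Turn boundary indices into (start, end) chunks covering 0..n."""
    edges = [0] + list(minima) + [n]
    return list(zip(edges[:-1], edges[1:]))


# Hypothetical test signal: a carrier whose amplitude dips twice, giving three bursts.
n = 240
signal = [(0.55 - 0.45 * math.cos(2 * math.pi * 3 * t / n))
          * math.sin(2 * math.pi * 40 * t / n) for t in range(n)]

boundaries = local_minima(hilbert_envelope(signal))
chunks = split_at_minima(n, boundaries)
total_features = len(chunks) * FEATURES_PER_CHUNK
```

For this synthetic signal the envelope has two interior minima, so the word is cut into three chunks and 3 × 21 = 63 features, matching the count given for Geschichte. Real speech would presumably need a smoothed envelope before minima are picked; that step is omitted here.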

Fig 2.

The frequency distribution of FBS features follows a power law with negative slope in the log-log plane.
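A negative slope in the log-log plane can be estimated with an ordinary least-squares line through the log-transformed values. The sketch below assumes a rank-frequency reading of the figure (the caption does not say which axes are plotted) and uses synthetic counts, not the paper's data; `loglog_slope` is a hypothetical helper name.

```python
import math


def loglog_slope(counts):
    """Least-squares slope and intercept of log(count) against log(rank)."""
    ranked = sorted(counts, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(count) for count in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic counts following count = 1000 / rank, i.e. an exact power law.
synthetic_counts = [1000.0 / rank for rank in range(1, 51)]
slope, intercept = loglog_slope(synthetic_counts)
```

With these synthetic counts the fitted slope is -1 up to floating-point error, which is what a straight, downward-sloping line in the log-log plane corresponds to.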

Fig 3.

Examples of four realizations of German und (‘and’).

Upper left: [ʊnt^h], upper right: [ʊn], lower left: [ʊnt^h], lower right: [n].

Fig 4.

Speaker accommodation as a function of the number of novel FBS features in held-out speech.

Each dot represents the increase in identification accuracy, comparing accuracy without and with training on the speech from the held-out speaker.

Table 1.

Coefficients, standard errors, test statistics, and p-values for the accuracy measures (upper part) and response latencies (lower part).

Fig 5.

Boxplots for the estimated by-subject coefficients for LogL1norm and LogActivation in the recognition task (upper panels) and the dictation task (lower panels).

Left: accuracy (on the logit scale); Right: latency (on the log scale). For recognition accuracy, the coefficients for LogActivation are those for the presentation over loudspeakers. For recognition latencies, the coefficients for both LogActivation and LogL1norm likewise pertain to presentation over loudspeakers.

Fig 6.

Left: Distribution of weights of afferent connections of Geschichte. Right: Identification accuracy calculated across the full data set for varying degrees of pruning.
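The caption does not say how pruning is defined, so the following is only a toy sketch under a stated assumption: connections whose absolute weight falls below a threshold are set to zero (magnitude pruning), and a word is identified as the outcome with the largest summed weight over the active features. The feature names, weights, and test items are invented for illustration and are not taken from the paper.

```python
def prune(weights, threshold):
    """Zero every connection whose absolute weight is below the threshold."""
    return {cue: {outcome: (w if abs(w) >= threshold else 0.0)
                  for outcome, w in row.items()}
            for cue, row in weights.items()}


def identify(weights, cues):
    """Return the outcome with the largest summed weight over the active cues."""
    totals = {}
    for cue in cues:
        for outcome, w in weights.get(cue, {}).items():
            totals[outcome] = totals.get(outcome, 0.0) + w
    return max(totals, key=totals.get)


def accuracy(weights, items):
    """Proportion of (cues, target) items identified correctly."""
    hits = sum(identify(weights, cues) == target for cues, target in items)
    return hits / len(items)


def count_nonzero(weights):
    """Number of connections that survive pruning."""
    return sum(1 for row in weights.values() for w in row.values() if w != 0.0)


# Invented toy network: three features (cues) connected to two word outcomes.
toy_weights = {
    "f1": {"Geschichte": 0.60, "und": 0.02},
    "f2": {"Geschichte": 0.05, "und": 0.50},
    "f3": {"Geschichte": 0.03, "und": -0.01},
}
toy_items = [(["f1", "f3"], "Geschichte"), (["f2", "f3"], "und")]

pruned = prune(toy_weights, 0.1)
acc_full = accuracy(toy_weights, toy_items)
acc_pruned = accuracy(pruned, toy_items)
```

In this toy case four of the six connections are removed at a threshold of 0.1 while both items are still identified correctly. Sweeping the threshold and recording `accuracy` at each step would give a curve of the kind shown in the right panel, though the paper's actual pruning procedure may differ.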
