Table 1.
Technical setup for the recording of the audio-visual streams.
Table 2.
Number of subjects having special health issues.
Table 3.
Chosen food classes and amount of food served to the subjects while recording each utterance.
Table 4.
Self-reported likability and difficulty of eating for each food class, as rated by all subjects.
Table 5.
Statistics of the iHEARu-EAT database.
Fig 1.
Exemplary subjects of the iHEARu-EAT database while recording an utterance without eating food (left), eating a banana (middle) and eating crisps (right).
Unusual configurations of the supra-glottal part of the vocal tract are clearly visible for the eating conditions.
Table 6.
ASR WERs [%] using 7-way acoustic model training on the iHEARu-EAT dataset.
Table 7.
ComParE acoustic feature set: 65 low-level descriptors (LLD).
Table 8.
ComParE acoustic feature set: Functionals applied to LLD contours (Table 7).
Table 9.
Binary classification of eating condition.
Table 10.
2-way and 7-way classification of eating condition.
Table 11.
Confusion matrix obtained by SVMs on the ComParE feature set in the 7-way classification of eating condition for both read and spontaneous speech production.
Fig 2.
Solutions of non-metric multidimensional scaling applied to class confusions (top: 2-D; bottom left: 1-D) or to Euclidean class center distances (bottom right: 1-D) in the 7-way task, using ComParE low-level acoustic features.
Fig 3.
Degree of high-frequency noise in the words ‘warmed up’ caused by eating.
Subjects (left: female, right: male) recording an utterance while eating a banana (top), without eating any food (middle), and while eating crisps (bottom).
Table 12.
Regression-based recognition of eating condition.