Fig 1.

Data construction pipeline for the ViSQA dataset.


Table 1.

Transcription accuracy on clean audio using Google Speech-to-Text.


Table 2.

Word Error Rate (WER) under noisy conditions.


Table 3.

A qualitative sample from the ViSQA dataset.

The example shows the passage, question, gold answer, ASR transcripts from clean and noisy audio (with ASR errors highlighted in bold), and whether the gold span was successfully re-aligned.


Table 4.

Comparison of dataset characteristics across UIT-ViQuAD, ViNewsQA, VlogQA, and ViSQA.


Fig 2.

Overview diagram of the SQA framework.


Table 5.

Comprehensive evaluation of state-of-the-art models demonstrating performance degradation across spoken data conditions.

All models were trained on the complete UIT-ViQuAD training set. UIT-ViQuAD-dev and ViSQA-test are the evaluation sets of UIT-ViQuAD and ViSQA, respectively.


Table 6.

Performance comparison of models on the ViSQA-dev set.

“Text” indicates models trained on clean text documents (UIT-ViQuAD), while “Spoken” refers to models trained on ASR transcriptions (ViSQA).


Table 7.

Evaluation results of pre-trained language models on the ViSQA test set under clean and noisy conditions.

All models were trained on the same ViSQA training set.


Table 8.

WER of transcribed ViSQA context by Google STT and AssemblyAI.


Table 9.

Performance comparison of machine comprehension models trained on Google STT vs. AssemblyAI transcripts, evaluated under matched ASR conditions.


Table 10.

Mean and standard deviation of model performance on the ViSQA-test set, averaged over 5 re-training runs with different random seeds.


Table 11.

Pairwise p-values from paired t-tests between models on EM and F1 scores (scaled by 10⁻⁴, rounded to one decimal place).


Table 12.

The count and accuracy rate of correct answers on the ViSQA test set, categorized by type, measured using the EM metric.
