Table 1.
Samples of various question types.
Table 2.
Statistical information of the FQSD dataset.
Table 3.
Descriptive metrics for the FQSD dataset.
Table 4.
Pearson correlation among annotators.
Table 5.
Interpretation ranges for Fleiss’s Kappa.
Fig 1.
Top 30 TF-IDF scores of nouns and noun phrases in FQSD.
Fig 2.
Top 20 TF-IDF scores of adverbs/adjectives in FQSD categorized by subjectivity comparison-form classes: a) CO, b) CS, c) SO, d) SS.
Fig 3.
Histograms of TF-IDF scores for adverbs/adjectives in FQSD categorized by subjectivity comparison-form classes: a) CO, b) CS, c) SO, d) SS.
Table 6.
Statistical information of the Yu et al. [7] dataset.
Table 7.
Statistical information of the ConvEx-DS dataset [9].
Table 8.
Statistical information of the SubjQA dataset [8].
Fig 4.
Distribution of the total question count and multi-sentence question count across the FQSD, ConvEx-DS, Yu et al. [7], and SubjQA datasets, illustrating the size and structure of each dataset.
Fig 5.
Distribution of the total word count and unique word count across the FQSD, ConvEx-DS, Yu et al. [7], and SubjQA datasets, illustrating the lexical richness of each dataset.
Fig 6.
Distribution of the average words per sentence, average sentence length, average word length, and average syllables per word across the FQSD, ConvEx-DS, Yu et al. [7], and SubjQA datasets, illustrating the linguistic complexity of each dataset.
Fig 7.
Average parse tree depth across the FQSD, ConvEx-DS, Yu et al. [7], and SubjQA datasets, illustrating the syntactic complexity of each dataset.
Fig 8.
Mean Dependency Distance (MDD) across the FQSD, ConvEx-DS, Yu et al. [7], and SubjQA datasets, illustrating the dependency structure of each dataset.
Fig 9.
Root Type-Token Ratio (RTTR) and Corrected Type-Token Ratio (CTTR) across the FQSD, ConvEx-DS, Yu et al. [7], and SubjQA datasets, illustrating the lexical diversity of each dataset.
Fig 10.
Sparsity degree and total question count across the FQSD, ConvEx-DS, Yu et al. [7], and SubjQA datasets, illustrating the data sparsity of each dataset.
Table 9.
RoBERTa’s five-fold cross-validation evaluation on FQSD.
Fig 11.
LIME visualizations of word influence on the model’s predictions for Instances 1 (Fig 11a) and 2 (Fig 11b) on the Yu et al. [7] dataset.
Table 10.
Model performance across different dataset sizes (averaged over 5 runs using stratified 5-fold cross-validation).
Table 11.
Analysis of the proposed subjectivity classification model over five separate runs on the SubjQA dataset [8].
Table 12.
Analysis of the proposed subjectivity-comparison form classification model over five separate runs on the ConvEx-DS dataset [9].
Table 13.
Evaluation of the proposed model (trained on FQSD and tested on the Yu et al. [7] dataset) vs. the Yu et al. [7] model.
Table 14.
Comparative analysis of transformer models’ performance (LR = 1e-5) on the FQSC task across the FQSD dataset over five independent runs.
Table 15.
Comparative analysis of transformer models’ performance (LR = 1e-5) on the subjectivity-comparison form classification task across the ConvEx-DS dataset [9] over five independent runs.
Table 16.
Comparative analysis of transformer models’ performance (LR = 3e-5) on the subjectivity classification task across the SubjQA dataset [8] over five independent runs.
Table 17.
Comparative analysis of transformer models’ performance on fine-grained subjectivity tasks across multiple datasets over five independent runs.