Fig 1.

Regional NER examples along with Standard Bangla and English.

Fig 2.

Development of ANCHOLIK-NER: A systematic pipeline for dataset creation.

Table 1.

Distribution of sentences across different data sources for Bangla regional dialects in the ANCHOLIK-NER dataset.

Table 2.

Sentence structure conversion by separating punctuation.

Table 3.

Dataset structure for the Sylhet region after the pre-processing and tokenization phase (the same structure is followed for all five regions).

Table 4.

Comprehensive overview of annotators’ background and expertise.

Table 5.

BIO Tagging scheme with examples for named entity recognition in Bangla regional dialects.

Fig 3.

Inter-annotator agreement (Cohen’s Kappa) across different regions.

Fig 4.

Average tagging speed by region, in minutes per 1000 tokens.

Table 6.

The dataset consists of three columns for each region: the first two are generated by a Python script, and the third (BIO tags) is verified by Bangla regional language experts.

Table 7.

Overview of our proposed dataset.

Fig 5.

Chittagong.

Fig 6.

Sylhet.

Fig 7.

Barishal.

Fig 8.

Noakhali.

Fig 9.

Mymensingh.

Table 8.

Total instances of named entity types in five regions.

Fig 10.

Frequency of named entities in the Chittagong dialect.

Fig 11.

Frequency of named entities in the Barishal dialect.

Fig 12.

Frequency of named entities in the Mymensingh dialect.

Fig 13.

Frequency of named entities in the Sylhet dialect.

Fig 14.

Frequency of named entities in the Noakhali dialect.

Fig 15.

Methodology.

Table 9.

Performance of Bangla BERT.

Table 10.

Performance of Bangla BERT Base.

Table 11.

Performance of BERT base multilingual cased.

Fig 16.

Confusion matrices for the best-performing model on the Barishal regional dialect.

Fig 17.

Confusion matrices for the best-performing model on the Mymensingh regional dialect.

Fig 18.

Confusion matrices for the best-performing model on the Chittagong regional dialect.

Fig 19.

Confusion matrices for the best-performing model on the Noakhali regional dialect.

Fig 20.

Confusion matrices for the best-performing model on the Sylhet regional dialect.

Table 12.

Entity-wise F1-scores across the five dialect regions for the Bangla BERT model trained with weighted loss.
