Fig 1.
Pre-processing steps for the news articles dataset.
Table 1.
Summary of the disaster-related entity classes, with a brief description and the source ontology for each.
Fig 2.
Bar plot of the counts (in hundreds) of the 14 disaster-related entity types in the news articles dataset.
Fig 3.
Pie chart of the percentage distribution of the 14 disaster-related entity types in the news articles dataset.
Table 2.
An example of word tokenization and POS tagging for a training sentence.
Fig 4.
The word embedding model development process.
|V| denotes the cardinality of the vocabulary (i.e., the number of distinct words in the corpus), N denotes the dimension of the embedding vectors (i.e., the hidden layer has N dimensions), and C denotes the size of the context window.
Fig 5.
Histograms of the word embedding vector distributions.
A. Contextual embedding. B. Word2vec embedding.
Fig 6.
Illustration of the character embedding model.
Fig 7.
An illustration of the proposed BiLSTM-ATTN-CRF model architecture for the disaster-related named entity recognition task.
Table 3.
Number of sentences, words, and characters in the training and test data.
Table 4.
Distribution of major disaster-related named entities in the training and test data.
Table 5.
Example of tokenization and labeling in a sample sentence.
Table 6.
List of hyperparameters with their corresponding ranges and optimal values.
Table 7.
Results on test data for different model configurations.
Table 8.
BiLSTM-ATTN-CRF model performance using different word-level embeddings.
Table 9.
BiLSTM-ATTN-CRF performance with various feature combinations. In column 1, “All” indicates the combination of baseline embeddings, POS tagging, and casing features, whereas “Base embeddings” indicates the word and character embedding features combined.
Table 10.
Summary of performance in recent disaster-specific NER studies and in the current study.
Table 11.
Comparative analysis of NER performance for alternative model configurations.
Table 12.
Comparative analysis of NER performance for an alternative dataset.