Analyzing and learning the language for different types of harassment

doi:10.1371/journal.pone.0227330

Fig 1.

Five contextual types of harassment.

Table 1.

Summary of the related research.

Table 2.

Annotation statistics of our categorized corpus.

Table 3.

Agreement rate.

Table 4.

Statistics for the Golbeck corpus after our annotation with respect to contextual type.

Fig 2.

Significant LIWC features when comparing the harassing corpus to the non-harassing corpus for six categories.

An extreme red (green) color indicates that a given feature is significantly more prominent in the harassing (non-harassing) corpus. For example, the negation feature, with a value of 2.34 in the appearance harassing corpus, is significantly higher than in the non-harassing corpus. White indicates no difference for a given feature between the two corpora.

Fig 3.

Top 25 most frequent words within each harassing corpus.

Fig 4.

Top 25 most frequent words within each non-harassing corpus.

Table 5.

Percentage of type-dependent words among the top 15 most frequent words within each sub-corpus.

H stands for the harassing corpus and NH stands for the non-harassing corpus.

Table 6.

Size of the training datasets for each type.

Fig 5.

Comparative study of the F-score from the major classifiers (SVM = Support Vector Machine, KNN = K-Nearest Neighbor, GBM = Gradient Boosting Machine, NB = Naive Bayes, NN = Neural Network).

Fig 6.

Comparative study of the various feature settings on the performance of the GBM classifier using measures such as precision, recall, F-score, accuracy, and specificity.

The extreme colors (purple, yellow, green, olive, and pink) indicate higher values, whereas white indicates lower values.
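
For readers unfamiliar with the five measures named in this caption, the following is a minimal sketch of how they are conventionally derived from the counts of a binary confusion matrix. The function name `binary_metrics`, its parameter names, and the counts in the usage note are hypothetical illustrations, not values or code from the article.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification measures from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. All names here are illustrative assumptions.
    """

    def ratio(numerator: float, denominator: float) -> float:
        # Return 0.0 instead of raising when a denominator is zero.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)      # share of predicted positives that are correct
    recall = ratio(tp, tp + fn)         # share of actual positives that are found
    specificity = ratio(tn, tn + fp)    # share of actual negatives that are found
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    # F-score (F1): harmonic mean of precision and recall.
    f_score = ratio(2 * precision * recall, precision + recall)

    return {
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "accuracy": accuracy,
        "specificity": specificity,
    }
```

As a usage example with made-up counts, `binary_metrics(tp=40, fp=10, tn=45, fn=5)` gives a precision of 0.8 and an accuracy of 0.85. Specificity is reported alongside recall because it describes performance on the negative (here, non-harassing) class, which accuracy alone can mask when classes are imbalanced.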

Table 7.

Performance of the GBM binary classifier on the combined corpus.

Table 8.

Performance of our multi-class classifier for predicting the type of harassment incident.

Table 9.

Performance of our classifier for predicting tweets for the Golbeck corpus.
