Automatic Detection of Cyberbullying in Social Media Text

While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online. Recent studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection of potentially harmful messages, and the information overload on the Web requires intelligent systems to identify potential risks automatically. The focus of this paper is on automatic cyberbullying detection in social media text by modelling posts written by bullies, victims, and bystanders of online bullying. We describe the collection and fine-grained annotation of a training corpus for English and Dutch and perform a series of binary classification experiments to determine the feasibility of automatic cyberbullying detection. We make use of linear support vector machines exploiting a rich feature set and investigate which information sources contribute most to this particular task. Experiments on a hold-out test set reveal promising results for the detection of cyberbullying-related posts. After optimisation of the hyperparameters, the classifier yields an F1-score of 64% for English and 61% for Dutch, and considerably outperforms baseline systems based on keywords and word unigrams.


Introduction
Web 2.0 has had a substantial impact on communication and relationships in today's society. Children and teenagers go online more frequently, at younger ages, and in more diverse ways (e.g. smartphones, laptops and tablets). Although most of teenagers' Internet use is harmless and the benefits of digital communication are evident, the freedom and anonymity experienced online makes young people vulnerable, with cyberbullying being one of the major threats (Livingstone et al., 2010;Tokunaga, 2010;Livingstone et al., 2013).
Bullying is not a new phenomenon, and cyberbullying manifested itself as soon as digital technologies became primary communication tools. On the positive side, social media like blogs, social networking sites (e.g. Facebook) and instant messaging platforms (e.g. WhatsApp) make it possible to communicate with anyone, at any time. Moreover, they are a place where people engage in social interaction, offering the possibility to establish new relationships and maintain existing friendships (Gross et al., 2002; McKenna & Bargh, 1999). On the negative side, however, social media increase the risk of children being confronted with threatening situations, including grooming or sexually transgressive behaviour, signals of depression and suicidal thoughts, and cyberbullying. Users are reachable 24/7 and are often able to remain anonymous if desired: this makes social media a convenient way for bullies to target their victims outside the schoolyard.
With regard to cyberbullying, a number of national and international initiatives have been launched over the past few years to increase children's online safety. Examples include KiVa 1 , a Finnish cyberbullying prevention programme, the 'Non au harcèlement' ('No to harassment') campaign in France, Belgian governmental initiatives and helplines (e.g. clicksafe.be, veiligonline.be, mediawijs.be) that provide information about online safety, and so on.
In spite of these efforts, a lot of undesirable and hurtful content remains online. Tokunaga (2010) analysed a body of quantitative research on cyberbullying and observed cybervictimisation rates among teenagers between 20% and 40%. Juvonen & Gross (2008) focused on 12 to 17 year olds living in the United States and found that no less than 72% of them had encountered cyberbullying at least once within the year preceding the questionnaire. Hinduja & Patchin (2006) surveyed 9 to 26 year olds in the United States, Canada, the United Kingdom and Australia, and found that 29% of the respondents had been victimised online at some point. A study among 2,000 Flemish secondary school students (age 12 to 18) revealed that 11% of them had been bullied online at least once in the six months preceding the survey. Finally, the 2014 large-scale EU Kids Online Report (Online, 2014) reported that 20% of 11 to 16 year olds had been exposed to hate messages online. In addition, youngsters were 12% more likely to be exposed to cyberbullying than in 2010, clearly demonstrating that cyberbullying is a growing problem.
The prevalence of cybervictimisation depends on the conceptualisation used in describing cyberbullying, but also on research variables such as location and the number and age span of its participants. Nevertheless, the above-mentioned studies demonstrate that online platforms are increasingly used for bullying, which is a cause for concern given its impact. As shown by Cowie (2013), Fekkes et al. (2006) and O'Moore & Kirkham (2001), cyberbullying can have a negative impact on the victim's self-esteem, academic achievement and emotional well-being. Price & Dalgleish (2010) found that self-reported effects of cyberbullying include negative effects on school grades; feelings of sadness, anger, fear and depression; and, in extreme cases, self-harm and suicidal thoughts.
The above studies demonstrate that cyberbullying is a serious problem whose consequences can be dramatic. Successful early detection of cyberbullying attempts is therefore of key importance to youngsters' mental well-being. However, the amount of information on the Web makes it practically infeasible for moderators to monitor all user-generated content manually. To tackle this problem, intelligent systems are required that process this information quickly and automatically signal potential threats. This way, moderators can respond rapidly and prevent threatening situations from escalating. According to recent research, teenagers are generally in favour of such automatic monitoring, provided that effective follow-up strategies are formulated, and that privacy and autonomy are guaranteed (Van Royen et al., 2014).
Parental control tools (e.g. NetNanny 2 ) already block unsuitable or undesirable content, and some social networks make use of keyword-based moderation tools (i.e., using lists of profane and insulting words to flag harmful content). However, such approaches typically fail to detect implicit or subtle forms of cyberbullying in which no explicit vocabulary is used. There is therefore a need for intelligent and self-learning systems that can go beyond keyword spotting and hence improve the recall of cyberbullying detection.
The ultimate goal of this sort of research is to develop models that could improve manual monitoring for cyberbullying on social networks. We explore the automatic detection of textual signals of cyberbullying, approaching it as a complex phenomenon that can be realised in various ways (see Section 3.5 for a detailed overview). While much of the related research focuses on the detection of cyberbullying 'attacks', the present study takes into account a broader range of textual signals of cyberbullying, including posts written by bullies, as well as by victims and bystanders.
We propose a machine learning approach to cyberbullying detection, making use of a linear SVM classifier (Chang & Lin, 2011; Cortes & Vapnik, 1995) exploiting a varied set of features. To the best of our knowledge, this is the first approach to the annotation of fine-grained text categories related to cyberbullying and the detection of signals of cyberbullying events. It is also the first elaborate research on automatic cyberbullying detection on Dutch social media. For the present experiments, we focus on an English and Dutch ASKfm 3 corpus, but the methodology adopted is language and genre independent, provided that annotated data are available.
The remainder of this paper is structured as follows: the next section presents a theoretical overview of cyberbullying and the state of the art in cyberbullying detection, whereas Section 3 describes the corpus. Next, we present the experimental setup and discuss our experimental results. Finally, Section 6 concludes this paper and provides perspectives for further research.

Related Research
Cyberbullying is a widely covered topic in the realm of social sciences and psychology. A fair amount of research has been done on the definition and prevalence of the phenomenon (Hinduja & Patchin, 2012; Livingstone et al., 2010; Slonje & Smith, 2008), the identification of different forms of cyberbullying (O'Sullivan & Flanagin, 2003; Vandebosch & Van Cleemput, 2009; Willard, 2007), and its consequences (Cowie, 2013; Price & Dalgleish, 2010; Smith et al., 2008). In contrast to the efforts made in defining and measuring cyberbullying, the number of studies that focus on its annotation and automatic detection is limited (Nadali et al., 2013). Nevertheless, some important advances have been made in the domain over the past few years.

A Definition of Cyberbullying
Many social and psychological studies have worked towards a definition of cyberbullying. A common starting point for conceptualising cyberbullying are definitions of traditional (or offline) bullying. Seminal work has been published by Olweus (1993), Nansel et al. (2001), Salmivalli et al. (1999) and Wolak et al. (2007), who describe bullying based on three main criteria: i) intention (i.e., a bully intends to inflict harm on the victim), ii) repetition (i.e., bullying acts take place repeatedly over time) and iii) a power imbalance between the bully and the victim (i.e., a more powerful bully attacks a less powerful victim). With respect to cyberbullying, a number of definitions are based on the above-mentioned criteria. A popular definition is that of Smith et al. (2008, p. 376), which describes cyberbullying as "an aggressive, intentional act carried out by a group or individual, using electronic forms of contact, repeatedly and over time, against a victim who cannot easily defend him or herself".
Nevertheless, some studies have underlined the differences between offline and online bullying, and have therefore questioned the relevance of the three criteria to the latter. Besides theoretical objections, a number of practical limitations have been observed. Firstly, while Olweus (1993) claims intention to be inherent to traditional bullying, this is much harder to ascertain in an online environment. Online conversations lack the signals of face-to-face interaction, like intonation, facial expressions and gestures, which makes them more ambiguous than real-life conversations. The receiver may therefore get the wrong impression of being offended or ridiculed (Vandebosch & Van Cleemput, 2009). Another criterion for bullying that might not hold in online situations is the power imbalance between bully and victim. Although this can be evident in real life (e.g. the bully is larger, stronger or older than the victim), it is hard to conceptualise or measure in an online environment. It may be related to technological skills, anonymity or the inability of the victim to get away from the bullying (Dooley & Cross, 2009; Slonje & Smith, 2008; Vandebosch & Van Cleemput, 2008). Inherent characteristics of the Web also empower the bully: once defamatory or confidential information about a person is made public through the Internet, it is hard, if not impossible, to remove.
Finally, while arguing that repetition is a criterion to distinguish cyberbullying from single acts of aggression, Olweus (1993) himself states that such a single aggressive action can be considered bullying under certain circumstances, although it is not entirely clear what these circumstances involve. Accordingly, Dooley & Cross (2009) claim that repetition in cyberbullying is problematic to operationalise, as it is unclear what the consequences are of a single derogatory message on a public page. A single act of aggression or humiliation may result in continued distress and humiliation for the victim if it is shared or liked by multiple perpetrators or read by a large audience. Slonje et al. (2013, p. 26) compare this with a 'snowball effect': one post may be repeated or distributed by other people so that it becomes out of the control of the initial bully and has larger effects than was originally intended.
Given these arguments, a number of less 'strict' definitions of cyberbullying were postulated by among others (Hinduja & Patchin, 2006;Juvonen & Gross, 2008;Tokunaga, 2010), where a power imbalance and repetition are not deemed necessary conditions for cyberbullying.
The above paragraphs demonstrate that defining cyberbullying is far from trivial, and varying prevalence rates (cf. Section 1) confirm that a univocal definition of the phenomenon is still lacking in the literature (Tokunaga, 2010). Based on existing conceptualisations, we define cyberbullying as content that is published online by an individual and that is aggressive or hurtful against a victim. Based on this definition, an annotation scheme was developed (see Van Hee, Verhoeven, et al. (2015)) to signal textual characteristics of cyberbullying, including posts from bullies, as well as reactions by victims and bystanders.

Detecting and Preventing Cyberbullying
As mentioned earlier, although research on cyberbullying detection is more limited than social studies on the phenomenon, some important advances have been made in recent years. In what follows, we present a brief overview of the most important natural language processing approaches to cyberbullying detection.
Although some studies have investigated the effectiveness of rule-based modelling (Reynolds et al., 2011), the dominant approach to cyberbullying detection involves machine learning. Most machine learning approaches are based on supervised (Dadvar, 2014; Dinakar et al., 2011; Yin et al., 2009) or semi-supervised learning (Nahar et al., 2014). The former involves the construction of a classifier based on labelled training data, whereas semi-supervised approaches rely on classifiers that are built from a training corpus containing a small set of labelled and a large set of unlabelled instances (a method that is often used to handle data sparsity). As cyberbullying detection essentially involves the distinction between bullying and non-bullying posts, the problem is generally approached as a binary classification task, where the positive class is represented by instances containing (textual) cyberbullying, while the negative class includes instances containing non-cyberbullying or 'innocent' text.
A key challenge in cyberbullying research is the availability of suitable data, which is necessary to develop models that characterise cyberbullying. In recent years, only a few datasets have become publicly available for this particular task, such as the training sets provided in the context of the CAW 2.0 workshop 4 and, more recently, the Twitter Bullying Traces dataset (Sui, 2015). As a result, several studies have worked with the former or have constructed their own corpus from social media websites that are prone to bullying content, such as YouTube (Dadvar, 2014; Dinakar et al., 2011), Formspring 5 (Dinakar et al., 2011), and ASKfm (Van Hee, Lefever, et al., 2015b) (the latter two are social networking sites where users can send each other questions or respond to them). Despite the bottleneck of data availability, existing approaches to cyberbullying detection have shown the potential of the task, and the relevance of automatic text analysis techniques to ensure child safety online has been recognised (Desmet, 2014; Royen et al., 2016).
Among the first studies on cyberbullying detection are Yin et al. (2009), Reynolds et al. (2011) and Dinakar et al. (2011), who explored the predictive power of n-grams (with and without tf-idf weighting), part-of-speech information (e.g. first and second person pronouns), and sentiment information based on profanity lexicons for this task. Similar features were also exploited for the detection of cyberbullying events and fine-grained text categories related to cyberbullying (Van Hee, Lefever, et al., 2015b,a). More recent studies have demonstrated the added value of combining such content-based features with user-based information, such as users' activities on a social network (i.e., the number of posts), their age, gender, location, number of friends and followers, and so on (Dadvar, 2014; Nahar et al., 2014; Al-garadi et al., 2016). Moreover, semantic features have been explored to further improve classification performance on the task. To this end, topic model information (Xu et al., 2012), as well as semantic relations between n-grams (according to a Word2Vec model (Zhao et al., 2016)), have been integrated.
As mentioned earlier, data collection remains a bottleneck in cyberbullying research. Although cyberbullying has been recognised as a serious problem (cf. Section 1), real-world examples are often hard to find in public platforms. Naturally, the vast majority of communications do not contain traces of verbal aggression or transgressive behaviour. When constructing a corpus for machine learning purposes, this results in imbalanced datasets, meaning that one class (e.g. cyberbullying posts) is much less represented in the corpus than the other (e.g. non-cyberbullying posts). To tackle this problem, several studies have adopted resampling techniques (Nahar et al., 2014;Al-garadi et al., 2016;Reynolds et al., 2011) that create synthetic minority class examples or reduce the number of negative class examples (i.e., minority class oversampling and majority class undersampling (Chawla et al., 2002)). Table 1 presents a number of recent studies on cyberbullying detection, providing insight into the state of the art in cyberbullying research and the contribution of the current research to the domain.
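As an illustration of the resampling idea, the sketch below balances a toy dataset by randomly duplicating minority-class instances. This is a deliberate simplification: SMOTE (Chawla et al., 2002) creates *synthetic* minority examples by interpolating between neighbours, which this sketch does not do, and all names here are illustrative.

```python
import random

def oversample_minority(instances, labels, seed=42):
    """Randomly duplicate minority-class instances until classes are balanced.

    A simplified stand-in for minority oversampling; SMOTE (Chawla et al.,
    2002) instead generates synthetic minority examples.
    """
    rng = random.Random(seed)
    pos = [x for x, y in zip(instances, labels) if y == 1]
    neg = [x for x, y in zip(instances, labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    min_label = 1 if minority is pos else 0
    maj_label = 1 - min_label
    # Draw (with replacement) until the minority class matches the majority.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    X = majority + minority + extra
    y = [maj_label] * len(majority) + [min_label] * (len(minority) + len(extra))
    return X, y
```

Majority-class undersampling is the mirror image: randomly discarding majority examples instead of duplicating minority ones.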
The studies discussed in this section have demonstrated the feasibility of automatic cyberbullying detection in social media data by making use of a varied set of features. Most of them have, however, focussed on cyberbullying 'attacks', or posts written by a bully. Moreover, it is not entirely clear if different forms of cyberbullying have been taken into account (e.g. sexual intimidation or harassment, or psychological threats), in addition to derogatory language or insults.
In the research described in this paper, cyberbullying is considered a complex phenomenon consisting of different forms of harmful behaviour online, which are described in more detail in our annotation scheme (Van Hee, Verhoeven, et al., 2015). Aiming to facilitate manual monitoring efforts on social networks, we develop a system that automatically detects signals of cyberbullying, including attacks from bullies, as well as victim and bystander reactions. Similarly, Xu et al. (2012) investigated bullying traces posted by different author roles (accuser, bully, reporter, victim). However, they collected tweets by using specific keywords (i.e., bully, bullied and bullying). As a result, their corpus contains many reports or testimonials of a cyberbullying incident (example 1), instead of actual signals that cyberbullying is going on. Moreover, their method implies that cyberbullying-related content devoid of such keywords will not be part of the training corpus.
1. 'Some tweens got violent on the n train, the one boy got off after blows 2 the chest... Saw him cryin as he walkd away :( bullying not cool' (Xu et al., 2012, p. 658)

For this research, English and Dutch social media data were annotated for different forms of cyberbullying, based on the actors involved in a cyberbullying incident. After preliminary experiments for Dutch (Van Hee, Lefever, et al., 2015b,a), we currently explore the viability of detecting cyberbullying-related posts in Dutch and English social media. To this end, binary classification experiments are performed exploiting a rich feature set and optimised hyperparameters.

Data Collection and Annotation
To be able to build representative models for cyberbullying, a suitable dataset is required. This section describes the construction of two corpora, English and Dutch, containing social media posts that are manually annotated for cyberbullying according to our fine-grained annotation scheme. This allows us to develop a detection system covering different forms and participants (or roles) involved in a cyberbullying event.

Data Collection
Two corpora were constructed by collecting data from the social networking site ASKfm, where users can create profiles and ask or answer questions, with the option of doing so anonymously. ASKfm data typically consists of question-answer pairs published on a user's profile. The data were retrieved by crawling a number of seed profiles using the GNU Wget software 6 in April and October, 2013. After language filtering (i.e., non-English or non-Dutch content was removed), the experimental corpora comprised 113,698 and 78,387 posts for English and Dutch, respectively.

Data Annotation
Cyberbullying has been a widely covered research topic recently and studies have shed light on direct and indirect types of cyberbullying, implicit and explicit forms, verbal and non-verbal cyberbullying, and so on. This is important from a sociolinguistic point of view, but knowing what cyberbullying involves is also crucial to build models for automatic cyberbullying detection. In the following paragraphs, we present our data annotation guidelines (Van Hee, Verhoeven, et al., 2015) and focus on different types and roles related to the phenomenon.

Types of Cyberbullying
Cyberbullying research is mainly centered around the conceptualisation, occurrence and prevention of the phenomenon (Hinduja & Patchin, 2012;Livingstone et al., 2010;Slonje & Smith, 2008). Additionally, different forms of cyberbullying have been identified (O'Sullivan & Flanagin, 2003;Price & Dalgleish, 2010;Willard, 2007) and compared with forms of traditional or offline bullying (Vandebosch & Van Cleemput, 2009). Like traditional bullying, direct and indirect forms of cyberbullying have been identified. Direct cyberbullying refers to actions in which the victim is directly involved (e.g. sending a virus-infected file, excluding someone from an online group, insulting and threatening), whereas indirect cyberbullying can take place without awareness of the victim (e.g. outing or publishing confidential information, spreading gossip, creating a hate page on social networking sites) (Vandebosch & Van Cleemput, 2009).
The present annotation scheme describes some specific textual categories related to cyberbullying, including threats, insults, defensive statements from a victim, encouragements to the harasser, etc. (see Section 3.5 for a complete overview). All of these forms were inspired by social studies on cyberbullying (Vandebosch & Van Cleemput, 2009) and manual inspection of cyberbullying examples.

Roles in Cyberbullying
Similarly to traditional bullying, cyberbullying involves a number of participants that adopt well-defined roles. Researchers have identified several roles in (cyber)bullying interactions. Although traditional studies on bullying have mainly concentrated on bullies and victims (Salmivalli et al., 1996), the importance of bystanders in a bullying episode has been acknowledged (Bastiaensens et al., 2014; Salmivalli, 2010). Bystanders can support the victim and mitigate the negative effects caused by the bullying (Salmivalli, 2010), especially on social networking sites, where they hold higher intentions to help the victim than in real-life conversations (Bastiaensens et al., 2015). While Salmivalli et al. (1996) distinguish four different bystander roles, Vandebosch et al. (2006) distinguish three main types: i) bystanders who participate in the bullying, ii) bystanders who help or support the victim and iii) bystanders who ignore the bullying. Given that passive bystanders are hard to recognise in online text, only the former two are included in our annotation scheme.

Annotation Guidelines
To operationalise the task of automatic cyberbullying detection, we developed and tested a fine-grained annotation scheme and applied it to our corpora. While a detailed overview of the guidelines is presented in our technical report (Van Hee, Verhoeven, et al., 2015), we briefly present the categories and main annotation steps below.
-Threat/Blackmail: expressions containing physical or psychological threats or indications of blackmail.
-Insult: expressions meant to hurt or offend the victim.
* General insult: general expressions containing abusive, degrading or offensive language that are meant to insult the addressee.
* Attacking relatives: insulting expressions towards relatives or friends of the victim.
* Discrimination: expressions of unjust or prejudicial treatment of the victim. Two types of discrimination are distinguished (i.e., sexism and racism). Other forms of discrimination should be categorised as general insults.
-Curse/Exclusion: expressions of a wish that some form of adversity or misfortune will befall the victim and expressions that exclude the victim from a conversation or a social group.
-Defamation: expressions that reveal confidential or defamatory information about the victim to a large public.
-Sexual Talk: expressions with a sexual meaning or connotation. A distinction is made between innocent sexual talk and sexual harassment.
-Defense: expressions in support of the victim, expressed by the victim himself or by a bystander.
* Bystander defense: expressions by which a bystander shows support for the victim or discourages the harasser from continuing his actions.
* Victim defense: assertive or powerless reactions from the victim.
-Encouragement to the harasser: expressions in support of the harasser.
-Other: expressions that contain any other form of cyberbullying-related behaviour than the ones described here.
Based on the literature on role-allocation in cyberbullying episodes (Salmivalli et al., 2011;Vandebosch et al., 2006), four roles are distinguished, including victim, bully, and two types of bystanders.
1. Harasser or Bully: person who initiates the bullying.
2. Victim: person who is harassed.
3. Bystander-defender: person who helps the victim and discourages the harasser from continuing his actions.
4. Bystander-assistant: person who does not initiate, but helps or encourages the harasser.
Essentially, the annotation scheme describes two levels of annotation. Firstly, the annotators were asked to indicate, at the post level, whether the post under investigation was related to cyberbullying. If the post was considered a signal of cyberbullying, annotators identified the author's role. Secondly, at the subsentence level, the annotators were tasked with the identification of a number of fine-grained text categories related to cyberbullying. More concretely, they identified all text spans corresponding to one of the categories described in the annotation scheme. To provide the annotators with some context, all posts were presented within their original conversation when possible. All annotations were done using the Brat rapid annotation tool (Stenetorp et al., 2012), some examples of which are presented in Table 2.
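The two annotation levels can be illustrated with a hypothetical example. The layout and field names below are ours, chosen for illustration only; the actual annotations were made in Brat's standoff format.

```python
# A hypothetical in-memory layout for one annotated post: a post-level
# cyberbullying flag plus the author's role, and subsentence-level spans
# (character offsets) labelled with fine-grained categories.
annotation = {
    "post": "you are so ugly, go away forever",
    "cyberbullying": True,          # post level: signal of cyberbullying
    "author_role": "Harasser",      # one of the four roles listed above
    "spans": [                      # subsentence level: fine-grained categories
        {"category": "Insult", "start": 0, "end": 15},
        {"category": "Curse/Exclusion", "start": 17, "end": 32},
    ],
}

# Each span's offsets recover the annotated text fragment.
insult_text = annotation["post"][0:15]          # "you are so ugly"
exclusion_text = annotation["post"][17:32]      # "go away forever"
```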

Annotation Statistics
The English and Dutch corpora were independently annotated for cyberbullying by trained linguists. All annotators were native speakers of Dutch with English as a second language. To demonstrate the validity of our guidelines, inter-annotator agreement scores were calculated using Kappa on a subset of each corpus. Inter-rater agreement for Dutch (2 raters) is calculated using Cohen's Kappa (Cohen, 1960). Fleiss' Kappa (Fleiss, 1971) is used for the English corpus (> 2 raters). Kappa scores for the identification of cyberbullying are κ = 0.69 (Dutch) and κ = 0.59 (English). As shown in Table 3, inter-annotator agreement for the identification of the more fine-grained categories for English varies from fair to substantial (McHugh, 2012), except for defamation, which appears to be more difficult to recognise. No encouragements to the harasser were present in this subset of the corpus. For Dutch, the inter-annotator agreement is fair to substantial, except for curse and defamation. Analysis revealed that one of the two annotators often annotated the latter as an insult, and in some cases did not consider it cyberbullying-related at all.

Table 3. Inter-annotator agreement on the fine-grained categories related to cyberbullying.
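For reference, Cohen's Kappa for two raters on a binary annotation task can be computed with scikit-learn; the raters' judgements below are invented for illustration.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical post-level judgements by two raters (1 = cyberbullying, 0 = not).
rater_a = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
rater_b = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]

# Kappa corrects the observed agreement (here 8/10) for the agreement
# expected by chance from the raters' label distributions.
kappa = cohen_kappa_score(rater_a, rater_b)
```

For more than two raters, as in the English corpus, Fleiss' Kappa generalises the same chance-corrected agreement idea.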
In short, the inter-rater reliability study shows that the annotation of cyberbullying is not trivial and that more fine-grained categories like defamation, curse and encouragements are sometimes hard to recognise. It appears that defamations were sometimes hard to distinguish from insults, whereas curses and exclusions were sometimes considered insults or threats. The analysis further reveals that encouragements to the harasser are subject to interpretation. Some are straightforward (e.g. 'I agree we should send her hate'), whereas others are subject to the annotator's judgement and interpretation (e.g. 'hahaha', 'LOL').

Experimental Setup
In this paper, we explore the feasibility of automatically recognising signals of cyberbullying. A crucial difference with state-of-the-art approaches to cyberbullying detection is that we aim to model bullying attacks, as well as reactions from victims and bystanders (i.e., all under one binary label 'signals of cyberbullying'), since these could likewise indicate that cyberbullying is going on. The experiments described in this paper focus on the detection of such posts, which are signals of a potential cyberbullying event to be further investigated by human moderators.
The English and Dutch corpora contain 113,698 and 78,387 posts, respectively. As shown in Table 4, the experimental corpus features a heavily imbalanced class distribution, with the large majority of posts not being part of cyberbullying. In classification, this class imbalance can lead to decreased performance. To counter this, we include cost-sensitive SVM as a hyperparameter option during optimisation. The cost-sensitive SVM reweights the penalty parameter C of the error term by the inverse class ratio, meaning that misclassifications of the minority positive class are penalised more heavily than classification errors on the majority negative class. Other pre-processing methods to handle data imbalance in classification include feature filtering metrics and data resampling (He & Garcia, 2009). These methods were omitted as they were found to be too computationally expensive given our high-dimensional dataset.
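The reweighting described above corresponds to scikit-learn's `class_weight` mechanism. A minimal sketch on toy, imbalanced data (the one-dimensional feature vectors are invented stand-ins for the real high-dimensional ones):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy, heavily imbalanced data: 8 negative vs. 2 positive instances.
X = np.array([[0.00], [0.05], [0.10], [0.15], [0.20], [0.25], [0.30], [0.35],
              [0.90], [1.00]])
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

# class_weight='balanced' rescales the error penalty C for each class
# inversely to its frequency, so mistakes on the rare positive
# (cyberbullying) class cost more than mistakes on the majority class.
clf = LinearSVC(C=1.0, class_weight="balanced")
clf.fit(X, y)
```

Explicit weights (e.g. `class_weight={1: 9.0, 0: 1.0}` for a 9:1 class ratio) achieve the same effect when the ratio is known in advance.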
For the automatic detection of cyberbullying, we performed binary classification experiments using a linear kernel support vector machine (SVM) implemented in LIBLINEAR (Fan et al., 2008) by making use of Scikit-learn (Pedregosa et al., 2011), a machine learning library for Python. The motivation behind this is twofold: i) support vector machines (SVMs) have proven to work well for tasks similar to the ones under investigation (Desmet, 2014), and ii) LIBLINEAR allows fast training on large-scale data which allow for a linear mapping (which was confirmed after a series of preliminary experiments using LIBSVM with linear, RBF and polynomial kernels).

Table 5. Hyperparameters and values tested in grid search.

- Penalty of error term C: 1e{−3,−2,...,2,3}
- Loss function: hinge, squared hinge
- Penalty (norm used in penalisation): 'l1' (least absolute deviations) or 'l2' (least squares)
- Class weight (sets penalty C of class i to weight*C): None or 'balanced', i.e., weight inversely proportional to class frequencies
The classifier was optimised for feature type (cf. Section 4.1) and hyperparameter combinations (cf. Table 5). Model selection was done using 10-fold cross validation in grid search over all possible feature types (i.e., groups of similar features, like different orders of n-gram bag-of-words features) and hyperparameter configurations. The best performing hyperparameters are selected by F 1 -score on the positive class. The winning model is then retrained on all held-in data and subsequently tested on a hold-out test set to assess whether the classifier is over- or under-fitting. The hold-out represents a random sample (10%) of all data. The folds were randomly stratified splits over the hold-in class distribution. Testing all feature type combinations is a rudimentary form of feature selection and provides insight into which types of features work best for this particular task.
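The model selection loop described above can be sketched with scikit-learn. Synthetic data stands in for the real feature vectors, the search over feature-type combinations is omitted, and the l1/l2 penalty norm is left out of the grid here (LinearSVC does not support every loss/penalty combination); all names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.svm import LinearSVC

# Synthetic, imbalanced stand-in data (~10% positive class).
X, y = make_classification(n_samples=400, n_features=20, weights=[0.9, 0.1],
                           random_state=1)

# Random 10% hold-out test set; the rest is the held-in data.
X_in, X_out, y_in, y_out = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=1)

# Part of the hyperparameter grid of Table 5.
param_grid = {
    "C": [10.0 ** k for k in range(-3, 4)],   # 1e-3 ... 1e3
    "loss": ["hinge", "squared_hinge"],
    "class_weight": [None, "balanced"],       # cost-sensitive SVM option
}

# 10-fold stratified cross-validated grid search, selecting by F1 on the
# positive class; the winning model is refit on all held-in data.
search = GridSearchCV(
    LinearSVC(),
    param_grid,
    scoring="f1",
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=1),
)
search.fit(X_in, y_in)
holdout_f1 = search.score(X_out, y_out)  # F1 of the winning model on the hold-out
```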
Feature selection over all individual features was not performed because of the large feature space (NL: 795,072 and EN: 871,296 individual features). Hoste (2005), among other researchers, demonstrated the importance of joint optimisation, where feature selection and hyperparameter optimisation are performed simultaneously, since the techniques mutually influence each other.
The optimised models are evaluated against two baseline systems: i) an unoptimised linear-kernel SVM (configured with default parameter settings) based on word n-grams only, and ii) a keyword-based system that marks posts as positive for cyberbullying if they contain a word from existing vocabulary lists composed of aggressive language and profanity terms.
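The keyword baseline amounts to a simple list lookup; a sketch with an illustrative stand-in list (not the actual vocabulary lists used in the paper):

```python
# Keyword-matching baseline: flag a post as cyberbullying if it contains
# any term from a profanity/aggressive-language list.
import re

PROFANITY = {"loser", "idiot", "ugly"}   # illustrative stand-in subset

def keyword_baseline(post: str) -> bool:
    tokens = re.findall(r"[a-z']+", post.lower())
    return any(t in PROFANITY for t in tokens)

print(keyword_baseline("you are such a loser"))   # True
print(keyword_baseline("see you at practice"))    # False
```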

Pre-processing and Feature Engineering
As pre-processing, we applied tokenisation, PoS-tagging and lemmatisation to the data using the LeTs Preprocess Toolkit (van de Kauter et al., 2013). In supervised learning, a machine learning algorithm takes a set of training instances (of which the label is known) and seeks to build a model that generates a desired prediction for an unseen instance. To enable the model construction, all instances are represented as a vector of features (i.e., inherent characteristics of the data) that contain information that is potentially useful to distinguish cyberbullying from non-cyberbullying content.
We experimentally tested whether cyberbullying events can be recognised automatically by lexical markers in a post. To this end, all posts were represented by a number of information sources (or features) including lexical features like bags-of-words, sentiment lexicon features and topic model features, which are described in more detail below. Prior to feature extraction, some data cleaning steps were executed, such as the replacement of hyperlinks and @-replies, removal of superfluous white spaces, and the replacement of abbreviations by their full form (based on an existing mapping dictionary). Additionally, tokenisation was applied before n-gram extraction and sentiment lexicon matching, and stemming was applied prior to extracting topic model features.
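A hypothetical sketch of these cleaning steps (the abbreviation mapping is a tiny stand-in for the existing dictionary):

```python
# Data cleaning: replace hyperlinks and @-replies with placeholders,
# collapse superfluous whitespace, and expand abbreviations.
import re

ABBREVIATIONS = {"u": "you", "r": "are", "2morrow": "tomorrow"}  # stand-in

def clean(post: str) -> str:
    post = re.sub(r"https?://\S+", "_URL_", post)   # hyperlinks
    post = re.sub(r"@\w+", "_AT_", post)            # @-replies
    tokens = [ABBREVIATIONS.get(t.lower(), t) for t in post.split()]
    return " ".join(tokens)   # join also collapses extra whitespace

print(clean("@john u r   mean http://t.co/x"))
# prints: _AT_ you are mean _URL_
```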
After pre-processing of the corpus, the following feature types were extracted:
• Word n-gram bag-of-words: binary features indicating the presence of word unigrams, bigrams and trigrams.
• Character n-gram bag-of-words: binary features indicating the presence of character bigrams, trigrams and fourgrams (without crossing word boundaries). Character n-grams provide some abstraction from the word level and provide robustness to the spelling variation that characterises social media data.
• Term lists: one binary feature derived for each one out of six lists, indicating the presence of an item from the list in a post: proper names, 'allness' indicators (e.g. always, everybody), diminishers (e.g. slightly, relatively), intensifiers (e.g. absolutely, amazingly), negation words and aggressive language and profanity words. Person alternation is a binary feature indicating whether the combination of a first and second person pronoun occurs in order to capture interpersonal intent.
• Subjectivity lexicon features: positive and negative opinion word ratios, as well as the overall post polarity were calculated using existing sentiment lexicons. For Dutch, we made use of the Duoman (Jijkoun & Hofmann, 2009) and Pattern (De Smedt & Daelemans, 2012) lexicons. For English, we included the Hu and Liu opinion lexicon (Hu & Liu, 2004), the MPQA lexicon (Wilson et al., 2005), General Inquirer Sentiment Lexicon (Stone et al., 1966), AFINN (Nielsen, 2011), and MSOL (Mohammad et al., 2009). For both languages, we included the relative frequency of all 68 psychometric categories in the Linguistic Inquiry and Word Count (LIWC) dictionary for English (Pennebaker et al., 2001) and Dutch (Zijlstra et al., 2004).
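The bag-of-words feature types above can be sketched with Scikit-learn's CountVectorizer; analyzer="char_wb" approximates character n-grams that stay within word boundaries:

```python
# Binary word uni/bi/trigrams plus character bi/tri/fourgrams,
# concatenated into one sparse feature matrix.
from sklearn.feature_extraction.text import CountVectorizer
from scipy.sparse import hstack

posts = ["you are so dumb", "have a nice day"]

word_ngrams = CountVectorizer(ngram_range=(1, 3), binary=True)
char_ngrams = CountVectorizer(analyzer="char_wb", ngram_range=(2, 4),
                              binary=True)

X = hstack([word_ngrams.fit_transform(posts),
            char_ngrams.fit_transform(posts)])
print(X.shape)  # (2, n_features)
```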
When applied to the training data, this resulted in 871,296 and 795,072 features for English and Dutch, respectively.
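As an illustration, the term-list, person-alternation and subjectivity-lexicon features can be sketched as follows; the word lists are tiny stand-ins for the actual resources (e.g. the profanity lists or the Hu & Liu lexicon):

```python
# One binary feature per term list, a person-alternation flag for the
# co-occurrence of first- and second-person pronouns, and opinion-word
# ratios plus an overall polarity score.
TERM_LISTS = {
    "allness": {"always", "everybody", "never"},
    "intensifier": {"absolutely", "amazingly"},
    "profanity": {"idiot", "loser"},
}
FIRST = {"i", "me", "my"}
SECOND = {"you", "your"}
POSITIVE = {"nice", "great", "love"}
NEGATIVE = {"dumb", "hate", "ugly"}

def term_features(post: str) -> dict:
    tokens = set(post.lower().split())
    feats = {name: int(bool(tokens & terms))
             for name, terms in TERM_LISTS.items()}
    feats["person_alternation"] = int(bool(tokens & FIRST)
                                      and bool(tokens & SECOND))
    return feats

def lexicon_features(post: str) -> dict:
    tokens = post.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    n = max(len(tokens), 1)
    return {"pos_ratio": pos / n,
            "neg_ratio": neg / n,
            "polarity": (pos - neg) / n}

print(term_features("i always knew you were a loser"))
print(lexicon_features("you are so dumb and ugly"))
```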

Results
In this section, we present the results of our experiments on the automatic detection of cyberbullying-related posts in an English (EN) and Dutch (NL) corpus of ASKfm posts. Ten-fold cross-validation was performed in an exhaustive grid search over different feature type and hyperparameter combinations (see Section 4). The unoptimised word n-gram-based classifier and the keyword-matching system serve as baselines for comparison. Precision, recall and F1 performance metrics were calculated on the positive class (i.e., 'binary averaging'). We also report Area Under the ROC curve (AUC) scores, a performance metric that is more robust to data imbalance than precision, recall and micro-averaged F-score (Fawcett, 2006). In Table 6, feature types are abbreviated as follows: A = word n-grams, B = subjectivity lexicons, C = character n-grams, D = term lists, E = topic models. Table 6 gives an indication of which feature type combinations score best and hence contribute most to this task. A total of 31 feature type combinations, each with 28 different hyperparameter sets, have been tested. Table 6 shows the results for the three best-scoring systems by included feature types with optimised hyperparameters. The maximum F1-score attained in cross-validation is 64.26% for English and 61.20% for Dutch, which shows that the classifier benefits from a variety of feature types. The results on the holdout test set show that the trained systems generalise well to unseen data, indicating little under- or overfitting. The simple keyword-matching baseline system has the lowest performance for both languages, even though it obtains high recall for English, suggesting that profane language characterises many cyberbullying-related posts. Feature group and hyperparameter optimisation provides a considerable performance increase over the unoptimised word n-gram baseline system. The top-scoring systems for each language do not differ much in performance, except for the best system for Dutch, which trades recall for precision when compared to the runners-up.
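The reported metrics can be reproduced with Scikit-learn; a small sketch on toy labels (binary averaging computes precision, recall and F1 on the positive class only):

```python
# Precision, recall and F1 on the positive class ("binary averaging"),
# plus ROC AUC, which is less sensitive to class imbalance.
from sklearn.metrics import precision_recall_fscore_support, roc_auc_score

y_true   = [1, 0, 0, 1, 0, 0, 0, 1]             # gold labels
y_pred   = [1, 0, 0, 0, 0, 1, 0, 1]             # hard predictions
y_scores = [0.9, 0.2, 0.1, 0.4, 0.3, 0.6, 0.2, 0.8]  # decision values

p, r, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary", pos_label=1)
print(round(p, 2), round(r, 2), round(f1, 2))    # prints: 0.67 0.67 0.67
print(round(roc_auc_score(y_true, y_scores), 2)) # prints: 0.93
```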
Table 8 presents the scores of the (hyperparameter-optimised) single feature type systems, to gain insight into the performance of these feature types when used individually. Analysis of the combined and single feature type sets reveals that word n-grams, character n-grams and subjectivity lexicons prove to be strong features for this task. In effect, adding character n-grams always improved classification performance for both languages. They likely provide robustness to lexical variation in social media text, as compared to word n-grams. While subjectivity lexicons appear to be discriminative features, term lists perform badly on their own as well as in combinations for both languages. This shows once again (cf. the profanity baseline) that cyberbullying detection requires more sophisticated information sources than profanity lists. Topic models seem to do badly on their own for both languages, but in combination they consistently improve Dutch performance. A possible explanation for their varying performance across the two languages is that the topic models trained on the Dutch background corpus are of better quality than the English ones. Indeed, a random selection of background corpus texts reveals that the English scrape contains more noisy data (i.e., low word-count posts and non-English posts) than the Dutch data.
A shallow qualitative analysis of the classification output provided insight into some of the classification mistakes. Table 9 gives an overview of the error rates per cyberbullying category of the best-performing and baseline systems. This could give an indication of which types of bullying the current system has trouble classifying. All categories are always considered positive for cyberbullying (i.e., the error rate equals the false negative rate), except for Sexual and Insult, which can also be negative (in the case of harmless sexual talk or 'socially acceptable' insulting language like 'hi bitches, in for a movie?', the corresponding category was indicated, but the post itself was not annotated as cyberbullying), and Not cyberbullying, which is always negative. Error rates often being lowest for the profanity baseline confirms that it performs particularly well in terms of recall (at the expense of precision, see Table 8). Looking at the best system for both languages, we see that Defense is the hardest category to classify correctly. This should not be a surprise, as the category comprises defensive posts from bystanders and victims, which contain less aggressive language than cyberbullying attacks and are often shorter than the latter. Assertive defensive posts (i.e., a subcategory of Defense that attacks the bully) are, however, more often correctly classified. There are not enough instances of Encouragement for either language in the holdout to be representative. In both languages, threats, curses and incidences of sexual harassment are the most easily recognisable, showing (far) lower error rates than the categories Defamation, Defense, Encouragements to the harasser, and Insult.
Qualitative error analysis of the English and Dutch predictions reveals that false positives often contain aggressive language directed at a second person, often denoting personal flaws or containing sexual and profanity words. We see that misclassifications are often short posts containing just a few words, and that false negatives often lack explicit verbal signs of cyberbullying (e.g. insulting or profane words) or are ironic (examples 2 and 3). Additionally, we see that cyberbullying posts containing misspellings, grammatical errors or incomplete words are also hard to recognise as such (examples 4 and 5). The Dutch and English data are overall similar with respect to the qualitative properties of classification errors.

Table 9. Error rates (%) per cyberbullying category on holdout for English and Dutch systems.
In short, the experiments show that our classifier clearly outperforms both a keyword-based and word n-gram baseline. However, analysis of the classifier output reveals that false negatives often lack explicit clues that cyberbullying is going on, indicating that our system might benefit from irony recognition and integrating world knowledge to capture such implicit realisations of cyberbullying.
Given that we present the first elaborate research on detecting signals of cyberbullying regardless of the author role, instead of bully posts alone, a crude comparison with the state of the art would be of limited relevance. We observe, however, that our classifier obtains competitive results compared to Dadvar

Conclusions and Future Research
The goal of the current research was to investigate the automatic detection of cyberbullying-related posts on social media. Given the information overload on the web, manual monitoring for cyberbullying has become unfeasible. Automatic detection of signals of cyberbullying would enhance moderation and make it possible to respond quickly when necessary.
Cyberbullying research has often focused on detecting cyberbullying 'attacks', hence overlooking posts written by victims and bystanders. However, these posts could just as well indicate that cyberbullying is going on. The main contribution of this paper is that it presents a system for detecting signals of cyberbullying on social media, including posts from bullies, victims and bystanders. A manually annotated cyberbullying dataset was created for two languages, which will be made available for public scientific use. Moreover, while a fair amount of research has been done on cyberbullying detection for English, we believe this is one of the first papers to focus on Dutch as well.
A set of binary classification experiments was conducted to explore the feasibility of automatic cyberbullying detection on social media. In addition, we sought to determine which information sources contribute to this task. Two classifiers were trained on English and Dutch ASKfm data and evaluated on a holdout test set of the same genre. Our experiments reveal that the current approach is a promising strategy for automatically detecting signals of cyberbullying in social media data. After feature selection and hyperparameter optimisation, the classifiers achieved an F1-score of 64.32% and 58.72% for English and Dutch, respectively, thereby significantly outperforming a keyword baseline and an (unoptimised) n-gram baseline. Analysis of the results revealed that false negatives often involve implicit cyberbullying or offenses expressed through irony; handling these will constitute an important area for future work.
Another interesting direction for future work would be the detection of fine-grained cyberbullying-related categories such as threats, curses and expressions of racism and hate. When applied in a cascaded model, the system could find severe cases of cyberbullying with high precision. This would be particularly interesting for monitoring purposes, since it would make it possible to prioritise signals of bullying that are in urgent need of manual inspection and follow-up.
Finally, future work will focus on the detection of the participants (or roles) typically involved in cyberbullying. This would make it possible to analyse the context of a cyberbullying incident and hence evaluate its severity. When applied as moderation support on online platforms, such a system could tailor feedback to the recipient (i.e., a bully, victim or bystander).

Acknowledgment
The work presented in this paper was carried out in the framework of the AMiCA project (IWT SBO 120007), funded by the Flanders Innovation & Entrepreneurship agency (VLAIO).