
Guide for the application of the data augmentation approach on sets of texts in Spanish for sentiment and emotion analysis

  • Rodrigo Gutiérrez Benítez ,

    Contributed equally to this work with: Rodrigo Gutiérrez Benítez, Alejandra Segura Navarrete, Christian Vidal-Castro, Claudia Martínez-Araneda

    Roles Investigation, Software, Visualization, Writing – original draft

    Affiliation Information Systems Department, Universidad del Bio-Bío, Concepción, Bio-Bío, Chile

  • Alejandra Segura Navarrete ,

    Contributed equally to this work with: Rodrigo Gutiérrez Benítez, Alejandra Segura Navarrete, Christian Vidal-Castro, Claudia Martínez-Araneda

    Roles Conceptualization, Methodology, Project administration, Supervision, Validation

    Affiliation Information Systems Department, Universidad del Bio-Bío, Concepción, Bio-Bío, Chile

  • Christian Vidal-Castro ,

    Contributed equally to this work with: Rodrigo Gutiérrez Benítez, Alejandra Segura Navarrete, Christian Vidal-Castro, Claudia Martínez-Araneda

    Roles Conceptualization, Supervision, Validation

    Affiliation Information Systems Department, Universidad del Bio-Bío, Concepción, Bio-Bío, Chile

  • Claudia Martínez-Araneda

    Contributed equally to this work with: Rodrigo Gutiérrez Benítez, Alejandra Segura Navarrete, Christian Vidal-Castro, Claudia Martínez-Araneda

    Roles Data curation, Formal analysis, Validation, Writing – review & editing

    cmartinez@ucsc.cl

    Affiliation Computer Science Department, Universidad Católica de la Santísima Concepción, Concepción, Bio-Bío, Chile

Abstract

Over the last ten years, social media has become a crucial data source for businesses and researchers, providing a space where people express their opinions and emotions. Natural language processing (NLP) techniques such as emotion analysis (EA) and sentiment analysis (SA) are employed to analyze these data and to classify the emotions and polarity of texts. However, the effectiveness of these tasks with machine learning (ML) and deep learning (DL) methods depends on large labeled datasets, which are scarce in languages such as Spanish. To address this challenge, researchers use data augmentation (DA) techniques to artificially expand small datasets. This study investigates whether DA techniques can improve the classification results of ML and DL algorithms for sentiment and emotion analysis of Spanish texts. Several text manipulation techniques, namely transformations, paraphrasing (back-translation), and text generation with generative adversarial networks, were applied to small datasets of song lyrics, social media comments, headlines from Chilean national newspapers, and survey responses from higher education students. The Convolutional Neural Network (CNN) classifier achieved the largest improvement, an 18% increase using the Generative Adversarial Networks for Sentiment Text (SentiGan) on the Aggressiveness (Seriousness) dataset. The same classifier also improved by 11% using Easy Data Augmentation (EDA) on the Gender-Based Violence dataset. BETO, a Bidirectional Encoder Representations from Transformers (BERT) model for Spanish, improved by 10% on the back-translation augmented version of the October 18 dataset and by 4% on the EDA augmented version of the Teaching survey dataset. These results suggest that data augmentation improves performance when the text transformations are adapted to the specific characteristics of each dataset.
Through experimentation with various augmentation techniques, this research provides valuable insights into the analysis of subjectivity in Spanish texts and offers guidance for selecting algorithms and techniques based on dataset features.

Introduction

The explosive growth in the use of social media as a means of mass communication over the last decade [1, 2] has opened new research avenues for Natural Language Processing (NLP). Among these are the classification of texts by emotional intent (EI) and the identification of text polarity, known as sentiment analysis (SA). The approaches used for these tasks include Machine Learning (ML) methods such as Naïve Bayes (NB), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM) [10], and, more recently, Deep Learning (DL) methods, which are data-driven and computationally demanding [3]. However, these models require substantial amounts of labeled data [4, 5], which poses a problem because manually labeling texts is time-consuming and resource-intensive [1, 6–8]. Therefore, alternatives are sought that allow labeled data to be acquired quickly, efficiently, and, as far as possible, without human intervention. Data augmentation techniques, initially used in image analysis, are now also applied to text to enlarge labeled datasets and thus improve the performance of classification models [9, 10].

This work is framed within the analysis of text subjectivity in Spanish and examines the effect of data augmentation techniques on the performance of Machine Learning (ML) and Deep Learning (DL) classification models. To achieve this objective, after reviewing the state of the art in text augmentation, the most widely used text data augmentation techniques are applied to different datasets created by the SoMos (SOftware-MOdelling-Science) research group of the Universidad del Bío-Bío. The impact of augmentation on sentiment and emotion classification in Spanish is then evaluated with the most common ML and DL algorithms: Support Vector Machine (SVM), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BiLSTM), and Bidirectional Encoder Representations from Transformers pre-trained on a Spanish corpus (BETO). Based on these results, a guide is proposed for selecting text augmentation techniques for sentiment analysis and emotion analysis tasks. The hypothesis that guides this work is: “It is possible to improve the performance of classifiers based on Machine and Deep Learning models for sentiment and emotion analysis in Spanish texts by employing data augmentation techniques.”

The remainder of this work is structured as follows. The Related works section reviews the literature on taxonomies of data augmentation and on specific augmentation techniques. The Method section outlines the working hypothesis and the methodological approach used to validate it. The Experiments section describes and prepares the datasets, and then compares a baseline obtained on non-augmented data with the results on data augmented with the selected techniques. The Results section presents the overall results of the experiments with the various data augmentation techniques on the datasets and proposes a guide for selecting an augmentation technique and a classification model based on the characteristics of a dataset. The final sections discuss the results and present conclusions and future directions.

Related works

Data augmentation (DA) is a set of methods used to generate new data from a set of labeled data [11–14]. It was initially used in image processing [15], but text data augmentation techniques have since been developed to improve the classification performance of ML and DL models. Text augmentation addresses the scarcity of labeled data and class imbalance in a dataset [2] by generating new sentences from the existing ones. This section reviews the various text data augmentation techniques, using two taxonomies to categorize them. The first taxonomy, presented by Bayer et al. [16], defines two major categories: the Data Space, which groups techniques that augment the data directly at the level of characters, words, phrases, and documents; and the Feature Space, usually represented through embeddings, which groups techniques that augment by manipulating the vector representation of the data. This work focuses on Data Space augmentation methods. The second taxonomy, presented by Queiroz et al. [17], offers a more comprehensive classification of augmentation techniques according to the type of augmentation performed on the dataset. Fig 1 offers a detailed view of this taxonomy. Under this taxonomy, our work applies transformation, paraphrasing, and generation techniques.

Fig 1. The taxonomy of data augmentation.

Based on Queiroz et al. [17].

https://doi.org/10.1371/journal.pone.0310707.g001

Data augmentation through sentence transformation

This variant mainly relies on random replacement, insertion, swapping, and deletion of words [18]. For replacement and insertion, the new words can be chosen using dictionaries such as WordNet, word embeddings (WE) such as Word2Vec [11], or word-importance measures computed over the dataset, such as TF-IDF, so that the meaning of the original sentence is preserved. Table 1 shows examples of augmentation through replacement, insertion, swapping, and deletion of words within a sentence.
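The four word-level operations can be sketched in Python as below. This is a minimal illustration, not the EDA reference implementation [18]; the toy `SYNONYMS` dictionary and all function names are hypothetical stand-ins for a WordNet or word-embedding lookup.

```python
import random

# Hypothetical toy synonym dictionary standing in for WordNet or an
# embedding-based lookup; not a resource used in the cited works.
SYNONYMS = {
    "feliz": ["contento", "alegre"],
    "triste": ["apenado", "afligido"],
    "canción": ["tema", "pieza"],
}


def synonym_replacement(tokens, n, rng):
    """Replace up to n tokens that have an entry in SYNONYMS."""
    out = list(tokens)
    positions = [i for i, t in enumerate(out) if t in SYNONYMS]
    rng.shuffle(positions)
    for i in positions[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out


def random_insertion(tokens, n, rng):
    """Insert a synonym of a random known token at a random position, n times."""
    out = list(tokens)
    for _ in range(n):
        known = [t for t in out if t in SYNONYMS]
        if not known:
            break
        synonym = rng.choice(SYNONYMS[rng.choice(known)])
        out.insert(rng.randint(0, len(out)), synonym)
    return out


def random_swap(tokens, n, rng):
    """Exchange the positions of two randomly chosen tokens, n times."""
    out = list(tokens)
    for _ in range(n):
        if len(out) < 2:
            break
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, p, rng):
    """Drop each token with probability p, always keeping at least one token."""
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else [rng.choice(tokens)]


rng = random.Random(42)
sentence = "me siento feliz con esta canción".split()
for op in (synonym_replacement, random_insertion, random_swap):
    print(op.__name__, "->", " ".join(op(sentence, 1, rng)))
print("random_deletion ->", " ".join(random_deletion(sentence, 0.2, rng)))
```

Passing a seeded `random.Random` instance keeps the augmented copies reproducible across experiments.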

Table 1. Examples of sentence transformation, Balakrishnan et al. [11].

https://doi.org/10.1371/journal.pone.0310707.t001

In that work [11], a dataset is augmented using EDA (Easy Data Augmentation) [18], and ML and DL classifiers are then compared. The reported results show that the DL algorithms (CNN, RNN, BiLSTM, and BERT variants) outperform the ML algorithms (LR, NB, DT, RF, SVM) on both the original and the augmented datasets. With augmented data, the DL algorithms reached 96% accuracy and a 91.1% F1-score.

Furthermore, Chen et al. [9] propose a variation of EDA that uses the information contained in the emojis of tweets to preserve their semantics, and then perform SA using BiLSTM. Their results show that the classifier performs poorly on the original dataset because of its small amount of labeled data, and that augmentation with EDA and emojis improves the results significantly, with accuracy and F1-score around 72%. Yuan et al. [19] propose another adaptation of EDA to augment a dataset with four affective categories (joy, anger, bored, sad). To address the problem of semantic loss, they use TF-IDF to find the most important words in context. They then train CNN classifiers on the augmented dataset; compared to their baselines, the proposed model improves accuracy by 3%. Lee et al. [20] also build on EDA: they represent sentences as Knowledge Graphs and modify them with the operations used by EDA.

Dhiman et al. [12] use MOD-EDA to augment a dataset of tweets about users' sentiment toward India's public policies and their influence during election periods, and then classify the polarity of the tweets with BERT. To measure the performance of their model, they compare the classifier with and without augmentation, obtaining the best results with MOD-EDA + BERT: 70% accuracy and 71% F1-score.

To improve classification results using BERT together with an English lexicon, Tahayna et al. [1] replace English idioms with their meaning, thus obtaining new sentences. Using the F1-score as a metric, they report an improvement of over 10% compared to the baseline (76.98%) on a dataset of 150 tweets. Meanwhile, Li et al. [10] propose algorithms for synonym replacement augmentation (PWSS) and for exchanging word order within a sentence (DRAWS), use them to augment four public SA datasets, and use LSTM as the classifier. Their results show that augmentation with DRAWS (11.49% Macro-F1) performs better than PWSS (2.9% Macro-F1) on the datasets used. To perform SA on Turkish tweets, Shehu et al. [17, 21] propose three augmentation methods (shift, shuffle, and a combination of both) and then train ML and DL classifiers on the augmented dataset. However, the authors do not elaborate on the augmentation techniques, because their focus is their proposed classification model (HAN). Their results show that DL algorithms obtain better classification results than ML algorithms, whereas ML performs better in terms of execution and training times.

Liu et al. [22] use BiLSTM to classify sentiment in emails and perform data augmentation by random word replacement, combining WordNet and K-Nearest Neighbor (KNN), to balance the polarity classes. Notably, better augmentation results are obtained by replacing 20% of the words in a sentence and applying Linear Discriminant Analysis (LDA) and TF-IDF [14, 23] to maintain semantics. With this augmentation model and BiLSTM as the classifier, the reported accuracy improves by between 1.5% and 10%. One study that did not obtain significant improvements from augmentation, in this case for aspect category classification, is that of Almasre [24], with an F1-score of 66.3% for the baseline and 66.1% for the augmented dataset. The augmentation method replaces no more than 25% of the words in a sentence at random, using cosine similarity to find the replacement words, and the classifier is a variant of BERT for Arabic. One of the problems they faced was the large number of aspect categories to be augmented (34) and the imbalance in 11 of them.
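Replacement guided by cosine similarity, with a cap on the share of modified words such as the 25% limit described for Almasre [24], can be sketched as follows. The three-dimensional vectors and the helper names are invented for illustration; real systems would use pretrained Spanish embeddings, whose nearest neighbours are not always true synonyms.

```python
import math
import random

# Hypothetical 3-dimensional word vectors; real systems would load
# pretrained Spanish embeddings (e.g. word2vec or fastText).
EMBEDDINGS = {
    "bueno": [0.90, 0.10, 0.00],
    "excelente": [0.85, 0.20, 0.05],
    "malo": [-0.80, 0.10, 0.00],
    "película": [0.00, 0.90, 0.30],
    "filme": [0.05, 0.88, 0.35],
}


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbour(word):
    """Most cosine-similar vocabulary word other than `word`, or None if unknown."""
    if word not in EMBEDDINGS:
        return None
    others = [w for w in EMBEDDINGS if w != word]
    return max(others, key=lambda w: cosine(EMBEDDINGS[word], EMBEDDINGS[w]))


def similarity_replacement(tokens, max_ratio, rng):
    """Replace at most max_ratio of the tokens with their nearest neighbour."""
    out = list(tokens)
    budget = int(len(out) * max_ratio)
    positions = [i for i, t in enumerate(out) if t in EMBEDDINGS]
    rng.shuffle(positions)
    for i in positions[:budget]:
        out[i] = nearest_neighbour(out[i])
    return out


print(similarity_replacement("el filme es bueno".split(), 0.25, random.Random(1)))
```

With four tokens and a 25% cap, exactly one known word is swapped for its nearest neighbour.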

Another example of using data augmentation to address class imbalance is proposed by Ha et al. [25]. Their augmentation model uses synonym replacement based on the Paraphrase Database, followed by a CNN that detects three categories (support, other, and oppose) in a dataset of opinions expressed by U.S. citizens about the Clean Power Plan. After augmentation, their dataset was balanced at 2800 comments per category, and they obtained an accuracy of 84% and an F1-score of 71%. Also using augmentation for dataset balancing, Lee et al. [20] compare two techniques, EDA and Unsupervised Data Augmentation (UDA), to balance polarity-labeled samples represented as Knowledge Graphs. On the same dataset, EDA applied to the negative category had minimal impact on classification performance compared with the outcomes achieved with UDA in the positive category.

To perform SA on Chinese texts, Wang et al. [26] augment a dataset by replacing words with synonyms from a thesaurus built for their model, showing that this corrects the problem of substituting synonyms with low similarity, which otherwise affects the classification task. Once the dataset is augmented, they perform SA with a hybrid model in which CharCNN extracts the features and SVM performs the classification. With their augmentation and classification model, they report an accuracy of 95%.

To perform sentiment analysis on tweets containing idioms, Tahayna et al. [27] use SliDE (an IBM lexicon) and a BERT classifier. They propose augmenting tweets by replacing idioms with their meaning, and report a classification performance of 92% and a 16% reduction in classification error compared with the non-augmented dataset.

Using text augmentation by transformation, Qudar et al. [6] replace 5% of the words in a sentence with synonyms, remove 10%, insert 5%, and swap 5%. They then apply a semi-supervised student-teacher model to carry out sentiment analysis, obtaining an 87.3% F1-score on the SemEval Aspect Sentiment Analysis dataset and an 88.35% F1-score on a Twitter dataset.

To improve sarcasm detection in short Arabic texts, Al-Jamal et al. [28] use two in-text data augmentation techniques, Random Swap and Random Deletion, to balance the classes of the iSarcasmEval dataset [29], and then detect sarcasm using BERT. However, their F1-score on this task was below 60%, and they conclude that more robust labeled datasets are needed to improve the detection of sarcastic comments.

Kraus et al. [30] introduce a different approach to augmentation by transformation: they apply Random Swap and Random Insertion to a dataset represented as Rhetorical Structure Trees. For sentiment analysis, they use a DL classifier called Discourse-LSTM, with which they obtain a 4.27% F1-score improvement on the Rotten Tomatoes dataset.

Kelsingazin et al. [31] propose two algorithms for text data augmentation based on Random Insertion and Random Deletion, but do not detail their implementation. They use SVM and LR classifiers, again without details of the implementation or the parameters used. They report an improvement in classification performance of 1% in F1-score.

Iosifidis et al. [2] propose two augmentation techniques. The first replaces words in a sentence with the most similar words according to the cosine similarity computed from WE. The second removes words from the sentence, except those that carry sentiment and those that express negation, to preserve the class of the original sentence. The authors apply these techniques during training to correct class imbalance. In their experiments, the WE-based replacement method performs worse than the word-removal method; they suspect that some replacements change the class of the sample (i.e., the meaning of the sentence changes).
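The second technique of Iosifidis et al. [2] amounts to random deletion with a protected vocabulary. A minimal sketch, assuming hypothetical Spanish sentiment and negation word lists (not the authors' lexicons) and a hypothetical function name:

```python
import random

# Hypothetical word lists; a real system would use a Spanish sentiment
# lexicon and a fuller inventory of negation cues.
SENTIMENT_WORDS = {"odio", "amo", "horrible", "genial", "triste", "feliz"}
NEGATIONS = {"no", "nunca", "jamás", "tampoco", "ni"}


def protected_deletion(tokens, p, rng):
    """Drop each token with probability p unless it is a sentiment or negation word."""
    protected = SENTIMENT_WORDS | NEGATIONS
    return [t for t in tokens if t.lower() in protected or rng.random() >= p]


tweet = "no creo que el final sea genial".split()
print(" ".join(protected_deletion(tweet, 0.5, random.Random(7))))
```

Keeping negation cues matters here: deleting "no" from the example would invert its polarity while the label stayed unchanged.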

Duwairi et al. [13] report the largest performance gain, a 42% improvement in accuracy, after augmenting a dataset of Arabic product reviews. Their method replaces words with synonyms from Arabic WordNet and applies syntactic rules specific to Arabic to generate new sentences. They evaluate the augmented datasets with three ML classifiers (NB, KNN, SVM), enlarging the dataset up to ten times, and achieve an increase in classification performance of more than 40% over the non-augmented dataset.

To preserve the semantics of augmented sentences, Feng et al. [32] propose two algorithms. The first performs probabilistic synonym replacement, selecting words from a lexicon. The second computes the weight of each word in context using TF-IDF and replaces the word with the lowest weight. With a CNN-based classifier, they achieve a 5% improvement in accuracy over the defined baselines.

Santoso et al. [14] propose an extension of EDA that improves accuracy by between 0.6% and 3.4% over a non-augmented dataset. The extension consists of two augmentation algorithms: the first performs substitution while preserving the semantic information of the sentence, and the second performs word sense disambiguation using the Adapted Lesk algorithm. In their experiments, the neutral class was problematic because their model could not find the most important words in sentences of that class.

The work of Haralabopoulos et al. [7] covers emotion and polarity classification with LSTM, using augmentation by antonym replacement, negation insertion, and permutation. Unlike the other approaches, their augmentation model deliberately changes the class of the augmented sentence with respect to the original one. With this approach, they achieve a 4.1% accuracy improvement over the baselines. Notably, this work does not attempt to preserve the semantics of the sentences, which is the opposite direction from the other articles reviewed.
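A sketch of label-changing augmentation in the spirit of the work just described [7]: antonym replacement and negation insertion both produce a sample whose polarity label is inverted. The antonym lexicon, label names, function names, and the hand-chosen negation position are hypothetical simplifications; real Spanish negation placement depends on syntax.

```python
# Hypothetical antonym lexicon and binary label map for illustration only.
ANTONYMS = {"bueno": "malo", "malo": "bueno", "feliz": "triste", "triste": "feliz"}
FLIP = {"positive": "negative", "negative": "positive"}


def antonym_flip(tokens, label):
    """Swap the first word that has an antonym and invert the polarity label."""
    for i, t in enumerate(tokens):
        if t in ANTONYMS:
            return tokens[:i] + [ANTONYMS[t]] + tokens[i + 1:], FLIP[label]
    return None  # no antonym available, so no new sample is produced


def negation_flip(tokens, label, position):
    """Insert 'no' at a hand-chosen position and invert the polarity label."""
    return tokens[:position] + ["no"] + tokens[position:], FLIP[label]


print(antonym_flip("el concierto fue bueno".split(), "positive"))
# -> (['el', 'concierto', 'fue', 'malo'], 'negative')
print(negation_flip("la película es buena".split(), "positive", 2))
# -> (['la', 'película', 'no', 'es', 'buena'], 'negative')
```

Because the label is flipped rather than kept, each operation adds a sample to the opposite class, which is what distinguishes this approach from semantics-preserving augmentation.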

Wei et al. [33] augment the dataset using synonym replacement, randomization, and random switching, and then train a transfer learning model with two BERT instances, one acting as the student and the other as the teacher. Their model maintains BERT's accuracy on the SST2, YELP, and Amazon datasets. Recent editions of the Iberian Languages Evaluation Forum (IberLEF) have featured studies introducing data augmentation strategies for Spanish datasets using either BERT models [34] or Large Language Models (LLMs) [35]. These studies show promising results, the first using augmentation via Bayesian optimization (BO-TextAutoAugment) [36] and the second using back-translation paraphrasing.

Despite the availability of Spanish and multilingual datasets for sentiment analysis and emotion detection, such as those mentioned by Navas-Loro and Rodríguez-Doncel [37], only a limited number of studies [5, 23, 34, 35] apply data augmentation techniques directly to datasets whose source language is Spanish, without translation, as this review shows. Another challenge concerns how augmentation techniques preserve the semantics of the original sample. To address this problem, [38, 39] propose the use of Term Frequency-Inverse Document Frequency (TF-IDF), Probabilistic Latent Semantic Analysis (pLSA), and word embeddings (WE), among others.

Data augmentation through paraphrasing

This strategy uses techniques such as back-translation (BT), which translates sentences into one or more intermediate languages and then back into the original language, as shown in Fig 2. The resulting sentences contain slight modifications introduced by translation while keeping the semantics of the original sentence. However, their quality depends directly on the translation tools used [8, 17, 40, 41].
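The round trip in Fig 2 can be sketched as a function that takes any translation backend as a parameter, which reflects the dependence of output quality on the translation tool. The `Translator` type, `back_translate` function, and the canned phrase table are hypothetical; no specific translation API is assumed.

```python
from typing import Callable, Iterable, List

# A translator maps (text, source_lang, target_lang) -> translated text.
# Any machine-translation service or local model could be plugged in;
# no particular API is assumed.
Translator = Callable[[str, str, str], str]


def back_translate(text: str, translate: Translator, source: str,
                   pivots: Iterable[str]) -> List[str]:
    """Paraphrase `text` via each pivot language, dropping exact duplicates."""
    seen = {text.strip().lower()}
    paraphrases = []
    for pivot in pivots:
        intermediate = translate(text, source, pivot)
        candidate = translate(intermediate, pivot, source)
        key = candidate.strip().lower()
        if key not in seen:  # identical outputs add no new training signal
            seen.add(key)
            paraphrases.append(candidate)
    return paraphrases


# Hypothetical phrase table standing in for a real MT system.
TABLE = {
    ("es", "en"): {"la clase fue muy buena": "the class was very good"},
    ("en", "es"): {"the class was very good": "la clase estuvo muy buena"},
    ("es", "fr"): {"la clase fue muy buena": "le cours était très bon"},
    ("fr", "es"): {"le cours était très bon": "el curso era muy bueno"},
}


def toy_translate(text: str, src: str, tgt: str) -> str:
    """Look up a canned translation; unknown inputs are returned unchanged."""
    return TABLE.get((src, tgt), {}).get(text, text)


print(back_translate("la clase fue muy buena", toy_translate, "es", ["en", "fr"]))
# -> ['la clase estuvo muy buena', 'el curso era muy bueno']
```

Using several pivot languages yields several distinct paraphrases per sentence, while round trips that return the original text are discarded.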

Fig 2. Example of back-translation based on Jacob and Shushma [8].

https://doi.org/10.1371/journal.pone.0310707.g002

Among the works related to paraphrasing, Krishnan et al. [41] perform augmentation by translating sentences from English to Hindi to increase the number of sentences available to their classification model, a teacher-student model using mBERT and XLM-R. The authors do not elaborate on the augmentation technique, indicating only that any translation or transliteration tool can be used. On the test dataset, their model preserves or improves performance relative to the baseline, obtaining F1-scores of 61.35% and 66.24% for Hindi and Malay with mBERT, and 62.23% and 76.46% for the same languages with XLM-R.

Tang et al. [40] use back-translation, translating texts from Chinese to English with the Baidu API, to increase the number of samples in a training dataset of Chinese microblog texts, with English, Chinese, and Japanese texts belonging to NLPCC 2018 Shared Task 1. This is part of their BERT-MSAUC model, which classifies the emotions happiness, sadness, fear, anger, and surprise. In F1-score, their model outperformed a BERT(M) baseline in two of the five emotion classes (fear and anger). The classification performance for each class is shown in Table 2.

To complete the review of articles using this technique, Bogoradnikova et al. [23] perform sentiment analysis, toxic comment detection, and detection of toxic text fragments in Russian. They compare an SVM model with the Perspective API (an API used for content moderation); in a first instance, the SVM model reaches an accuracy of 61.83% after augmentation of the dataset. They use EDA and back-translation to augment the dataset but do not detail the augmentation process. After augmentation, their SVM model improved its sentiment analysis performance by 10%, reaching 95% accuracy.

Data augmentation through generation

This section reviews Generative Adversarial Network (GAN) techniques and techniques based on vector space manipulation.

Generative Adversarial Networks (GANs).

GANs are generative models that create synthetic data from an existing dataset. They rely on a generator, which creates sentences from the existing dataset, and a discriminator, which judges whether the sentences produced by the generator are real or fake. When the discriminator can no longer tell whether a sentence is real or fake, the sample can be added to the augmented dataset. Both the generator and the discriminator are commonly implemented with DL models, so the quality of the generated sentences depends on the training dataset [42–45].
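The generator-discriminator game described above is commonly written as the minimax objective of the original GAN formulation, shown here as background rather than as the loss of any specific text GAN. D(x) is the discriminator's estimated probability that x is real, G(z) is a sample generated from noise z, and p_g is the distribution of generated samples. For a fixed generator, the optimal discriminator outputs 1/2 everywhere once p_g matches the data distribution, which is the formal counterpart of the discriminator no longer being able to tell real from generated sentences. Text GANs typically adapt this objective, since sampling discrete words is not differentiable; those adaptations are not shown.

```latex
\min_{G}\max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}}\left[\log\left(1 - D(G(z))\right)\right]

D^{*}_{G}(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_{g}(x)},
\qquad p_{g} = p_{\text{data}} \;\Rightarrow\; D^{*}_{G}(x) = \tfrac{1}{2}
```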

Augmentation by vector space manipulation

Unlike sentence manipulation methods, which work directly on the text, vector space (embedding) manipulation works on the vectors that represent sentences in the vector space of the model used. As a result, these techniques depend heavily on the model, since its architecture determines how sentences are represented. They are implemented with neural networks and have received less research attention [17].
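The source does not name specific feature-space techniques, so two common examples are sketched here for illustration only: Gaussian noise injection and mixup-style interpolation of sentence embeddings together with their one-hot labels. Neither is claimed to be used in the cited works or in this study, and the two-dimensional vectors are hypothetical; in practice the vectors would come from the classifier's own encoder, which is why these methods depend on the model.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def gaussian_noise(vec: Sequence[float], sigma: float, rng: random.Random) -> Vector:
    """Perturb each coordinate of a sentence embedding with N(0, sigma^2) noise."""
    return [x + rng.gauss(0.0, sigma) for x in vec]


def interpolate(a: Sequence[float], b: Sequence[float], lam: float) -> Vector:
    """Convex combination lam * a + (1 - lam) * b of two vectors."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def mixup(a: Sequence[float], label_a: Sequence[float],
          b: Sequence[float], label_b: Sequence[float],
          lam: float) -> Tuple[Vector, Vector]:
    """Mix two (embedding, one-hot label) pairs with the same coefficient."""
    return interpolate(a, b, lam), interpolate(label_a, label_b, lam)


# Hypothetical 2-d sentence embeddings with one-hot polarity labels.
positive, negative = [1.0, 0.0], [0.0, 1.0]
print(mixup(positive, [1.0, 0.0], negative, [0.0, 1.0], 0.75))
# -> ([0.75, 0.25], [0.75, 0.25])
print(gaussian_noise(positive, 0.05, random.Random(0)))
```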

Based on the literature review on augmentation techniques, the overview shown in Table 3 is presented, detailing the Data Augmentation techniques, the number of articles, and their identification.

Table 3. Distribution of articles by augmentation method.

https://doi.org/10.1371/journal.pone.0310707.t003

Table 3 shows that the most used category of augmentation techniques is sentence transformation (57.4%), followed by sentence generation (16.6%) and sentence paraphrasing (11.1%). Regarding language, English is the most used (57%), distantly followed by Spanish (7.1%). Among the classifiers used in the reviewed works, BERT and its variants are the most used (23%), followed by LSTM (15%), showing a tendency towards DL classification models.

Method

To test the hypothesis of this work, the work method shown in Fig 3 was followed. The experimental phase includes the following subtasks:

  1. Selection of datasets for Spanish text analysis that will be augmented.
  2. Selection of augmentation techniques based on the reviewed state of the art.
  3. Selection of the most used ML- and DL-based classification models in the reviewed state of the art.
  4. Application of the selected ML and DL models to original datasets, to obtain the classification performance that will be used as a baseline.
  5. Application of text data augmentation techniques to the original datasets, to increase the amount of labeled data in each of them.
  6. Application of selected ML and DL models to augmented datasets to obtain classification performance after augmentation.
  7. Evaluation of the impact on the performance of the selected augmentation techniques on the chosen datasets using the most used metrics in the state of the art.

Experiments

This section details the experiments conducted to evaluate the influence of the augmentation techniques on the datasets. First, the classification metrics of the non-augmented corpora are calculated with different classification algorithms. These measures form the baseline against which the results on the augmented corpora are compared. Fig 4 shows the experimental workflow.

The experiments were run on a server with Debian GNU/Linux 12 (bookworm), two Intel Xeon® CPU E5-2683 v4 processors at 2.10 GHz (16 cores each, 64 threads in total with hyperthreading enabled), and 256 GB of RAM. The rest of this section describes six datasets made available by the SoMos group of the Universidad del Bío-Bío that were augmented with the techniques reviewed above, and then selects the ML and DL classification algorithms and the data augmentation techniques based on the literature review.

Selection of datasets and augmentation techniques

The datasets available for this study were generated in earlier studies by the SoMos research group. The first, October 18, consists of Twitter comments collected during the social outbreak that occurred in Chile in 2019 and is labeled with 8 emotion categories based on Plutchik’s taxonomy. The second, Aggressiveness, was used in the work of Lepe et al. [61] and consists of Twitter comments used to detect cyberbullying in Spanish. The Teaching survey dataset consists of feedback provided by students in the teaching performance evaluation conducted at the Universidad del Bío-Bío in 2018. Unlike the others, this set labels each sample in four distinct categories (Affect, Aggressiveness, Polarity, Seriousness), so it was divided into four datasets, one per category. The Newspaper Headlines dataset was created for the work of Martínez-Araneda et al. [62] to analyze bias in Chilean newspaper headlines between 2014 and 2015. Finally, the Gender-Based Violence dataset was created in the work of Calbullanca et al. [63, 64] and consists of Spanish song lyrics from different musical genres that depict violence against women. Table 4 summarizes the characteristics of the datasets.

In addition to the above, the datasets were characterized in detail according to the criteria defined in Table 5 and applied in Table 6.

Table 5. Criteria for detailed characterization of the dataset.

https://doi.org/10.1371/journal.pone.0310707.t005

For the phase of selecting augmentation techniques outlined in the methodology, the most representative transformation technique was found to be Easy Data Augmentation (EDA) [18], which combines synonym replacement, word order change (swapping), word deletion, and word insertion. EDA was used except for its deletion operation, which, according to the EDA authors, degrades classification results.

The classification models were selected according to how often they appear in the reviewed literature. BERT is the most used classification algorithm, appearing in 23% of the reviewed articles. For our experiments, we use BETO, a BERT model pre-trained on a large Spanish corpus, which matches the language of the datasets. Among the other classifiers, LSTM was used in 15% of the reviewed articles, followed by BiLSTM (12%) and CNN (7%). SVM was included on the recommendation of the SoMos group, since it has shown good classification results on the selected datasets.

Calculation of the baseline.

The baseline is the classification performance on the datasets without augmentation, against which all experiments are compared. It was obtained with the algorithms most used in the reviewed literature: CNN, LSTM, BiLSTM, BERT (BETO), and SVM. Each dataset was split into 70% for training and 30% for validation, and the same validation subset was used for every classifier, both for the baseline and after augmentation. Table 7 lists the hyperparameters of each classifier. These hyperparameters were kept unchanged when the machine learning (ML) and deep learning (DL) models were trained on the augmented data, so that results on the original and augmented datasets can be compared fairly.
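The evaluation protocol above (a single 70/30 split, a validation subset shared by every classifier and every augmentation run, and accuracy plus F1 as metrics) can be sketched in plain Python. The seeded shuffle, the helper names, and the use of macro-averaged F1 are assumptions for illustration; the section does not state how the split was drawn or which F1 averaging was used.

```python
import random


def split_70_30(samples, seed=42):
    """Shuffle once with a fixed seed and cut 70% for training, 30% for validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def make_experiment(samples, augment=None, seed=42):
    """Same validation subset for every run; synthetic samples only ever enter training."""
    train, validation = split_70_30(samples, seed)
    if augment is not None:
        train = train + augment(train)
    return train, validation


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (the averaging scheme is an assumption)."""
    scores = []
    for c in set(y_true) | set(y_pred):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because `make_experiment` always re-derives the validation subset from the same seed, any augmentation function can be plugged in without the validation data changing, which is what keeps baseline and augmented scores comparable.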

Table 8 reports the baseline accuracy and F1-score obtained by each algorithm on each dataset without augmentation.

Overall, BERT (BETO) obtains the best results in both accuracy and F1-score. The exception is the Gender-based Violence dataset, where SVM obtains the best results on both metrics, 3% ahead of its closest follower, BETO.

The procedure with transformation techniques.

Only the training subset of each selected dataset was augmented, along two dimensions: the percentage of words modified in each sentence and the amount of augmentation applied to the dataset. Table 9 shows the distribution, percentage, and number of augmentations applied.
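The two augmentation dimensions can be pictured as a grid of settings. The sketch below, with hypothetical parameter names `alpha` (share of words modified per sentence) and `n_aug` (augmented sentences generated per original sentence), computes for each setting how many words are altered in an average sentence and how large the training subset becomes. The values passed in are illustrative only and are not those of Table 9.

```python
from itertools import product


def augmentation_grid(train_size, avg_words, alphas, n_augs):
    """One row per (alpha, n_aug) setting.

    alpha : share of the words in a sentence that EDA modifies
    n_aug : augmented sentences generated per original training sentence
    """
    rows = []
    for alpha, n_aug in product(alphas, n_augs):
        rows.append({
            "alpha": alpha,
            "n_aug": n_aug,
            "words_changed": max(1, int(alpha * avg_words)),  # at least one edit per sentence
            "train_size": train_size * (1 + n_aug),           # originals plus synthetic copies
        })
    return rows


# Illustrative values only, not those of Table 9.
for row in augmentation_grid(train_size=700, avg_words=20, alphas=[0.1, 0.3], n_augs=[1, 4]):
    print(row)
```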

In the second experiment, the modification percentage that produced the best classification results after EDA augmentation was selected, and that percentage was then used to generate additional samples for the minority classes in order to balance the datasets.
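The balancing step requires deciding how many synthetic samples each minority class needs. Assuming that balancing means raising every class to the size of the majority class (the section does not give the exact target), a minimal sketch with hypothetical helper names is:

```python
from collections import Counter


def balancing_plan(labels):
    """Synthetic samples each class needs to reach the size of the majority class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {label: target - n for label, n in counts.items()}


def per_sample_quota(needed, class_size):
    """Spread `needed` augmentations over the class's originals as evenly as possible."""
    base, extra = divmod(needed, class_size)
    return [base + (1 if i < extra else 0) for i in range(class_size)]


labels = ["ira"] * 10 + ["alegría"] * 4 + ["miedo"] * 7
plan = balancing_plan(labels)
print(plan)                                  # {'ira': 0, 'alegría': 6, 'miedo': 3}
print(per_sample_quota(plan["alegría"], 4))  # [2, 2, 1, 1]
```

The per-sentence quotas could then serve as the number of augmentations requested from an EDA implementation run with the best modification percentage.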

The procedure with generative techniques.

Generative augmentation was performed with generative adversarial networks (GANs), specifically the SentiGAN model proposed by Wang and Wan [65]. The classes of each dataset were written to separate files and augmented one at a time. This was necessary because, although the model is described as able to generate text for the multiple classes present in a dataset, in practice the implementation can only generate samples for a single class. Table 10 shows the configuration parameters used for augmentation with SentiGAN.
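Because the SentiGAN code generates samples for a single class, each dataset has to be split by class, augmented class by class, and reassembled with its labels. The helpers below sketch that bookkeeping. The one-sentence-per-line format and the `class_<label>.txt` file names are assumptions, not SentiGAN's documented input format, and the GAN training itself is not shown.

```python
import os
from collections import defaultdict


def split_by_class(samples, out_dir):
    """Write one plain-text file per label (one sentence per line) and return the paths."""
    groups = defaultdict(list)
    for text, label in samples:
        groups[label].append(text.replace("\n", " "))  # keep one sample per line
    os.makedirs(out_dir, exist_ok=True)
    paths = {}
    for label, texts in groups.items():
        path = os.path.join(out_dir, f"class_{label}.txt")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("\n".join(texts) + "\n")
        paths[label] = path
    return paths


def merge_generated(generated_by_class):
    """Re-attach each class label to the sentences generated for that class."""
    return [(text, label)
            for label, texts in generated_by_class.items()
            for text in texts]
```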

The procedure with paraphrasing techniques.

The paraphrasing experiments used Google's translation service for back-translation. The datasets, in xlsx format, were uploaded to the site and translated into three languages (English, German, and French); the translated documents were then uploaded again and translated back into Spanish. The resulting slightly modified sentences were added to the training subset used by the classification models, augmenting each dataset incrementally as shown in Table 11.
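The round trip can also be scripted rather than done through the web interface. In the sketch below, `translate` is a hypothetical callable standing in for whatever translation service is used; it is not the Google Translate API. The filter that drops paraphrases identical to their source is a choice of this sketch, since the section does not say whether one was applied, and `incremental_sets` shows one possible incremental scheme (one pivot language at a time); the actual increments are those listed in Table 11.

```python
from typing import Callable, List, Tuple

# Hypothetical signature: (text, source_lang, target_lang) -> translated text.
# The study used Google's web translation service by hand; any translator can be plugged in.
Translate = Callable[[str, str, str], str]
Sample = Tuple[str, str]  # (text, label)


def back_translate(samples: List[Sample], translate: Translate,
                   pivots=("en", "de", "fr")) -> List[List[Sample]]:
    """One batch of (paraphrase, label) pairs per pivot: Spanish -> pivot -> Spanish."""
    batches = []
    for pivot in pivots:
        batch = []
        for text, label in samples:
            paraphrase = translate(translate(text, "es", pivot), pivot, "es")
            # Sketch-specific filter: a round trip identical to the source adds no diversity.
            if paraphrase.strip().lower() != text.strip().lower():
                batch.append((paraphrase, label))
        batches.append(batch)
    return batches


def incremental_sets(train: List[Sample], batches: List[List[Sample]]) -> List[List[Sample]]:
    """Training sets of growing size: original, +1st pivot, +1st and 2nd pivots, ..."""
    sets = [list(train)]
    for batch in batches:
        sets.append(sets[-1] + batch)
    return sets
```

With a real service, `translate` would wrap its client; a toy function that rewrites a word on the way back is enough to exercise the logic.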

Results

From the experiments on the Spanish text datasets described in Table 6, guidelines for selecting augmentation techniques were derived and presented as decision trees, organized by the classification task that each dataset addresses. These trees are descriptive rather than predictive, given the limited volume of data and the small number of instances per class in the resulting dataset of experimental results.

The first decision tree (Fig 5) selects an augmentation technique according to its impact on classification performance in the EA task. The second decision tree (Fig 6) shows the best paths for selecting an augmentation technique for the SA task, according to its impact on performance relative to the baseline.

Fig 5. Guide to selecting data augmentation techniques for emotion analysis.

https://doi.org/10.1371/journal.pone.0310707.g005

Fig 6. Guide to selecting data augmentation techniques for sentiment analysis.

https://doi.org/10.1371/journal.pone.0310707.g006

Table 12 compares the results obtained with each selected augmentation technique against the baseline. The DA approaches included are:

  • EDA: Transformation augmentation technique.
  • EDA-B: Balancing classes using EDA.
  • SentiGAN: Generation augmentation technique.
  • BT: Augmentation technique by back-translation paraphrasing.

The CNN classifier achieved the largest improvement, an 18% increase, when the Aggressiveness dataset was augmented with SentiGAN. The same classifier improved by 11% with EDA on the Gender-based Violence dataset. BETO improved by 10% on the back-translation-augmented version of the October 18 dataset and by 4% on the EDA-augmented version of the Teaching survey dataset.

Another important result of this work is the set of guidelines for selecting augmentation techniques. They consist of rules derived from the analysis of the experimental results that guide the application of data augmentation, and they were classified into four levels according to the degree of improvement obtained, based on:

  • The characteristics of the dataset (corpus) used: size of the set, sentence size, and formality of language.
  • The type of augmentation technique to be used.
  • The classification algorithm to be applied.
  • The approach to analysis, i.e., Sentiment Analysis (classification into two classes) or Emotion Analysis (classification of more than two classes).

For example, the Emotion Analysis (EA) rule "Good WHEN DA IS BT OR EDA & Classification IS LSTM & Samples IS M" states that, for a medium-sized (M) corpus and regardless of the formality of the text or the average sentence length, augmenting the data with EDA or BT before training an LSTM classifier is expected to increase classification performance by 3% to 5% compared with not using data augmentation.
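Rules of this form map naturally onto a small data structure. The sketch below encodes the example rule from the text and looks up the techniques suggested for a given classifier, corpus-size band, and task. The class and function names are hypothetical; attributes a rule omits (here, formality and sentence length) are treated as matching any value, as the rule's description implies, and the mapping from corpus size to bands such as M is not modeled.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class SelectionRule:
    outcome: str                 # improvement level, e.g. "Good"
    techniques: FrozenSet[str]   # "DA IS BT OR EDA" -> {"BT", "EDA"}
    classifier: str              # "Classification IS LSTM"
    samples: str                 # "Samples IS M" (corpus-size band)
    task: str                    # "EA" or "SA"


def applies(rule, classifier, samples, task):
    """Attributes a rule leaves out (formality, sentence length) match any value."""
    return (rule.classifier == classifier and rule.samples == samples
            and rule.task == task)


def suggest(rules, classifier, samples, task):
    """Group the techniques of every applicable rule by outcome level."""
    suggestions = {}
    for rule in rules:
        if applies(rule, classifier, samples, task):
            suggestions.setdefault(rule.outcome, set()).update(rule.techniques)
    return suggestions


RULES = [SelectionRule("Good", frozenset({"BT", "EDA"}), "LSTM", "M", "EA")]
print(suggest(RULES, "LSTM", "M", "EA"))
```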

For Emotion Analysis (EA), Table 13 gives examples of rules for cases in which data augmentation yields good results.

Table 13. Examples of EA selection rules (positive results).

https://doi.org/10.1371/journal.pone.0310707.t013

Table 14 gives examples of EA rules for cases in which data augmentation yields poor results.

Table 14. Examples of EA selection rules (negative results).

https://doi.org/10.1371/journal.pone.0310707.t014

For Sentiment Analysis (SA), Table 15 gives examples of rules for cases in which data augmentation yields good results.

Table 15. Examples of SA selection rules (positive results).

https://doi.org/10.1371/journal.pone.0310707.t015

Table 16 gives examples of SA rules for cases in which data augmentation yields poor results.

Table 16. Examples of SA selection rules (negative results).

https://doi.org/10.1371/journal.pone.0310707.t016

Discussion

Analyzing the impact of augmentation on classification performance reveals the following:

  • A key factor in the impact of augmented datasets on classification performance is how different the samples created by augmentation are from the originals. Across all the experiments, the diversity of the artificial samples mattered more for improving classifier performance than their number.
  • Another factor that affects the quality of artificially created samples is the average sentence length. When transformation or generation techniques are used on a dataset whose sentences average fewer than 15 words, the impact on classification performance is minor or negative.
  • Paraphrasing by back-translation has a positive impact on most datasets: it adds sentences that are syntactically different enough from the originals not to be copies while preserving their meaning. This keeps classifiers from being trained on near-duplicate samples, which leads to overfitting.
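The two dataset-level signals discussed above, average sentence length and the diversity of augmented samples, can be checked before choosing a technique. The sketch below compares the average number of words per sentence with the 15-word threshold reported here and uses one minus the token-set Jaccard similarity as a rough novelty score. The Jaccard-based measure and the helper names are assumptions chosen for illustration; the section does not say how diversity was quantified.

```python
def average_words(sentences):
    """Mean number of whitespace-separated words per sentence."""
    return sum(len(s.split()) for s in sentences) / len(sentences)


def short_sentence_warning(sentences, threshold=15):
    """True when sentences average fewer than `threshold` words, the regime in which
    transformation and generation techniques had a minor or negative impact here."""
    return average_words(sentences) < threshold


def jaccard_novelty(original, augmented):
    """1 - token-set Jaccard similarity: 0.0 for a copy, 1.0 when no tokens are shared."""
    a, b = set(original.lower().split()), set(augmented.lower().split())
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def mean_novelty(pairs):
    """Average novelty over (original, augmented) pairs, a rough proxy for diversity."""
    return sum(jaccard_novelty(o, a) for o, a in pairs) / len(pairs)
```

A token-set measure ignores word order, so it scores pure word swaps as zero novelty; an embedding-based similarity would be a natural alternative when semantic rather than lexical distance matters.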

Applying the augmentation techniques also involved several challenges:

  • The augmentation techniques used and reviewed were not designed or trained for Spanish, so they had to be adapted before they could augment the datasets described in the Experiments section. For EDA, the lexical dictionary used by the technique was replaced and its English stop words were swapped for their Spanish counterparts. SentiGAN had been trained on English datasets with more samples and a higher average number of words per sentence than the datasets used in this work, so the model had to be retrained on the Spanish datasets.
  • The documentation accompanying the SentiGAN code in its GitHub repository (https://github.com/Nrgeup/SentiGAN) is scarce and does not match the work of Wang and Wan [65], which presents SentiGAN as able to generate sentences for multiple classes. In practice, the code in the repository generates samples for only one class, which increased the time needed to run the experiments in this study.
  • The generative augmentation techniques analyzed in the state of the art are written for versions of Python and dependencies that are no longer supported. For example, SentiGAN was written in Python 2.7 and uses TensorFlow 1.4. These requirements made it necessary to create virtual environments able to run the technique, on machines with sufficient computing power.

Conclusions

This work contributes to natural language processing with a framework that guides the selection of augmentation techniques for artificially expanding Spanish datasets, specifically for classification tasks. To this end, the current state of text augmentation was analyzed through an extensive literature review. This analysis showed that text augmentation comprises a range of methods for artificially enlarging labeled datasets, and the techniques were classified according to the manipulation performed on the data. The experiments with transformation, generation, and paraphrasing techniques show the following:

  1. Depending on the augmentation technique and the classifier used, it is possible to improve the classification performance.
  2. For datasets related to emotion analysis, the back-translation paraphrasing technique is one of the best options, regardless of the characteristics of the dataset.
  3. In data augmentation with EDA, the percentage of words modified in a sentence is a crucial parameter for adding diversity to the dataset. However, EDA does not guarantee that semantics are preserved, which suggests exploring other EDA-based augmentation techniques.
  4. Balancing datasets with EDA can significantly decrease classifier performance, owing to the quality of the generated sentences and the low performance on the original data.
  5. Generative techniques show good results on sizable and balanced datasets, especially for the SA task when using the CNN classifier.

The proposed selection guidelines are considered a starting point for the choice of Spanish text augmentation techniques, subject to improvements as more experiments are conducted to extend them to a greater number of features present in the Spanish datasets.

In conclusion, the application of augmentation techniques improves the classification performance in various DL and ML models for Spanish data sets, despite their smaller size compared to English data sets. The results and code are available in the GitLab repository https://gitlab.com/rgutierrezb/dataaugmentation.

Future works

Given the results of class balancing with EDA, future work should explore the effects of balancing through augmentation on Spanish emotion analysis (EA) datasets, with the aim of improving the results obtained in this work.

The augmentation techniques used in this work are the most representative of the transformation, generation, and paraphrasing categories of the taxonomy in [17]. Future work could therefore experiment with the remaining techniques of this taxonomy, especially those based on large language models (LLMs) such as GPT (OpenAI), Llama (Meta), and Claude (Anthropic), among others.

In a more practical sense, future work contemplates the implementation of a web application based on natural language processing in the Spanish language.

Acknowledgments

This research was conducted in alliance with the SoMos (SOftware-MOdelling-Science) research group, which has the support of the Research Directorate and the Faculty of Business Sciences of the Universidad del Bio-Bío, Chile. The authors thank the Engineering 2030 Project (ING222010004) in collaboration with the InES de Género (INGE220011) and Open Science (INCA210005) projects of Universidad Católica de la Santísima Concepción, Chile.

References

  1. Tahayna BMA, Ayyasamy RK, Akbar R. Automatic Sentiment Annotation of Idiomatic Expressions for Sentiment Analysis Task. IEEE Access. 2022;10:122234–42.
  2. Iosifidis V, Ntoutsi E. Sentiment analysis on big sparse data streams with limited labels. Knowl Inf Syst. 2020;62:1393–432.
  3. Li Q, Peng H, Li J, Xia C, Yang R, Sun L, et al. A Survey on Text Classification: From Traditional to Deep Learning. ACM Trans Intell Syst Technol. 2022 Apr 30;13(2):1–41.
  4. Sun X, He J. A novel approach to generate a large scale of supervised data for short text sentiment analysis. Multimed Tools Appl. 2020;79:5439–59.
  5. Pei Y, Chen S, Ke Z, Silamu W, Guo Q. AB-LaBSE: Uyghur Sentiment Analysis via the Pre-Training Model with BiLSTM. Applied Sciences. 2022;12:1182.
  6. Abdul Qudar MM, Bhatia P, Mago V. ONSET: Opinion and Aspect Extraction System from Unlabelled Data. In: 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC) [Internet]. IEEE; 2021. p. 733–8. Available from: https://doi.org/10.1109/SMC52423.2021.9658689
  7. Haralabopoulos G, Torres MT, Anagnostopoulos I, McAuley D. Text data augmentations: Permutation, antonyms and negation. Expert Syst Appl. 2021;177:114769.
  8. Jacob I, Shushma G. A Semantic Approach for Computing Speech Emotion Text Classification Using Machine Learning Algorithms. In: 2022 First International Conference on Electrical, Electronics, Information and Communication Technologies (ICEEICT) [Internet]. IEEE; 2022. p. 1–5. Available from: https://doi.org/10.1109/ICEEICT53079.2022.9768465
  9. Chen J, Luo L, Ji B, Zhao S, Zhang Y. A Joint Learning Sentiment Analysis Method Incorporating Emoji-Augmentation. In: 2022 IEEE 8th International Conference on Cloud Computing and Intelligent Systems (CCIS) [Internet]. IEEE; 2022. p. 348–54. Available from: https://doi.org/10.1109/CCIS57298.2022.10016405
  10. Li G, Wang H, Ding Y, Zhou K, Yan X. Data augmentation for aspect-based sentiment analysis. International Journal of Machine Learning and Cybernetics. 2023;14:125–33.
  11. Balakrishnan V, Shi Z, Law CL, Lim R, Teh LL, Fan Y. A deep learning approach in predicting products’ sentiment ratings: a comparative analysis. Journal of Supercomputing. 2022;78:7206–26. pmid:34754140
  12. Dhiman A, Toshniwal D. AI-based Twitter framework for assessing the involvement of government schemes in electoral campaigns. Expert Syst Appl. 2022;203:117338.
  13. Duwairi R, Abushaqra F. Syntactic- and morphology-based text augmentation framework for Arabic sentiment analysis. PeerJ Comput Sci. 2021;7:1–25. pmid:33954245
  14. Santoso N, Mendonça I, Aritsugi M. Text Augmentation Based on Integrated Gradients Attribute Score for Aspect-based Sentiment Analysis. In: 2023 IEEE International Conference on Big Data and Smart Computing (BigComp) [Internet]. IEEE; 2023. p. 227–34. Available from: https://doi.org/10.1109/BigComp57234.2023.00044
  15. Wang Q. Learning From Other Labels: Leveraging Enhanced Mixup and Transfer Learning for Twitter Sentiment Analysis. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI) [Internet]. IEEE; 2021. p. 336–43. Available from: https://doi.org/10.1109/ICTAI52525.2021.00055
  16. Bayer M, Kaufhold MA, Reuter C. A Survey on Data Augmentation for Text Classification. ACM Comput Surv [Internet]. 2022;55. Available from: https://doi.org/10.1145/3544558
  17. Queiroz H, Paraiso EC, Barbon S. Toward Text Data Augmentation for Sentiment Analysis. IEEE Transactions on Artificial Intelligence. 2022;3:657–68.
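  18. Wei J, Zou K. EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Hong Kong, China: Association for Computational Linguistics; 2019. p. 6382–8.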
  19. Yuan H, Song Y, Hu J, Ma Y. Design of Festival Sentiment Classifier Based on Social Network. Comput Intell Neurosci [Internet]. 2020;2020. pmid:32831820
  20. Lee J, Kim J. Improving Generation of Sentiment Commonsense by Bias Mitigation. In: 2023 IEEE International Conference on Big Data and Smart Computing (BigComp) [Internet]. IEEE; 2023. p. 308–11. Available from: https://doi.org/10.1109/BigComp57234.2023.00061
  21. Shehu HA, Sharif MH, Sharif MHU, Datta R, Tokat S, Uyaver S, et al. Deep Sentiment Analysis: A Case Study on Stemmed Turkish Twitter Data. IEEE Access. 2021;9:56836–54.
  22. Liu S, Lee K, Lee I. Document-level multi-topic sentiment classification of Email data with BiLSTM and data augmentation. Knowl Based Syst. 2020;197:105918.
  23. Bogoradnikova D, Makhnytkina O, Matveev A, Zakharova A, Akulov A. Multilingual Sentiment Analysis and Toxicity Detection for Text Messages in Russian. In: 2021 29th Conference of Open Innovations Association (FRUCT) [Internet]. IEEE; 2021. p. 55–64. Available from: https://doi.org/10.23919/FRUCT52173.2021.9435584
  24. Almasre MA. Enhance the Aspect Category Detection in Arabic Language using AraBERT and Text Augmentation. In: 2022 Fifth National Conference of Saudi Computers Colleges (NCCC) [Internet]. IEEE; 2022. p. 1–4. Available from: https://doi.org/10.1109/NCCC57165.2022.10067648
  25. Ha S, Grubert E. Hybridizing qualitative coding with natural language processing and deep learning to assess public comments: A case study of the clean power plan. Energy Res Soc Sci. 2023;98.
  26. Wang X, Sheng Y, Deng H, Zhao Z. International Journal of Innovative Computing, Information and Control. 2019;15:227–46.
  27. Tahayna B, Ayyasamy RK, Akbar R, Subri NFB, Sangodiah A. Lexicon-based Non-Compositional Multiword Augmentation Enriching Tweet Sentiment Analysis. In: 2022 3rd International Conference on Artificial Intelligence and Data Sciences (AiDAS) [Internet]. IEEE; 2022. p. 19–24. Available from: https://doi.org/10.1109/AiDAS56890.2022.9918749
  28. Al-Jamal WQ, Mustafa AM, Ali MZ. Sarcasm Detection in Arabic Short Text using Deep Learning. In: 2022 13th International Conference on Information and Communication Systems (ICICS) [Internet]. IEEE; 2022. p. 362–6. Available from: https://doi.org/10.1109/ICICS55353.2022.9811153
  29. Abu Farha I, Oprea SV, Wilson S, Magdy W. SemEval-2022 Task 6: iSarcasmEval, Intended Sarcasm Detection in English and Arabic. In: Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022) [Internet]. Seattle, United States: Association for Computational Linguistics; 2022 [cited 2024 Apr 2]. p. 802–14. Available from: https://aclanthology.org/2022.semeval-1.111
  30. Kraus M, Feuerriegel S. Sentiment analysis based on rhetorical structure theory: Learning deep neural networks from discourse trees. Expert Syst Appl. 2019;118:65–79.
  31. Kelsingazin Y, Akhmetov I, Pak A. Sentiment Analysis of Kaspi Product Reviews. In: 2021 16th International Conference on Electronics Computer and Computation (ICECCO) [Internet]. IEEE; 2021. p. 1–5. Available from: https://doi.org/10.1109/ICECCO53203.2021.9663854
  32. Feng Z, Zhou H, Zhu Z, Mao K. Tailored text augmentation for sentiment analysis. Expert Syst Appl. 2022;205:117605.
  33. Wei S, Yu D, Lv C. Text Editing for Augmented Distilled BERT. In: 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI) [Internet]. IEEE; 2020. p. 437–42. Available from: https://doi.org/10.1109/ICTAI50040.2020.00075
  34. Santibáñez-Cortés E, Carrillo-Cabrera A, Castillo-Castillo YA, Moctezuma-Ochoa DA, Muñiz-Sánchez VH. BERT and Data Augmentation for Sentiment Analysis in TripAdvisor Reviews. In: IberLEF@SEPLN. 2022.
  35. Pan R, Alcaraz-Mármol G, García-Sánchez F. UMUTeam at HOPE2023@IberLEF: Evaluation of Transformer Model with Data Augmentation for Multilingual Hope Speech Detection. In: IberLEF@SEPLN. 2023.
  36. Cortés ES. BO-TextAutoAugment: Aumento de Datos automático en NLP usando Optimización Bayesiana [Master’s thesis]. Mathematics Research Center (CIMAT); 2022.
  37. Navas-Loro M, Rodríguez-Doncel V. Spanish corpora for sentiment analysis: a survey. Lang Resour Eval. 2020;54(2):303–40.
  38. Luo J, Bouazizi M, Ohtsuki T. Data Augmentation for Sentiment Analysis Using Sentence Compression-Based SeqGAN with Data Screening. IEEE Access. 2021;9:99922–31.
  39. Liu X, Zhong Y, Wang J, Li P. Data augmentation using Heuristic Masked Language Modeling. International Journal of Machine Learning and Cybernetics. 2023;1–15.
  40. Tang T, Tang X, Yuan T. Fine-Tuning BERT for Multi-Label Sentiment Analysis in Unbalanced Code-Switching Text. IEEE Access. 2020;8:193248–56.
  41. Krishnan J, Anastasopoulos A, Purohit H, Rangwala H. Cross-Lingual Text Classification of Transliterated Hindi and Malayalam. In: 2022 IEEE International Conference on Big Data (Big Data) [Internet]. IEEE; 2022. p. 1850–7. Available from: https://doi.org/10.1109/BigData55660.2022.10021079
  42. Md Rafi-Ur-Rashid, Mahbub M, Adnan MA. Breaking the Curse of Class Imbalance: Bangla Text Classification. ACM Transactions on Asian and Low-Resource Language Information Processing. 2022;21:1–21.
  43. Shang Y, Su X, Xiao Z, Chen Z. Campus Sentiment Analysis with GAN-based Data Augmentation. In: 2021 13th International Conference on Advanced Infocomm Technology (ICAIT) [Internet]. IEEE; 2021. p. 209–14. Available from: https://doi.org/10.1109/ICAIT52638.2021.9702068
  44. Gupta R. Data augmentation for low resource sentiment analysis using generative adversarial networks. In: ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE; 2019. p. 7380–4.
  45. Carrasco XA, Elnagar A, Lataifeh M. A Generative Adversarial Network for Data Augmentation: The Case of Arabic Regional Dialects. Procedia Comput Sci. 2021;189:92–9.
  46. Wang L, Xu X, Liu C, Chen Z. M-DA: A Multifeature Text Data-Augmentation Model for Improving Accuracy of Chinese Sentiment Analysis. Sci Program [Internet]. 2022;2022. Available from: https://doi.org/10.1155/2022/3264378
  47. Tan KL, Lee CP, Lim KM. RoBERTa-GRU: A Hybrid Deep Learning Model for Enhanced Sentiment Analysis. Applied Sciences. 2023;13:3915.
  48. Sun T, Jing L, Wei Y, Song X, Cheng Z, Nie L. Dual Consistency-enhanced Semi-supervised Sentiment Analysis towards COVID-19 Tweets. IEEE Trans Knowl Data Eng. 2023;1–13.
  49. Kodiyala VS, Mercer RE. Emotion Recognition and Sentiment Classification using BERT with Data Augmentation and Emotion Lexicon Enrichment. In: 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA) [Internet]. IEEE; 2021. p. 191–8. Available from: https://doi.org/10.1109/ICMLA52953.2021.00037
  50. Hu L, Li C, Wang W, Pang B, Shang Y. Performance Evaluation of Text Augmentation Methods with BERT on Small-sized, Imbalanced Datasets. In: 2022 IEEE 4th International Conference on Cognitive Machine Intelligence (CogMI) [Internet]. IEEE; 2022. p. 125–33. Available from: https://doi.org/10.1109/CogMI56440.2022.00027
  51. Tan KL, Lee CP, Anbananthen KSM, Lim KM. RoBERTa-LSTM: A Hybrid Model for Sentiment Analysis With Transformer and Recurrent Neural Network. IEEE Access. 2022;10:21517–25.
  52. Omran TM, Sharef BT, Grosan C, Li Y. Transfer learning and sentiment analysis of Bahraini dialects sequential text data using multilingual deep learning approach. Data Knowl Eng. 2023;143:102106.
  53. Body T, Tao X, Li Y, Li L, Zhong N. Using back-and-forth translation to create artificial augmented textual data for sentiment analysis models. Expert Syst Appl. 2021;178:115033.
  54. Xu N, Mao W, Wei P, Zeng D. MDA: Multimodal Data Augmentation Framework for Boosting Performance on Sentiment/Emotion Classification Tasks. IEEE Intell Syst. 2021;36:3–12.
  55. Pandey S, Akhtar MdS, Chakraborty T. Syntactically Coherent Text Augmentation for Sequence Classification. IEEE Trans Comput Soc Syst. 2021;8:1323–32.
  56. Jiang Q, Chen L, Zhao W, Yang M. Toward Aspect-Level Sentiment Modification Without Parallel Data. IEEE Intell Syst. 2021;36:75–81.
  57. Srinivasarao U, Sharaff A. Machine intelligence-based hybrid classifier for spam detection and sentiment analysis of SMS messages. Multimed Tools Appl. 2023;1–31.
  58. Shyang YK, Yan JLS. A Text Augmentation Approach using Similarity Measures based on Neural Sentence Embeddings for Emotion Classification on Microblogs. In: 2020 IEEE 2nd International Conference on Artificial Intelligence in Engineering and Technology (IICAIET) [Internet]. IEEE; 2020. p. 1–6. Available from: https://doi.org/10.1109/IICAIET49801.2020.9257826
  59. Duong HT, Nguyen-Thi TA, Hoang VT. Vietnamese Sentiment Analysis under Limited Training Data Based on Deep Neural Networks. 2022. Available from: https://doi.org/10.1155/2022/3188449
  60. Kumar S, Khan MB, Hasanat MHA, Saudagar AKJ, AlTameem A, AlKhathami M. Sigmoidal Particle Swarm Optimization for Twitter Sentiment Analysis. Computers, Materials & Continua. 2022;74:897–914.
  61. Lepe-Faúndez M, Segura-Navarrete A, Vidal-Castro C, Martínez-Araneda C, Rubio-Manzano C. Detecting Aggressiveness in Tweets: A Hybrid Model for Detecting Cyberbullying in the Spanish Language. Applied Sciences. 2021 Nov 12;11(22):10706.
  62. Martinez-Araneda C, Segura A, Vidal-Castro C, Elgueta J. Is news really pessimistic? Sentiment Analysis of Chilean online newspaper headlines. Indian J Sci Technol. 2018;11:1–8.
  63. Calbullanca R. Detección automática de violencia de género en letra de canciones en Español [Undergraduate thesis]. [Concepción, Chile]: Universidad del BioBío; 2023.
  64. Calbullanca Viluñir R R, Segura Navarrete A A, Vidal-Castro C, Martínez-Araneda C. Corpus de letras de canciones en español etiquetadas con violencia de género [Internet]. Zenodo: GitHub. Available from: https://github.com/somos-ubb/Lyrics_Gender_Violence
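  65. Wang K, Wan X. SentiGAN: Generating Sentimental Texts via Mixture Adversarial Networks. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18); 2018.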