
An investigation into the deep learning approach in sentimental analysis using graph-based theories

  • Mohamed Kentour

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation School of Computing and Engineering, University of Huddersfield, Huddersfield, West Yorkshire, United Kingdom

  • Joan Lu

    Roles Conceptualization, Funding acquisition, Project administration, Resources, Supervision

    Affiliation School of Computing and Engineering, University of Huddersfield, Huddersfield, West Yorkshire, United Kingdom



Abstract

Sentiment analysis is a branch of natural language analytics that aims to correlate what is expressed, which normally comes in an unstructured format, with what is believed and learnt. Several approaches have tried to address this gap (e.g., Naive Bayes, RNN, LSTM, word embedding). Even though deep learning models have achieved high performance, their generative process remains a “black box” that is not fully disclosed, owing to the high-dimensional features and the non-deterministic weight assignment. Meanwhile, graphs are becoming more popular for modeling complex systems while remaining traceable and understandable. Here, we show that a good trade-off between transparency and efficiency can be achieved with a Deep Neural Network by exploring the Credit Assignment Paths theory. To this end, we propose a novel algorithm that alleviates the feature extraction mechanism and attributes an importance level to selected neurons by applying deterministic edge/node embeddings and attention scores on the input unit and the backward path, respectively. We experiment on the Twitter Health News dataset, where the model has been extended to approach different approximations (tweet/aspect and tweets’ source levels, frequency, polarity/subjectivity) while remaining transparent and traceable. Moreover, a comparison with four recent models on the same data corpus for tweet analysis showed rapid convergence, with an overall accuracy of ≈83% and 94% of correctly identified true positive sentiments. Therefore, weights can be ideally assigned to specific active features by following the proposed method. As opposed to the compared works, the inferred features are conditioned through the users’ preferences (i.e., frequency degree) and via the activation’s derivatives (i.e., a feature is rejected if not scored). Future work will address the inductive aspect of graph embeddings to include dynamic graph structures, and will extend the model’s resilience by considering other datasets such as SemEval task7, covid-19 tweets, etc.


Introduction

Owing to the pervasive coverage and standardization of social media and the Internet of Things in our daily life [1, 2], people feel more confident in treating this digitally connected world as a new communication tool. Research in Machine Learning (ML) has widely addressed different ways to assess people’s thoughts and retrieve meaningful correlations to best quantify them; this is known as Sentiment Analysis (SA). SA has revolutionized several domains by considering users’ understanding and feedback about specific topics to improve their trustworthiness, which in turn benefits businesses [3]. This includes:

  • Business: assessing customers’ voices [4], market research and analytics [5] (e.g., e-business), reputation management [6], etc.
  • Technology: Recommendation systems [7], robots’ adaptation [8], assessing astronauts’ mental health [9], etc.
  • Social actions: Real world events’ monitoring, smart transport/cities [10], social media monitoring (i.e., racism detection [11, 12]), etc.
  • Politics: peaceful government solutions [13], clarifying politicians’ positions, opinions inversion prediction [14], etc.
  • Healthcare: approaching people from different backgrounds/races by extracting common feedback and correlations [15], retrieving insights in order to improve treatments (e.g., breast cancer treatment experience [16]; brain data [17] extracted to infer correlations among naïve speakers, etc.).

Most of these works perceived SA as a classification task (e.g., Support Vector Machine (SVM) [18], Naïve Bayes (NB) [19], bias impact on ML [20], etc.). In this sense, recent works have shown promising outcomes by boosting the performance of these algorithms. In [21], a feature selection mechanism has been proposed and outperforms some classical selection techniques (e.g., Term-frequency, Chi-square, etc.) by providing more context to the feature’s size reduction rather than frequency (i.e., data spread, output correlation, etc.).

Despite some promising classifiers (e.g., NB with 94.02% accuracy [22], SVM and NB with 90% and 95% respectively [23], etc.), in domains like healthcare it is known that data (e.g., functional rehabilitation) are highly correlated [24] and not equally distributed [25]. These characteristics require better analytic frameworks that merge computational power with broad domain knowledge in order to adapt SA to the medical field. In this sense, graph generation techniques are known for their expressiveness and deep data processing [26], which gave way to a recent analysis technology known as graph embedding [27]. The latter technique has been the subject of many ML improvements (e.g., reducing input size and feature selection for an accurate text classification [22, 23], etc.).

The latest efforts in Deep Learning (DL) have shown better function approximations than traditional ML ones [28] by using additional components (i.e., thresholds, weights, activation functions, etc.). However, SA for healthcare implies a deep investigation at several levels. This was justified in [29], where an accompanying text investigation was used along with the Convolutional Neural Network (CNN) algorithm, which means that DL still lacks an extensible feature learning mechanism to best answer the SA process as advocated. In this work, we investigate a new deep neural network method for SA which better approximates the different aspects of SA (i.e., polarity, subjectivity, frequency of terms/tweets within text, etc.). This contribution is two-fold: 1) improving the feedforward path by proposing an embedding strategy for the input unit, which reduces the data training complexity within a low-dimensional space; 2) increasing the backward path’s precision by scoring the features according to their importance (i.e., high frequency, better activation function approximation, etc.), which guarantees a rapid learning surge with good performance (i.e., high accuracy, F-score, etc.). Furthermore, the model has been shown to be transparent and efficient.

The remainder of this paper is organized as follows: Section 2 lists the research questions and a set of respective hypotheses which emphasize the developed aspects of this research. Our aims and objectives are detailed in section 3. Section 4 presents the literature review and the theoretical aspect of this research. Our proposed methods are presented in section 5, followed by an experimental study in section 6. We evaluate our model in section 7, and then critically discuss the whole work in section 8. Section 9 concludes the paper and gives a few perspectives.


The mechanism of the actual Deep Neural Network (DNN) was formally proposed by [30] as a supervised Multi-Layer Perceptron (MLP). To the best of our knowledge, the same authors were the first to introduce DNN transparency, by training each layer independently and learning their correlated representations. This was a feed-forward model of multiple layers (called connected components) of non-linear activation functions. However, the theory of the input’s influence on the output performance within neural networks was discussed a few years earlier by [31], known as the problem of Credit Assignment Paths (CAPs). The latter consists of deciding which DNN components influence the model performance. While this problem could be addressed in different ways, similar works agreed on the final performance as the main criterion to justify the model’s efficiency. In [32], the authors investigated the stability of DNN (i.e., multidirectional LSTM) components modelled as a grid as a way to stop the DL model vanishing problem. Although the authors of [32] achieved state-of-the-art performance, the complexity of the input space and of the state activation layer remains an issue when deployed with limited resources.

Nowadays, with the emergence of Neuroscience and artificial neural networks [33], CAPs are not only limited to a certain layer. Moreover, back-propagation strategy [34] remains inefficient in certain vanishing or overfitting problems, which are more likely to occur due to the equal consideration of the input samples (see [21]).

As SA became popular for many DL applications, the lack of transparency in decision making within specialized domains like medicine [35] is quite misleading, and some practices may conflict with the General Data Protection Regulation (GDPR). To the best of our knowledge, CAPs have not yet been investigated in this research area, even though they were the origin of DL transparency as stated before. Therefore, in this research, we aim to investigate the CAPs theory for a transparent DNN structure that best answers the SA. In contrast to the DL models from the literature, we want to keep the complexity (i.e., spatial/temporal, see “Complexity analysis”) as low as possible. This will be done by acting on the building cycles of a DNN (i.e., feedforward and backward paths): restricting the input features to a lower space representation, and then scoring the derivative instances with a selection mechanism, respectively.

Research questions and hypotheses

To frame the proposed research investigation and the objective method in the right context, Table 1 lists the research questions. A set of hypotheses follows each research question.

Table 1. The proposed research and the following hypotheses.

Aims and objectives

Only a few attempts have tried to associate graph technologies with the deep sentiment analysis process [37, 38]. The aim of the proposed method is to study the influence of the input nodes and hidden layers on the final DNN outputs; in this way, selecting the right sample features helps to reduce the feature vector space while keeping the model rationality. This was inspired by the attention mechanism [39] combined with a deep neural architecture. The study focuses on people’s tweets; the goal is to enrich the DNN structure with graph embedding learning [27], which will be refined through a selective strategy. Fig 1 associates each proposed research question with the envisaged aims and objectives.

As shown in Fig 1, each research question is to be answered following the associated objectives, for the following purposes:

  1. Answering that question will help to emphasize the increasing trend toward explainable DL and the different approaches (see “Transparency in DL”).
  2. Expanding this question helps to identify a convenient way to abstract a given DL problem while remaining rational with respect to the internal structure (see “Abstraction strategy”).
  3. By exploring this question, most recent GNNs have been reviewed and the main obstacle for making them understandable was highlighted (see “Graph based neural networks”).
  4. This question will help to reveal a partitioning method that identifies the DNN unit concerned by the proposed method (see “Methods”) and that has an impact on the whole performance.
  5. This question will motivate the most recent attentional mechanism within SA and the way to merge that with graph embeddings methods (see “DL applications on SA”).

Literature review

In this section, we review the most recent applications of DL to SA and their performance. Then, we address explainability within DL by emphasizing recent graph-based learning models.

Research strategy

The following strategy denotes the main resources and the data extraction scheme which allows a good reflection of the multidimensionality topic of DNNs with respect to the SA field. This is followed by an evolution chronology and a careful combination of the topics’ components (CAPs, graphs, SA, DL) which together motivate the proposed method.

Literature resources.

The IEEE Xplore, ScienceDirect and Springer research databases were queried to retrieve journal papers that refer to explainable DL. Journal papers referring to SA were reviewed from the PubMed database, and this selection was refined to include works based on DL in particular. The context and keywords related to each database as well as the selection results are illustrated in Figs 2 and 3 respectively, whereas Fig 4 summarizes the selection strategy.

Fig 3. Released papers for each database corresponding to each related subject.

Subject evolution.

CAPs and explainable DL. CAPs is a historical problem [40] that explores causal paths from the adjustment of input weights to an optimal output. The majority of works on graph-based explainable DL have addressed the CAPs problem from specific angles, usually referred to as “model specific” [41]; however, only a few attempts have tried to position a DNN as a compositional unit [42] and to find the best way to assign input values, which refers to the historic CAPs. As shown in Fig 5, CAPs have gained more and more attention in recent years, as have published papers referring to both explainable DL (XDL) and CAPs. Most of them are bio-inspired and treat credits as electric signals coming from external sensors, known as the “cause-effect” strategy.

Fig 5. Published papers referring to CAPs and explainable DL with reference to CAPs.

Graphs and CAPs. As stated before, research on CAPs began as a way to assign credits to better minimize the error function [42]. Fig 6 illustrates a new categorization of CAPs approaches based on neuron paths’ detection.

The main question that prevented CAPs from being widely explored as an efficient performance parameter was “whether the brain backpropagates or not”. In this sense, graphs have been the subject of research in order to represent the relevance between data patterns [43]. RNNs were first proposed to deal with backpropagation, followed by LSTMs [44, 45] and Sliced RNNs (SRNNs) [46] for constant vanishing prevention and long-term dependencies respectively.

As shown in Fig 7, new models became popular. They are all characterized by their graph-based nature: they not only try to solve a learning problem, but also to learn how the resolution is inferred [47]. Stochastic learning Graphs (SGs) [48], for instance, introduce a new gradient setting to best reduce the loss.

Fig 7. DNNs models distribution over years as a graph based solution to CAPs.

Moreover, Generative Adversarial Networks (GANs) have proven their efficiency in transferable learning by revealing generic analysis patterns [49]. However, large “discrete” graphs (e.g., multi-hidden-layer DNNs) remain an issue due to their discrete independent weights. Furthermore, attention layers have extended the DNN structure [39] (AGs) with an importance degree for nodes or links, which relaxes discrete learning into inductive learning with less computation (i.e., without matrix factorization).

Reinforcement Learning (RL) was the most targeted model while dealing with CAPs, because the way neurons’ weights are updated (by assigning a final weight to a certain neuron) is very similar to the concept of failure/reward within RL followed by seeking an explanation for the result.

Sentiment analysis

SA has become a basic building block of many modern platforms; its evolution has seen various changes and names [50], along with the technology and analytics used for the analysis. Fig 8 presents a timeline of SA following the evolution of neural networks. DL has revolutionized the way SA is conducted, from a single perceptron that supports only a limited number of weights and biases, to a relatively better approximation of functions with the Multi-Layer Perceptron (MLP) and the introduction of the back-propagation algorithm. By the mid-90s, SA became very popular with the introduction of kernel functions and human-machine interfaces known as “Brain Computer Interface”.

Fig 8. Brief chronology of SA following the development of DL.

While some argue that emotion detection is the future trend of SA [51], SA still dominates the fields of medicine and psychology, where DL plays a key role in transforming people’s sentiments into computational aspects.

Sentiment analysis through CAPs.

As the modern SA process may imply dealing with long text frames and guaranteeing inner- or outer-document dependency, it initially involves assigning certain documents to a pre-training stage; therefore, it can be subject to CAPs in order to figure out the right parameters. To our knowledge, the latter problem has not yet been addressed from a CAPs viewpoint. However, as shown in Fig 9, graph embedding and attention mechanisms have attracted similar interest in both areas, which reflects the effectiveness of graphs in selectively highlighting the active set of neurons that can be optimized (in CAPs) and the ones that may impact the predicted sentiment (in SA).

Fig 9. Similar research addressing “SA” and “CAPs” relative to graph technologies between 2000–2021 (based on the previous analysis (Fig 7), graphs have been getting more attention by year 2000).

DL applications on SA.

SA [52] has proven its ability to retrieve human’s feelings from several confusing texts. However, long term dependency is one of the DNNs’ application limits on SA, which consists of preserving a traceable execution of the model [53]. As a possible answer to the first part of “Research questions” (RQ5), recent models from the literature (Table 2) tried to address that issue by hybridizing some models, like LSTM with GCN [38] for instance; however, a mechanism that detects important patterns is much more needed with source variant datasets, not only for improving accuracy, but for the learning visibility.

Transparency in DL.

There has been a lot of research on clarifying DNNs and on whether understanding the internal connections of neurons could improve model performance [69]. Imaging is one of the emerging fields in DL, and the majority of works tried to explain imaging systems from specific problems [70, 71]. However, language processing, together with the availability of large text datasets, has become a centre of interest for many researchers. One remarkable work was done by [72] for the explanation of a huge text corpus; although the imaging system is more clarified and flexible, the way the graph was generated does not benefit from graph-based technologies that optimize the input starting from naive generation.

Overall, explainability in DL can be categorized into:

  1. Example-based approaches. Research in this area is always conducted through a training example, by specifying some initial observations which are then verified through feature extraction. This discipline is widely adopted despite the difficulty of verifying the trustworthiness of each example. It covers:
    1. Gradient methods (e.g., Guided back-propagation, Layer-wise relevance propagation [72]), which aim at a better gradient optimization.
    2. Saliency-feature maps [73] for measuring pattern importance within images and videos.
  2. Model-based approaches, which concentrate on the raw data and are usually referred to as input optimizers. Some recent works include the pre-processing stage of DARPA [74], where the explainable interface is built on the users’ psychological aspect. [75] have explored the fusional aspect of DNNs, which aims to “mimic” a function aggregator using a fuzzy network, etc.

Graph based neural networks.

Graphs are playing a crucial role in processing data and preserving their semantics [76]. The idea of combining graph technologies and DL is not recent [77]. As a proof of that, many graph manipulations have been introduced: graph-pooling [78], graph-attention networks [39], etc.

However, few attempts have coupled labelled graph generation with a deep learning model apart from the activation function, which makes them extremely hard to explain or to interpret. Fig 10 compares few recent works on graph explainable DL.

Fig 10. Overall comparison of predictive accuracy.

EVCT [72]: Explainable and Visualizing CNN for text information. XGNN [73]: Explainable Graph Neural Network. STC-PG [75]: Spatial Temporal and Causal -Parse Graph. KGCN [76]: Graph-based Convolutional Network for chemical structure. HAGERec [77]: Hierarchical Attention Graph Convolutional Network Incorporating Knowledge Graph for Explainable Recommendation.

The main obstacle to abstracting every single unit of a deep neural network (see “Abstraction strategy”) as a graph structure is non-compliance with the back-propagation process. The work of [75] illustrates this: the authors had to create a function aggregator that simulates the true Choquet-integral mechanism, because graphs can at best be encoded as an adjacency matrix, which does not fit the back-propagator as a function optimizer. As an answer to RQ3 (see “Research questions”), we investigate recent efforts (Fig 10), here and in the sub-section below, in order to identify certain limits of GNNs and motivate a model-based approach on the input unit of the DNN.

Analysis and discussion on graph-based SA.

The conducted evaluation illustrated in Fig 11 depicts most DL structures and their variations in terms of accuracy at each analysis level (see 11). When considering documents as a whole, LSTM-based approaches were crucial and showed good performance in capturing inter/intra-document correlations. However, as we move from the sentence level to a single-aspect level, there is much interest in aspect embedding with attention networks, which were able to gather neighbourhood context for better sentiment classification. This can be observed in a recent multi-modal trends’ analysis [67], where RNN and LSTM fail to capture the emotions’ boundary for the whole video while the attention-based CNN showed good performance (see Table 2).

The following notes express a few limits of recent works in this area:

  • GNNs (e.g., Graph attention networks, Attention graphs, Stochastic graphs) (Fig 7) are widely considered in the area of connected data, but large labelled graphs still represent an issue due to their exponential growth; therefore, moving from high dimensionality to a low space representation is conditioned by being discriminative with respect to the raw data parameters.
  • Transferable learning, which consists of generalizing the DL model from a specific observation to other domains, is still an issue for many DL models, because they are built on specific dataset(s). However, as justified by [79, 80], a further approach could be performed by setting up an input mechanism that maps the complexity of raw data to smaller frames while remaining expressive.
  • High-dimensional feature analysis remains an issue for most dependency-based models (LSTM [80], GRU [59]). Some solutions have been deployed, like skip data connections [81], to reduce the input size; they may prevent some vanishing cases, but they add more complexity to the gradient in the form of additional hidden layers. This is why the majority of research is now turning to the agnostic aspect of the explanation, in order to impose a standard limit on the input.

The previous arguments fall into the example-based approach (see 17), where a model selection starts from an observed fact, like neighbourhood aggregation, short-term dependency, etc. However, these methods neglect the impact of DL input units on the performance, which explains the “accuracy” paradoxes (Fig 11) even when a sentence or an aspect may reflect a similar sentiment. Therefore, the challenge is to provide an explainable solution for the DNN input unit (i.e., a model-based approach, see “Transparency in DL”) as an answer to RQ1 (see “Research questions”), which satisfies the CAPs (Fig 9) and follows the current research trend (Fig 7).


Methods

As the healthcare domain is known to be critical and full of complicated scenarios that do not forgive mistakes, one accurate way to perform a deep learning technique is by preserving the model rationality [82]. Although model-oriented [83] and example-based approaches [84] have shown an explainable independency level and an input-dependent optimization respectively, they both confine the problem of clarifying DNNs to a trade-off of high interpretability but low accuracy, or vice versa. The approach proposed in this paper consists of designing a novel DNN based on hybrid graph embeddings and attention scoring.

DNNs are known to provide highly accurate outcomes; this is known as the model performance. Formally, it is described as:

  • N is the number of input and hidden layers
  • d is the desired output and z is the actual output

Mathematically, the output generation (z) through the feed-forward and back-propagation cycles is expressed as a series of partial derivatives [33]. For instance, consider the following in-depth view of a deep neural architecture (Fig 12), which is composed of two hidden layers, two inputs (Xa, Xb) and two outputs (Za, Zb).

Abstraction strategy

In order to answer RQ2 (see “Research questions”) and following the structure depicted in Fig 12, we explore the impact of the performance “P” on the internal DNN structure. By considering both weights “w1” and “w3”, this can be expressed by the chain rule (1) and (2). The purpose is to identify a structural unit of the DNN model that can be optimized in compliance with the DNN feedforward and backward paths (see “Research questions”, RQ4).

  • It is noticeable that the selected partial derivative units are equal with respect to both “w1” and “w3”, and the same holds for the units with respect to “w2” and “w4”. This refers to the repetitive unit (Fig 12), which means it has no direct impact on the global performance, as opposed to the decisional unit, where:
    • the last multiplier Y1⊗w5 gives q1 as an input to the activation function and generates Za through either Path1 or Path3.
    • However, Y1 is also involved in generating Zb, this time through the multiplier Y1⊗w7, which gives q2 to the second activation function and forms Path2 or Path4.

So, the further we move toward the input, the more computational units are reused.


  1. Both inputs “Xa” and “Xb” contribute to an intermediate component “Y1”, which has an impact on the final model performance.
  2. A way is needed to establish an importance degree between model inputs (e.g., “Xa” and “Xb”) in order to identify the one(s) with the highest impact on the final output.
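
The reuse argument above can be checked numerically with a minimal sketch. The reduced topology used here (Y1 = σ(w1·Xa + w3·Xb), Za = σ(w5·Y1), Zb = σ(w7·Y1)), the sigmoid activation, the squared-error performance and all function names are illustrative assumptions, not the exact architecture of Fig 12. The sketch only shows that the gradients with respect to w1 and w3 share every factor except the final input value.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def forward(w, xa, xb):
    """Assumed toy network: one shared hidden unit Y1 feeding two outputs."""
    y1 = sigmoid(w["w1"] * xa + w["w3"] * xb)
    za = sigmoid(w["w5"] * y1)  # q1 = Y1 * w5
    zb = sigmoid(w["w7"] * y1)  # q2 = Y1 * w7
    return y1, za, zb


def loss(w, xa, xb, da, db):
    """Squared-error performance P between desired (d) and actual (z) outputs."""
    _, za, zb = forward(w, xa, xb)
    return 0.5 * ((da - za) ** 2 + (db - zb) ** 2)


def input_gradients(w, xa, xb, da, db):
    """Chain rule for dP/dw1 and dP/dw3, exposing the reused factor."""
    y1, za, zb = forward(w, xa, xb)
    # Decisional unit: one term per output path leaving Y1.
    d_y1 = ((za - da) * za * (1.0 - za) * w["w5"]
            + (zb - db) * zb * (1.0 - zb) * w["w7"])
    # Repetitive unit: identical for w1 and w3.
    shared = d_y1 * y1 * (1.0 - y1)
    return {"shared": shared, "w1": shared * xa, "w3": shared * xb}
```

Under these assumptions, the ratio of the two gradients is simply Xb / Xa, which is one way to read the need for an importance degree between inputs.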

Input space embedding

Embeddings on graphs are known to be very useful in dealing with huge graph data and random distributions [85]. Let G(N, E) be a graph of N nodes and E edges, where E ∈ [1…m] and N ∈ [1…n].

The mapping function is based on a threshold that analyses the neighbourhood connections of each node; suppose a maximum of 500 connections is allowed per node:

In the case of node embeddings, for a node f1 with c1 connections:

Map = {N}, with f1 ∈ N, if c1 ≤ 500;

or Map = {N \ f1} if c1 > 500.
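
The thresholded mapping can be sketched as follows. This is a minimal illustration that assumes an undirected graph stored as an adjacency dictionary; the helper names `build_adjacency` and `threshold_map` are hypothetical and are not taken from the paper's implementation.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, Set[Hashable]]:
    """Build an undirected adjacency map from an edge list."""
    adjacency: Dict[Hashable, Set[Hashable]] = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def threshold_map(adjacency: Dict[Hashable, Set[Hashable]],
                  max_connections: int = 500) -> Set[Hashable]:
    """Keep a node in the map only if its connection count is within the threshold."""
    return {node for node, neighbours in adjacency.items()
            if len(neighbours) <= max_connections}
```

For example, with `max_connections=2`, a hub node with three neighbours would be dropped from the map while its neighbours are kept.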

The proposed model, depicted in Fig 13, consists of a graph-based strategy which aims to reduce the repetitive input unit into a low-level space representation, and then into a small vector unit, which may alleviate the computational complexity of the whole DNN model.

Features’ selection via attention scoring

Instead of moving from the embedded vector space (see [23]) through the activation functions, we score the embedded features (v1…vn) following each hidden layer (L1…Lk) with a set of weights aw, w = [1…n].


The score vector represents a trace of the reaching features, which will mainly be used by the back-propagation loss function optimizer (see the algorithm below). Therefore, by considering the activation function (“SoftMax” in (4), for instance), the attention weight aw(i) for a hidden layer (t) is calculated as follows: (4)

Starting from the embedded distribution of features, the “Gaussian” distance metric [86] is used to score similar (close) features and therefore to generate a “decorated” neural path through the “SoftMax” function, for instance, repeatedly until the best distribution is achieved. A level of genericity is intended in terms of the choice of activation function as well as of the embedded feature vector. To summarize, the corresponding learning algorithm is:

Algorithm: Implementing the proposed DNN model (embedding and scoring)

1. Input: .txt files //raw dataset
2. Output: sentiment-polarity
3. Procedure SA
4. Graph_SA = Networkx_Upload (path to the csv_file)
5. Samples Initializing
6. vect = Embedding (Graph_SA) /*this call may be node/edge embedding*/
7.    FOR each feature within vect do
8.        Input[x] = feature
9.        FOR all x in DNN do
10.            Output[x] = module.forwardPropagation(Input[x])
11.            IF Output[x] >= threshold /*threshold could be maximum node connectivity (e.g., most frequent aspects)*/
12.                Scored[x] = Output[x] //the selected feature
13.            End
14.            Input[x+1] = Output[x]
15.        End
    /*Activation function condition (e.g., positive sentiment polarity and attention weights calculation (2))*/
16.    Sentiment-polarity = condition(Scored)
17.    IF still training then
18.        FOR each [k-x] Scored feature in DNN do //k is the total features’ number
19.            Scored = module.BackwardPropagation /*Backpropagation will stop if feature is not scored*/
20.            Input[x+1] = Scored[x]
21.        End
22.    End


The algorithm above can be explained in three main parts:

  • The graph generation and the embedded vector extraction (see “Input Space Embedding”), which cover lines 1 to 10 of the algorithm. The forward activation function is applied to each embedded feature.
  • The conditional step, which varies according to the specific domain (e.g., most frequent feature in our case); this corresponds to line 11.
  • The features’ scoring, which is a conditional step as well. However, it differs from the previous one as each feature is conditioned on the activation functions’ requirements (i.e., approximation, limit values, polarity, etc.).
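
The three parts above can be illustrated with a minimal sketch. It makes several simplifying assumptions that are not confirmed by the paper: each embedded feature is a scalar, every layer is a single sigmoid unit with one weight, the threshold is applied to the final activation only, and the loss is the squared error. All names (`forward`, `score_features`, `backward_step`) are hypothetical.

```python
import math
from typing import Dict, List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def forward(feature: float, weights: List[float]) -> List[float]:
    """Return the activations [s_0, ..., s_K] of a scalar chain of sigmoid layers."""
    activations = [feature]
    for w in weights:
        activations.append(sigmoid(w * activations[-1]))
    return activations


def score_features(features: List[float], weights: List[float],
                   threshold: float) -> Dict[int, float]:
    """Forward pass plus conditional step: keep features whose output reaches the threshold."""
    scored: Dict[int, float] = {}
    for idx, feature in enumerate(features):
        output = forward(feature, weights)[-1]
        if output >= threshold:
            scored[idx] = output
    return scored


def backward_step(features: List[float], targets: List[float], weights: List[float],
                  scored: Dict[int, float], lr: float = 0.5) -> List[float]:
    """One gradient step in which only scored features contribute to the update."""
    grads = [0.0] * len(weights)
    for idx in scored:  # features that were not scored are skipped entirely
        acts = forward(features[idx], weights)
        delta = acts[-1] - targets[idx]  # derivative of the squared error
        for k in range(len(weights) - 1, -1, -1):
            dz = delta * acts[k + 1] * (1.0 - acts[k + 1])  # through the sigmoid
            grads[k] += dz * acts[k]
            delta = dz * weights[k]
    return [w - lr * g for w, g in zip(weights, grads)]
```

In this simplified setting, restricting the backward walk to scored features means that rejected features cost no gradient computation, which is the intent of the stop condition in line 19 of the algorithm.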

Solution for high dimensional space

Our proposed model focuses on the input unit of the DNN, where it has been shown through the chain rule (1) and (2) that any input stream (Fig 12) follows a specific decisional path with respect to the features’ weights. Our case study (see “Experiments”) imposes a 2-dimensional representation, which corresponds to the “station-polarity” prediction. This has been achieved through graph generation with neighbourhood embeddings. Therefore, the most influential nodes within a given station are the ones with minimal Gaussian distance (i.e., polarity of the most frequent term within the text).

However, certain DL tasks, such as time series [87] and adversarial examples [88], require an extension of the classical closeness methods (i.e., Gaussian distance), as the data may be distributed within a k-dimensional space. Following the graph embeddings strategy described previously, a solution for the multidimensional space must satisfy a number of criteria:

  • The resulting embedded structure must have a smaller feature sample than the original input.
  • The embedding function must comply with the activation function in order to cope with the path decoration.
  • A similar process (i.e., embeddings and scoring) needs to be ensured within the k-dimensional space in order to preserve the output semantic.

The projection of the above criteria results in the mapping probability [89] of a feature instance xi in layer i onto its respective pattern xj in layer j. A higher probability Pi|j means that instance i is closer to j (i.e., station-polarity in our case): (5)

Therefore, by considering all the k-dimensional space, the scoring function (3) as well as the activation function (4), the output attention weight aw(i) for a layer (t) will be given by: (6)

There is a clear match between the resulting scoring function (6) and the activation function (i.e., SoftMax for instance), and that confirms the second part of “Research questions” (RQ5) on the compliance of the feedforward path with the backward one, which enables an efficient performance (see “Improving DNN performance via a deterministic backward walk”).


In this section, a number of empirical experiments are run on the health news tweets datasets (HN-datasets, see 27). Data has been collected and unified from 16 different health news sources (stations). The proposed SA model goes beyond polarity detection of people’s feedback to identify the most influential aspects and sentences which contribute to polarity and subjectivity variations.

After the data has been cleaned and pre-processed, we build a predictive analysis around the most influential tokens among tweets. We then show the role of edge embedding in terms of transparency and the benefit of visualizing the polarity distribution on a reduced plane.


The health news tweets datasets (HN-datasets) [90] consist of 16 different sources of tweets from people who have experienced or been exposed to a healthcare situation. Data sources are provided as separate text files (i.e., goodhealth.txt, foxnewshealth.txt, cnnhealth.txt, etc.), which together contain more than 58000 instances and 25000 attributes. Table 3 lists some features of the “Kaiser Health news”, “Fox news” and “Good Health” stations.

Table 3. Characteristics of three health tweets datasets.

These datasets are used to demonstrate the model’s working strategy. They were chosen in order to deal with heterogeneous data (i.e., different encodings, insignificant words, healthcare domain specifics) and to perform a global SA of tweets.

Development environment

This work has been done on a UNIX system (Ubuntu Kylin ver. 20.10, architecture x86_64, processor Intel Core i5). Python 3.8 was the main programming language adopted for implementing the data procedures and the following data analysis tasks (see the next sub-sections of the current section, “Experiments”). Jupyter was the main development environment, together with the following Python libraries for basic functions and visualizations:

  • The “glob” module for Unix-style pathname matching when loading the dataset files.
  • “nltk”, the Natural Language Toolkit, for stop-word removal for instance.
  • The “re” module to process the unstructured tweet files with regular expressions.
  • The “math” library to invoke mathematical functions (e.g., the “tanh” and “exp” functions to implement the DNN activations, the “log” function for loss simulation, etc.).
  • The “wordcloud” library for displaying frequent tokens.
  • “NetworkX” for graph generation, etc.

Data cleaning and pre-processing

The challenging aspect of retrieving tweets from different sources is the heterogeneous nature of the data, which comes in different encoding styles (utf-8, cp1252, etc., see Table 3), whereas the aim is to achieve an overall SA across these specific data sources.

Text split.

As tweets are totally informal, a list of special characters [。?、~@#¥%……&*();:\s+\.\!\/_,$%^*(+\"\’]+|[+―!] has been considered to split lines into raw sequences of tweets containing only natural language terms.
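
A minimal sketch of this splitting step with Python’s “re” module is given below. It assumes a simplified delimiter class (whitespace and common ASCII punctuation only) rather than the full character list above, and the names `DELIMITERS` and `split_tweet` as well as the sample tweet are illustrative, not taken from the original implementation.

```python
import re

# Simplified, assumed delimiter class: whitespace plus common ASCII punctuation.
# The full list quoted in the text also covers CJK punctuation and other symbols.
DELIMITERS = re.compile(r"[\s~@#%&*();:.!?/_,$^+\"'|-]+")


def split_tweet(line: str) -> list:
    """Split one raw tweet line into natural-language tokens, dropping empties."""
    return [token for token in DELIMITERS.split(line) if token]


tokens = split_tweet("Flu season: 5 tips to stay healthy! http://example.com/x #health")
```

Note that URL fragments such as “com” survive this step as ordinary tokens, which is why they have to be handled later as domain-specific non-relevant words.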

Stop word remover.

Tweets in the above dataset come in an unstructured textual format; a proper tweet analysis therefore consists of splitting sentences/aspects and removing all non-significant terms in order to retrieve the most meaningful sentiment. NLTK’s English stop-word list has been used, extended with domain-specific non-relevant words (i.e., new, may, com, etc.).
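
The filtering step can be sketched as follows. To keep the example self-contained, a very small hand-written set stands in for NLTK’s English stop list (which is normally obtained with `nltk.corpus.stopwords.words("english")` after downloading the corpus); the set contents and the function name `remove_stop_words` are assumptions made for illustration.

```python
# Small stand-in for NLTK's English stop list (assumption: the real list is much longer).
BASE_STOP_WORDS = {"the", "a", "to", "of", "and", "in", "for", "is", "on"}

# Domain-specific non-relevant words named in the text.
DOMAIN_STOP_WORDS = {"new", "may", "com"}

STOP_WORDS = frozenset(BASE_STOP_WORDS | DOMAIN_STOP_WORDS)


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop stop words case-insensitively while keeping the original token order."""
    return [t for t in tokens if t.lower() not in stop_words]


cleaned = remove_stop_words(["New", "study", "on", "the", "flu", "vaccine", "com"])
```

Merging the two sets before filtering keeps the domain-specific additions separate from the generic list, so either can be changed without touching the other.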

Statistical sentiment analysis

Instead of measuring independent word combinations [91], the proposed approach aims to achieve a global sentiment polarity of the whole data corpus, which merges the sources’ heterogeneity, the global relevant term frequency and an additional sentiment feature called “subjectivity”. Word-cloud distributions of the most frequent healthcare-related words within “everydayhealth”, “gdnhealthcare” and “usnewshealth” are depicted by Figs 14–16 respectively.

Polarity vs subjectivity.

In the healthcare domain, it is common to separate sentiment polarity from sentiment subjectivity [52, 91, 92]. However, as illustrated by Fig 17, a high correlation was found between highly frequent tokens and their corresponding polarity/subjectivity. The polarity {P} and subjectivity {S} values are interpreted as follows:

P > 0 → Positive sentiment
P = 0 → Neutral sentiment
P < 0 → Negative sentiment

S = 0 → Objective sentiment
S > 0 → Subjective sentiment
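
The interpretation rules above translate directly into two small helper functions. The function names are illustrative, and the `threshold` parameter is an assumption added so that a stricter cut-off than S = 0 can be tried; with its default value the function reproduces the rule exactly as stated.

```python
def polarity_label(p: float) -> str:
    """Map a polarity value in [-1, 1] to a sentiment label."""
    if p > 0:
        return "positive"
    if p < 0:
        return "negative"
    return "neutral"


def subjectivity_label(s: float, threshold: float = 0.0) -> str:
    """Map a subjectivity value in [0, 1] to a label.

    threshold is a hypothetical knob: values above it count as subjective.
    """
    return "subjective" if s > threshold else "objective"


labels = [(polarity_label(p), subjectivity_label(s))
          for p, s in [(0.4, 0.7), (0.0, 0.0), (-0.3, 0.2)]]
```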

Figs 17 and 18 show the overall polarity distribution as well as polar/subjective variations respectively of health news tweets based on relevant terms frequency distribution.

Across the 16 health news sources, only 34.3% of frequent tweets expressed negative healthcare sentiments (P < 0), while 70.4% of them were objective (S < 0.5); this is due to the informal nature of tweets. Furthermore, an interesting observation concerns the most frequent terms (Figs 19 and 20), where there was a parallel, symmetric decrease of sentiments towards negative and objective feedback, which imbalances the overall positivity of tweets as well as their subjectivity.

Fig 20. 3-d plot frequency, polarity and subjectivity distribution.

Predictive analysis

With the proposed model, the aim is to go beyond subjectivity or polarity detection and achieve a transparent predictive analysis of tweets. The goal is to lift the above observations from the tweet level to the data-source level. The technique consists of a graph generation centred around the 16 health news stations so that, given a source of tweets, it is possible to predict the sentiment polarity/subjectivity without going through each tweet; these stations are then connected together within a map (Figs 21 and 22). This application can be seen as a community sentiment polarity prediction. The following definitions are proposed to better approach the “Research questions” (RQ3 and RQ5).

Fig 21. Station-polarity graph generation without edge embedding.

Fig 22. Station-polarity graph generation after edge embedding.

Definition 1. Given a graph G = (V, E), where V = {v1, …, v16} is the set of tweets’ stations, E = {e1, …, eN} is a predictable set of edges and N is the total number of tweets, a positive sentiment polarity prediction (p) for each station is a link prediction/inference problem where a connection ei = vip exists iff:

Lemma. Performing edge embeddings on the source data prevents the worst-case iteration (i.e., negative or positive sentiments) and maps the station polarity from DNN prediction to a link prediction problem.

Example. The following Figs 23 and 24 represent the sentiment polarity of different stations’ tweets before and after applying edge embeddings respectively.

Fig 23. Two dimensions (station-polarity) graph embedding.

Fig 24. Attention scores for stations’ polarity predictions.

In addition to the visibility gained by embedding the graph edges, node embeddings (Fig 23) allow a reduced representation of the observed polar sentiments with a clear polar symmetry within the news stations. In our case, the generated graph consists of a set of nodes which are identified only by their labels, without any other features. As this is not supported by recent embedding algorithms (e.g., GraphSage [85]), an abstract version of the node2vec algorithm has been implemented which, instead of randomly iterating over all connections, aggregates the neighbourhood nodes of a given station following the predefined constraint (see Definition 1).

Definition 2. A scored connection between a station and a sentiment polarity is a neighbourhood aggregation of the scores of its neighbours such as:

(or any other threshold condition) needs to be verified during feed-forward and back-propagation stages of the neural network all over the (n) dependencies.

As shown by Fig 24, scoring the positive polarities allows a transparent connectivity as well as inferring new connections.

DNN construction.

A flexible way to implement the above steps is to code the DNN from scratch. With respect to the structure depicted by Fig 12, the “Tanh” activation function is used on the two hidden layers, which approximate the sentiment polarity in [–1, 1]; the output layer is activated by the “Sigmoid” function, which scales the polar vector resulting from the hidden layers into positive or negative sentiments, where:

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{7}$$

$$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}} \tag{8}$$
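
A from-scratch forward pass with this layout (two Tanh hidden layers followed by a Sigmoid output) can be sketched with the “math” library alone. The layer sizes, the weight values and the names `dense`, `forward` and `params` are assumptions made for illustration; they are not the parameters reported in Table 4.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function with outputs in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each neuron (row)."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(x, params):
    """Two tanh hidden layers (polarity range [-1, 1]) and one sigmoid output."""
    h1 = dense(x, params["W1"], params["b1"], math.tanh)
    h2 = dense(h1, params["W2"], params["b2"], math.tanh)
    return dense(h2, params["W3"], params["b3"], sigmoid)[0]


# Hypothetical toy parameters: 2 inputs -> 2 hidden -> 2 hidden -> 1 output.
params = {
    "W1": [[0.5, -0.2], [0.1, 0.4]], "b1": [0.0, 0.1],
    "W2": [[0.3, 0.7], [-0.6, 0.2]], "b2": [0.05, 0.0],
    "W3": [[1.2, -0.8]],             "b3": [0.0],
}
probability_positive = forward([0.9, 0.3], params)
```

An output above 0.5 would be read as a positive sentiment and below 0.5 as a negative one; that decision threshold is likewise an assumption of this sketch.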

Table 4 details the parameters of the DNN structure depicted by Fig 12, the batch size of each hidden layer, the activation functions, the optimizer, and the estimated learning rate of each layer.

Table 4. Inner structure parameters of the proposed DNN compared to basic techniques.

As presented by Table 4, the model’s estimated learning rate increases from the hidden layers to the output layer (0.027 to ≈ 0.9), which confirms the hypothesis of the chain rule (Fig 12) (i.e., most of the learning happens at the decisional and particularly the output level). The ReLU activation function activates the input layer, as it provides a better approximation for the embedded feature vector, where no classification has been made yet except for the frequency analysis (#1 in Table 4). The Tanh function gives the best approximation for sentiment polarity (more details in section 6, “DNN construction”). Sigmoid activates the output layer to infer positive and negative instances.

As indicated by Fig 25 and by the model training history (Fig 26), the model converges rapidly to a stable accuracy of ≈ 83%, which provides an answer on how to stop the model from vanishing when it keeps propagating after it has reached an optimal performance.

Fig 25. Impact of attention scores and embeddings on the model convergence.

Table 5 matches the meta-parameters involved in this study with their meaning in the study domain.

Table 5. Meaning of the learning metrics’ parameters with regards to the SA study.

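  • Accuracy is the proportion of true results among all the observed population: Acc = (TP + TN) / (TP + TN + FP + FN), where TP, TN, FP and FN denote the true positives, true negatives, false positives and false negatives.
  • F-measure is the harmonic mean of precision and recall: F-measure = 2 × Precision × Recall / (Precision + Recall)
  • Precision is the proportion of true positive instances among all instances predicted as positive (true positives and false positives): Precision = TP / (TP + FP)
  • Recall is the proportion of positive polar samples correctly predicted out of the total positive samples. It reflects the model’s ability to infer positive samples: Recall = TP / (TP + FN)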
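
The four metrics can be computed from confusion-matrix counts as in the short sketch below, which assumes the standard definitions in terms of true/false positives and negatives. The function name and the counts passed to it are made up for illustration and are not results from this study.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, precision, recall and F-measure from confusion-matrix counts.

    Ratios with an empty denominator are reported as 0.0.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


# Illustrative counts only.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```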

The following Table 6 reports the sentiment classification metrics used in this work and the obtained values. We highlight within the same table the impact of the proposed techniques one by one on the model’s performance.

Table 6. Proposed model performance (shown with bold) compared to different techniques on health news tweets dataset.

Due to the features’ opacity, a naive multi-layer DNN shows a low accuracy (67%) and a poor inference of true instances positively predicted (e.g., 51% precision). Applying the same technique after excluding the non-relevant features through graph embeddings (ISE in Table 6) improves the model’s accuracy as well as the precision, but the recall remains stable. This is explained by the conditional step (see the second part of the algorithm above, line 11), which only considers the positive sentiments, whereas the recall relates the positive instances to the whole population, including the negative ones. Coupling the previous step with the scoring technique (a detailed explanation is given in “Improving DNN performance via a deterministic backward walk”) yields a significant improvement across all metrics. This is justified by the determinism gained from selecting relevant features during backpropagation: because this selection covers the activation functions’ derivatives, both positive and negative instances are covered, which explains the improvement in recall (from 53% to 89.5%) as well as in the other metrics, and answers the second part of “Research questions” (RQ5).

Complexity analysis

Time complexity.

The following formula: calculates the overall asymptotic complexity (TC) of a DNN. By considering a given threshold (h), a feed-forward propagation is limited to the input space embeddings times the cost of the activation functions. In our case there are two hidden layers activated with (tanh) and (sigmoid) functions respectively. Suppose:

TC(tanh) = O(t) and TC(sigmoid) = O(s), because (tanh) has bigger approximation: O(t) > O(s)

The graph embeddings complexity is O(|V|), where V is the set of graph nodes, therefore:

For back-propagation, the time complexity is reduced to the scoring method which has (h) as a limit, therefore: O(score) = O(Vh+Eh), from that:

TC = O(Vh+Eh)+O(|V|)⋅O(t) which may be reduced to O(|V|)⋅O(t) in the worst case. The latter reflects the node embeddings strategy adopted by the proposed method.

Space complexity.

Instead of storing the matrices [94] of feature vectors and parameter weights in memory during the execution of the DNN model, memory is mainly allocated to the embedded graph entities together with the activation function traces. At each epoch(i) (i = 1…90), the proposed model history (e.g., Fig 26) allows the metrics summarized in Table 7 to be recorded.

Table 7. CPU occupancy and learning metrics for the proposed model.

The cache hierarchy of the CPU makes it possible to record several training batches of the proposed DNN (see Table 7). The execution flow shows a reduced footprint (i.e., 3.0 CPU occupancy) resulting from the graph embeddings followed by the backward scoring (see the section below). The reduced instruction vector may represent an alternative to the non-deterministic sparsity solution [95] for efficient DNN training.

As shown by Fig 27, the CPU spends most of its time per training batch on the first layers of the model (the hidden layers in Fig 27), with an average CPU time of 67.6% in the first hidden layer and 49.09% in the second one; it ends with a lower CPU occupation, averaging 26.7%, on the decision (output) layer. That supports our hypothesis about the repetitive work in the input unit of a DNN. However, the model’s accuracy is shown to be reasonably good from the earliest neurons onwards, owing to the selection strategy, which prevents feature sparsity and overfitting.

Fig 27. Average CPU time and model’s efficiency through each layer.


In this section, the impact of the proposed learning method is examined through different stages: training, learning, complexity and validation.

Due to the heterogeneity of the 16 news stations and the feature sparsity imposed on the generated graph components (i.e., nodes are only identified by their labels), the preliminary tests (Fig 28) show a low model performance, even though the model does not overfit after embedding the input space. The low accuracy remains an issue if not improved, because DNNs are known to perform well with huge data corpora.

Fig 28. Stability of proposed DNN after 90 epochs and 10 batches.

Although the loss has been significantly minimized (Fig 29(B)), the instability observed in the accuracy variations (Fig 29(A)) remains a bottleneck for the model’s adaptability.

Fig 29. (a) Instability of accuracy. (b) Loss minimization.

Improving DNN performance via a deterministic backward walk

As shown by Figs 25 and 30, scoring the learning path recognized while training the DNN model became a mandatory step in our case study in order to improve the overall accuracy. This represents a typical example of a good trade-off between transparency (graph transparency) and efficiency (DNN performance).

Transparency and learning performance.

The restriction imposed on the input nodes allowed a level of transparency regarding the predictive study. This has been replicated on the feed-forward path: as described by Figs 31–33, if we consider positive sentiments (polarity) as “blue” instances and negative ones as “red” instances, the decision boundary shows a better separation of both polarities. However, the best adjustment is shown by Fig 33, after scoring the back-propagation path (stamping positive polarity as a constraint).

Fig 33. Impact of edge embedding and path scoring on the decision threshold.

Consequently, the results of adjusting the learning curve with the embeddings and scoring methods applied sequentially, with respect to training scores (batch gradient descent), are illustrated by Fig 34.

Fig 34. Learning improvements with embedding then scoring techniques.

The Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC) are two relevant metrics for model confidence, especially in the healthcare domain [96]. They allow visualizing the trade-off between the model’s sensitivity and specificity, where:

  • Sensitivity = true-positive rate (rate of correctly identified sentiments)
  • Specificity = 1 – false-positive rate (rate of incorrectly identified sentiments)

As illustrated by Fig 35, the proposed learning model showed a high AUC of 94%, with up to 90% of sentiments correctly identified.
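
To make the link between these rates and the AUC concrete, the sketch below builds the ROC points by sweeping a threshold over predicted scores and integrates them with the trapezoidal rule. The function names, labels and scores are illustrative assumptions (label 1 stands for a positive sentiment); they are not data from this study.

```python
def roc_points(labels, scores):
    """Return (false-positive rate, true-positive rate) pairs, one per threshold.

    labels are 0/1, scores are predicted probabilities of the positive class.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))   # (1 - specificity, sensitivity)
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Illustrative labels and scores only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
area = auc(roc_points(labels, scores))
```

Here the area equals the fraction of (positive, negative) pairs that the scores rank in the right order, which is 8 out of 9 for these toy values.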

Comparing to other methods.

As a part of the evaluation, the proposed model is compared to several computational frameworks related to healthcare domain which aimed to analyse tweets and extract sentiment polarity following specific topics. SA was the most targeted topic [97] among the other related domains. However, this process is still not disclosed, and the feature extraction mechanism for sentiment clustering is still not well defined. As depicted by Table 8, common works which have addressed twitter health news dataset used machine learning techniques for sentiments’ classification. However, as argued in the next section, a deep investigation of SA requires different approximations which go beyond linear ML models.

Table 8. Comparison of the proposed method (shown with bold) performance with other approaches on twitter health dataset.

Our proposed method compares favourably with the other techniques (Table 8), which can be emphasized through the following aspects:

Semantic enrichment: our proposed DNN covers both sentiments within separate tweets as well as the whole text corpus for an overall polarity [–1, 1] and subjectivity [0, 1], this includes most frequent terms.

Complexity: a complexity analysis has been explicitly conducted, the asymptotic results follow the abstraction strategy (Fig 12) by restricting the whole model complexity to the embedded nodes times the complexity of the decisional function (Tanh). That performance is much better than considering all input space for instance [99].

Efficiency/determinism: Although SVM has proven its robustness and performance in many SA tasks (see Table 2), its combination with LSTM represents a bottleneck towards a boosted performance. This could be justified by the pre-training and dependency cost of LSTM at the input data [100]. However, our proposed backpropagation selective strategy increases the model’s determinism (i.e., rapid surge of the learning rate (Fig 34)).

Transparency: Our model is characterised by a transparent prediction generation process. This includes the earlier conceptual stages (i.e., Figs 12 and 13), followed by a visual data distribution and the impact of the proposed techniques on best adjusting the decision boundary for sentiment classification (Figs 31, 32 and 33). As opposed to the classical classifiers [102], the proposed DNN structure allows different approximations of the problem (i.e., polarity, subjectivity, frequency, etc.), which enables a global observation of the SA over all the news stations. The compliance of the backward selection method with the backpropagation algorithm (see “Features’ selection via attention scoring” and “Improving DNN performance via a deterministic backward walk”) does not require any additional training examples or hidden layers, as is the case in [103], which allowed the model complexity to be restricted to the embedded space.


Models on explainable AI

  • Although DARPA’s user interface [74] has been built around users’ expertise and their cognition ability, it disguises the traceable aspect of the prediction making, which may include the active neurons and the prediction path.
  • Instead of explaining learning models after their realization, current trends in machine learning [104] suggest that it is preferable to include explicability from the first conceptual steps of the model. However, as illustrated by Fig 36, the non-linear distribution which results from distinct feature scales (e.g., frequency [0…n], subjectivity [0…1], etc.) requires an alternative to traditional nonlinear ML approximation, where the latter is applied to all the observations at once. A DNN can approximate each feature observation through specific layers, which explains the higher sensitivity and recall performance (Table 8).
  • LSTM can only relate a given aspect to the previous one. But within the SA context, further dependencies may occur and need to be captured. For instance, in [100] (see Table 8) an index had to be done in order to boost the model performance.
  • A good understanding of the input dataset could be achieved by an efficient pre-processing. However, with DNNs, this does not guarantee a good performance, as the latter (see 21) is usually conditioned by a random weight assignment to activate certain functions. By the proposed model, we aim to make this process more deterministic.
  • Data is usually pre-processed before being trained on and validated by a DL model. This helps remove impurities such as stop words and insignificant terms, but eventually promotes the loss of the data’s information centrality. In contrast, by combining graph theory (i.e., embeddings) with a DNN, the closeness centrality of the data is preserved (Fig 23).

Fig 36. Binary sentiment polarity distribution of tweets.


  • Although the proposed model showed great convergence which prevents vanishing problem and saves training time, its performance was relatively weak when deployed on x86 architecture with 5 GB available RAM (Fig 28).
  • The embedding method prevents the DNN from broadening the learning scale, because the layers are activated by processing the embedded vector. Although the model backpropagates through all the instances (see the algorithm above) and the loss is considerably reduced (Fig 29(B)), it mainly optimizes the scored weights (e.g., positive weights).
  • Disclosing features semantics in [99] has proven its resiliency in handling unstructured data. In our model, the embedded feature vector as well as the scored samples could be enriched by an accompanied context vector for understandability purposes.

Conclusion and future work

In this research work, we propose a transparent DNN model for sentiment classification. The development was carried out without built-in DL libraries, except for invoking evaluation metrics, in order to design each unit exactly (input, decision and output) with the defined method (see “Methods”). The latter consists of a new performance improvement strategy which combines a sparse graph embedding (i.e., nodes and edges with no features) and scoring paths for the input and decisional units respectively. The model is trained and tested on the Twitter health news dataset, where a sentiment predictive analysis has been applied to each news source based on the most frequent tweets. We broaden the feature space by normalizing both token aspects and tweets for each of the 16 news sources so that a global sentiment polarity is inferred. Results show state-of-the-art performance when compared to other models (see “Predictive analysis” and “Comparing to other methods”). Moreover, the model proved transparent and efficient in stabilizing the learning curve, with a better binary classification of tweets (see above).

This work can benefit from several improvements in the future. For instance:

  • Exploring the transferable learning aspect of graph embeddings to include other updated topics on twitter (e.g., Covid-19) where more transparency is required. This may be achieved by moving from the transductive to the inductive learning. Furthermore, that may provide an answer to the dynamic aspect of graphs as the input data may evolve over the time.
  • Proving the model resiliency against new unstructured and semi-structured data (SemEval-2014 task7 [105]).
  • In terms of performance, it has been shown that the embedding technique had a big impact on the model accuracy (see “Evaluation”). Thus, considering a context feature vector while training the model could broaden the learning stage and improve the model performance.


  1. 1. Chen L-C, Lee C-M, Chen M-Y. (2019). “Exploration of social media for sentiment analysis using deep learning”. Soft Comput 24, 8187–8197 (2020). Accessed on 14/01/2020 10:17.
  2. 2. Masud M, Muhammad G, Alhumyani H, Alshamrani SS, Cheikhrouhou O, Ibrahim S, et al. “Deep learning-based intelligent face recognition in IoT-cloud environment”. Computer Communication. (2020). Volume 152, 15, pp. 215–222, Accessed on 17/05/2019 12:04.
  3. 3. Vyas V, Uma V. “Approaches to sentiment analysis on product reviews”. Sentiment analysis and knowledge discovery in contemporary business. IGI Global, Pennsylvania, pp 15–30.
  4. 4. Rodrigues Chagas BN, Nogueira Viana JA, Reinhold O, Lobato F, Jacob AFL, Alt R. “Current applications of machine learning techniques in CRM: a literature review and practical implications”. IEEE/WIC/ACM Int Conf Web Intell (WI) 2018:452–458. Accessed on 05/07/2021 22:45.
  5. 5. Rambocas M, Pacheco BG. “Online sentiment analysis in marketing research: a review”. 2020. J Res Interact Mark 12(2):146–63.
  6. 6. Rios N, de Mendonca Neto MG, Spinola RO. “A tertiary study on technical debt: types, management strategies, research trends, and base information for practitioners”. Inf Softw Techno 102:117–145.
  7. 7. Amara S, Subramanian RR. “Collaborating personalized recommender system and content-based recommender system using TextCorpus”. 2020 6th International Conference on Advanced Computing and Communication System (ICACCS), Coimbatore, India, 2020, pp 105–109.
  8. 8. Rozanska A, Podpora M. “Multimodal sentiment analysis applied to interaction between patients and humanoid robot Pepper”. IFAC-PapersOnline, 2019. Accessed on 22/07/2021 21:15.
  9. 9. PRXJJU. “Artificial Intelligence in Space Exploration”. Analytics Vidhya. 2021. pmid:34124465
  10. 10. Vora S, Mehta RG. “Investigating People’s Sentiment from Twitter Data for Smart Cities: A Survey”. International Journal of Computational Intelligence & IoT, vol 2, No 2. 2019.
  11. 11. Asif M, Ishtiaq A, Ahmad H, Aljuaid H, Shah J. “Sentiment analysis of extremism in social media from textual information”. Telematics Informatics 48. 2020. 1013445.
  12. 12. Hassan Saif, Miriam Fernandez & Harith Alani. “Evaluation Dataset for Twitter Sentiment Analysis”. 2013. A survey and a new dataset, the STS-Gold. CEUR Workshop Proceedings. 1096.
  13. 13. Cunliffe E, Curini L. “ISIS and heritage destruction: A sentiment analysis”. Antiquity, 92(364), 1094–1111. 2018.
  14. 14. Matalon Y, Magdaci O, Almozlino A, et al. “Using sentiment analysis to predict opinion inversion in tweets of political communication”. 2021. Sci Rep 11, 7250. pmid:33790339
  15. 15. Elbattah M, Arnaud E, Gignon M, Dequen G. “The Role of Text Analytics in Healthcare: A Review of Recent Developments and Applications”. In Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems Technologies (BIOSTEC 2021).
  16. 16. Clark EM, James T, Jones CA, Alapati A, Ukandu P, Danforth CM, et al. “A Sentiment Analysis of Breast Cancer Treatment Experiences and Healthcare Perceptions Across Twitter”. 2018. arXiv:1805.09959v1 [cs.CL]. Accessed on 29/06/2021 13:25.
  17. 17. Gu Y, Celli F, Steinberger J, Anderson AJ, Poesio M, Strapparava C, et al. “Using Brain Data for Sentiment Analysis”. JLCL 2014 Band 29(1)– 79–94.
  18. 18. Ahmad M, Aftab S, Bashir MS, Hameed N. “Sentiment Analysis using SVM: A Systematic Literature Review”. (IJACSA) International Journal of Advanced Computer Science and Applications, vol 9, No 2. 2018.
  19. 19. Kowsari K, Jafari Meimandi K, Heidarysafa M, Mendu S, Barnes L, Brown D. “Text classification algorithms: A survey”. Information (2019), 10, 150.
  20. 20. Mike T. “Gender bias in machine learning for sentiment analysis”. Online Information Review; Bradford, (2018). Vol 42, N° 3. pp-343–354.
  21. 21. Ashokkumar P, Siva Shankar G, Gautam Srivastava, Praveen Kumar Reddy Maddikunta, and Thippa Reddy Gadekallu. 2021. “A Two-stage Text Feature Selection Algorithm for Improving Text Classification”. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 20, 3, Article 49 (April 2021), 19 pages.
  22. 22. Shankar GS, Ashokkumar P, Vinayakumar R, Ghosh U, Mansoor W, Alnumay WS. "An Embedded-Based Weighted Feature Selection Algorithm for Classifying Web Document", Wireless Communications and Mobile Computing, vol. 2020, Article ID 8879054, 10 pages, 2020. Accessed on 25/06/2021 23:25. pmid:33088230
  23. 23. Haque TU, Saber NN, Shah FM. “Sentiment analysis on large scale Amazon on product reviews”. In 2018 IEEE International Conference on Innovative Research and Development (ICIRD), (2018). pp 1–6.
  24. 24. Siemonsma PC, Blom JW, Hofstetter H, van Hespen ATH, Gussekloo J, Drewes YM, et al. (2018). The effectiveness of functional task exercise and physical therapy as prevention of functional decline in community dwelling older people with complex health problems”. BMC Geriatr 18, 164. pmid:30016948
  25. 25. Abualigah L, Alfar H, Shehab M, Abu Hussein AM. “Sentiment Analysis in Healthcare: A Brief Review. In book: Recent Advances in NLP: The Case of Arabic Language. (2020).
  26. 26. Yang K, Zhu J, Guo X. "POI neural-rec model via graph embedding representation". In Tsinghua Science and Technology, (2021ª). Vol 26, no 2, pp 208–218,
  27. 27. Yue X, Wang Z, Huang J, Parthasarathy S, Moosavinasab S, Huang Y, et al. “Graph embedding on biomedical networks: methods, applications and evaluations”. Bioinformatics, Volume 36, Issue 4, 15 February 2020, pp 1241–1251, pmid:31584634
  28. 28. Yang J, Zou X, Zhang W, Han H. “Microblog sentiment analysis via embedding social contexts into an attentive LSTM”. Engineering Applications of Artificial Intelligence. (2021). Vol 97, 104048.
  29. 29. Bijar K, Zare H, Veisi H, Kebriaei E. “Leveraging Deep Graph-Based Text Representation for Sentiment Polarity Applications”. Expert Systems with Applications. (2019). Volume 144,
  30. 30. Ivakhnenko AG, Lapa VG. (1965). Cybernetic Predicting Devices. CCM Information Corporation. New York: CCM Information Corp. pmid:14345299
  31. 31. Minsky M. (1963). “Steps toward artificial intelligence””. Computers and thought, McGraw-Hill, New York, pp 406–450. pmid:14086791
  32. 32. Alazab M, Khan S, Rama Krishnan SS, Pham Q-V, Kumar Reddy MP, Reddy Gadekallu TR. “A Multidirectional LSTM Model for Predicting the Stability of a Smart Grid”. Vol 8, 2020. Accessed on 26/06/2021 07:45.
  33. Lillicrap TP, Santoro A. “Backpropagation through time and the brain”. Current Opinion in Neurobiology. (2019). Vol 55, pp 82–89. pmid:30851654
  34. Guo Y, Chen J, Du Q, van den Hengel A, Shi Q, Tan M. “Multi-way backpropagation for training compact deep neural networks”. Neural Networks. Volume 126, June 2020, pp 250–261. pmid:32272429
  35. Smith RC. “It’s Time to View Severe Medically Unexplained Symptoms as Red-Flag Symptoms of Depression and Anxiety”. JAMA Netw Open. (2020). 3(7):e2011520. pmid:32701154
  36. Huang W, Rao G, Feng Z, Cong Q. “LSTM with sentence representations for document-level sentiment classification”. Neurocomputing, (2018). 308: 49.
  37. Violos J, Tserpes K, Psomakelis E, Psychas K, Varvarigou TA. (2016). “Sentiment analysis using word-graphs”. In WIMS, p 22.
  38. Zhao P, Hou L, Wu O. “Modeling sentiment dependencies with graph convolutional networks for aspect-level sentiment classification”. Knowledge-Based Systems. Volume 193, 105443.
  39. Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y. “Graph Attention Networks”. Machine Learning (stat.ML). 2017. arXiv:1710.10903 [stat.ML].
  40. Schmidhuber J. “Deep Learning in Neural Networks: An Overview”. Technical Report IDSIA-03-14; 2014, arXiv:1404.7828v3 [cs.NE].
  41. Singh A, Sengupta S, Lakshminarayanan V. “Explainable deep learning models in medical image analysis”. 2020; arXiv:2005.13799v1 [cs.CV]. pmid:34460598
  42. Richards BA, Lillicrap TP, Beaudoin P, Bengio Y, Bogacz R, Christensen A, et al. “A deep learning framework for neuroscience”. Nat Neurosci. 2019 Nov;22(11):1761–1770. Epub 2019 Oct 28. pmid:31659335; PMCID: PMC7115933.
  43. Sun J, Binder A. "Generalized PatternAttribution for Neural Networks with Sigmoid Activations". International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 2019; pp 1–9.
  44. Wu Y, Zhang S, Zhang Y, Bengio Y, Salakhutdinov RR. “On multiplicative integration with recurrent neural networks”. In Advances in Neural Information Processing Systems, 2016; pp 2856–2864.
  45. Kumar S, Sharma A, Tsunoda T. “Brain wave classification using long short-term memory network based OPTICAL predictor”. Sci Rep 9, 9153. 2019; pmid:31235800
  46. Li B, Cheng Z, Xu Z, Ye W, Lukasiewicz T, Zhang S. “Long text analysis using sliced recurrent neural networks with breaking point information enrichment”. In: Proceedings of the 2019 IEEE international conference on acoustics, speech and signal processing, ICASSP 2019. Vol 124, pp 51–60.
  47. Liu YH, Smith S, Mihalas S, Shea-Brown E, Sümbül Y. “A solution to temporal credit assignment using cell-type-specific modulatory signals”. bioRxiv. 2020.
  48. Weber T, Heess N, Buesing L, Silver D. “Credit Assignment Techniques in Stochastic Computation Graphs”. 2019; arXiv:1901.01761v1 [cs.LG].
  49. Goyal A, Ke NR, Lamb A, Hjelm RD, Pal C, Pineau J, et al. “ACTUAL: Actor-Critic Under Adversarial Learning”. 2017, arXiv:1711.04755v1 [stat.ML].
  50. Graziotin MD, Kuutila M. “The evolution of sentiment analysis—A review of research topics, venues, and top cited papers”. Computer Science Review, (2018), Vol 27, pp 16–32, ISSN 1574-0137.
  51. Torres AD, Yan H, Aboutalebi AH, Das A, Duan L, Rad P. “Chapter 3—Patient Facial Emotion Recognition and Sentiment Analysis Using Secure Cloud With Hardware Acceleration”. Intelligent Data-Centric Systems. 2018. pp 61–89.
  52. Zunic A, Corcoran P, Spasic I. “Sentiment Analysis in Health and Well-Being: Systematic Review”. JMIR medical informatics, 2020, 8(1), e16023. pmid:32012057
  53. Aravantinos V, Diehl F. “Traceability of Deep Neural Networks”. Machine Learning (cs.LG). (2018). arXiv:1812.06744 [cs.LG].
  54. Yin Y, Song Y, Zhang M. “Document-level multi-aspect sentiment classification as machine comprehension”. In: Proceedings of the 2017 conference on empirical methods in natural language processing, pp 2044–2054.
  55. Huang Y, Jin W, Yu Z, Li B. “Supervised feature selection through Deep Neural Networks with pairwise connected structure”. Knowledge-Based Systems, 2020, Vol 204, 106202.
  56. Kraus M, Feuerriegel S. “Sentiment analysis based on rhetorical structure theory: learning deep neural networks from discourse trees”. Expert Syst Appl, (2019), 118:65–79.
  57. Maas AL, Daly RE, Pham PT, Huang D, Ng AY, Potts C. “Learning word vectors for sentiment analysis”. In: Proceedings of 49th annual meeting of the Association for Computational Linguistics: Human Language and Technology, 2011. pp 142–150.
  58. Arulmurugan R, Sabarmathi KR, Anandakumar H. “Classification of sentence level sentiment analysis using cloud machine learning techniques”. Cluster Comput 22, 1199–1209. 2019.
  59. Song M, Park H, Shin K-s. “Attention-based long short-term memory network using sentiment lexicon embedding for aspect-level sentiment analysis in Korean”. Information Processing and Management, 2019, Vol 56, Issue 3, pp 637–653.
  60. Ren Z, Zeng G, Chen L, Zhang Q, Zhang C, Pan D. "A Lexicon-Enhanced Attention Network for Aspect-Level Sentiment Analysis," in IEEE Access, 2020, vol. 8, pp 93464–93471.
  61. You Q, Cao L, Jin H, Luo J. “Robust visual-textual sentiment analysis: When attention meets tree-structured recursive neural networks,” in Proc. ACM Multimedia, 2016, pp. 1008–1017.
  62. Chen F, Ji R, Su J, Cao D, Gao Y. “Predicting microblog sentiments via weakly supervised multimodal deep learning”. IEEE Trans Multimed. 2018, 20(4): 997–1007.
  63. Deng J, et al. (2009). “ImageNet: A large-scale hierarchical image database,” in Proc. IEEE Conf. Comput. Vis. Pattern Recogn, pp 248–255.
  64. Xue W, Zhou W, Li T, Wang Q. “MTNA: A neural multi-task model for aspect category classification and aspect term extraction on restaurant review”. Proceedings of the Eighth International Joint Conference on Natural Language Processing. 2017, (Volume 2: Short Papers), 2, pp 151–156.
  65. Agarwal A, Yadav A, Vishwakarma DK. “Multimodal sentiment analysis via RNN variants”. In IEEE international conference on big data, cloud computing, data science and engineering (BCD), 2019, pp 19–23.
  66. Zadeh A, Zellers R, Pincus E, Morency L. "MOSI: Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis in Online Opinion Videos", IEEE Intell Syst, 2016.
  67. Pandeya YR, Lee J. “Deep learning-based late fusion of multimodal information for emotion classification of music video”. Multimedia Tools and Applications. 2021, 80 (2), pp 2887–2905.
  68. El-Affendi M, Alrajhi K, Hussain A. “A Novel Deep Learning-Based Multilevel Parallel Attention Neural (MPAN) Model for Multidomain Arabic Sentiment Analysis”, in IEEE Access, vol. 9, pp 7508–7518, 2021.
  69. Wang X, Wu P, Liu G, Huang Q, Hu X, Xu H. “Learning performance prediction via convolutional GRU and explainable neural networks in e-learning environments”. Computing, Archives for Informatics and Numerical Computation, 2019, 101 (6), pp 587–604.
  70. Yang F, Zhang W, Tao L, Ma J. “Transfer Learning Strategies for Deep Learning-based PHM Algorithms”. Appl. Sci. 2020, 10, 2361.
  71. Seo D, Oh K, Oh I. "Regional Multi-Scale Approach for Visually Pleasing Explanations of Deep Neural Networks," in IEEE Access, vol. 8, pp 8572–8582, 2020.
  72. Kim B, Park J, Suh J. “Transparency and accountability in AI decision support: Explaining and visualizing convolutional neural networks for text information”. Decision Support Systems. Vol 134, 11330. 2020.
  73. Yuan H, Tang J, Hu X, Ji S. “XGNN: Towards Model-Level Explanations of Graph Neural Networks”, 2020, arXiv:2006.02587v1 [cs.LG].
  74. She L, Chai JY. “Interactive Learning for Acquisition of Grounded Verb Semantics towards Human-Robot Communication”. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017, vol. 1, 1634–44. Stroudsburg, PA: Association for Computational Linguistics.
  75. Islam M, Anderson DT, Pinar AJ, Havens TC, Scott G, Keller JM. “Enabling Explainable Fusion in Deep Learning With Fuzzy Integral Neural Networks”. In IEEE Transactions on Fuzzy Systems. 2020; Vol 28, no 7, pp 1291–1300.
  76. Kojima R, Ishida S, Ohta M, et al. “kGCN: a graph-based deep learning framework for chemical structures”. J Cheminform, 12, 32, 2020. pmid:33430993
  77. Yang Z, Dong S. “Hierarchical Attention Graph Convolutional Network Incorporating Knowledge Graph for Explainable Recommendation”. Knowledge-Based Systems 2020, Volume 204, 106194.
  78. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. “Grad-CAM: Visual explanations from deep networks via gradient-based localization”. In Proceedings of the IEEE International Conference on Computer Vision, 2017; 618–626.
  79. Zhu Q, Xu Y, Wang H, Zhang C, Han J, Yang C. “Transfer Learning of Graph Neural Networks with Ego-Graph Information Maximization”, 2020. arXiv:2009.05204v1 [cs.LG].
  80. Greff K, Srivastava RK, Koutník J, Steunebrink BR, Schmidhuber J. "LSTM: A search space odyssey", IEEE transactions on neural networks and learning systems, 2017; vol 28, no 10, pp 2222–2232. pmid:27411231
  81. Ahn H, Yim C. “Convolutional Neural Networks Using Skip Connections with Layer Groups for Super-Resolution Image Reconstruction Based on Deep Learning”. Appl. Sci. 10, 1959. 2020.
  82. Zhu J, Meng Q, Chen W, Ma Z. “Interpreting Basis Path Set in Neural Networks”, 2020.
  83. Yuan H, Ji S. “StructPool: Structured Graph Pooling via Conditional Random Fields”. In International Conference on Learning Representations. (2020a). Available from
  84. Zhang W, Yue X, Lin W, Wu W, Liu R, Huang F, et al. “Predicting drug-disease associations by using similarity constrained matrix factorization”. BMC Bioinformatics 19, 233, 2018. pmid:29914348
  85. Hamilton WL, Ying R, Leskovec J. “Inductive Representation Learning on Large Graphs”. 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.
  86. Zhou Z, Li X, Zare RN. “Optimizing Chemical Reactions with Deep Reinforcement Learning”. ACS Cent. Sci. 2017, 3, 1337–1344. pmid:29296675
  87. Hatami N, Gavet Y, Debayle J. “Classification of Time-Series Images Using Convolutional Neural Networks”. 2017. arXiv:1710.00886v2 [cs.CV]. Accessed on 01/07/2021 21:32. pmid:28558002
  88. Dube S. “High Dimensional Spaces, Deep Learning and Adversarial Examples”. 2018. arXiv:1801.00634v1 [cs.CV]. Accessed on 14/07/2021 16:21.
  89. van der Maaten L, Hinton G. “Visualizing Data using t-SNE”. Journal of Machine Learning Research 9 (2008), 2579–2605. Accessed on 29/06/2021 23:56.
  90. Karami A, Gangopadhyay A, Zhou B, Kharrazi H. “Fuzzy approach topic discovery in health and medical corpora”. International Journal of Fuzzy Systems, 2017, pp 1–12.
  92. Waheeb SA, Ahmed Khan N, Chen B, Shang X. “Machine Learning Based Sentiment Text Classification for Evaluating Treatment Quality of Discharge Summary”. Information. 2020; 11(5):281.
  93. Arora R, Basu A, Mianjy P, Mukherjee A. “Understanding Deep Neural Networks with Rectified Linear Units”. ICLR 2018. arXiv:1611.01491v6 [cs.LG].
  94. Dong X, Zhou L. “Deep network as memory space: complexity, generalization, disentangled representation and interpretability”. 2019, arXiv:1907.06572v1 [cs.LG]. Accessed on 21/06/2021 17:28.
  95. Hoefler T, Alistarh D, Ben-Nun T, Dryden N, Peste A. “Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks”. 2021. arXiv:2102.00554v1 [cs.LG]. Accessed on 04/06/2021 14:25.
  96. Namdar K, Haider MA, Khalvati F. “A Modified AUC for Training Convolutional Neural Networks: Taking Confidence into Account”. 2020, arXiv:2006.04836 [cs.LG].
  97. Karami A, Lundy M, Webb F, Dwivedi YK. "Twitter and Research: A Systematic Literature Review Through Text Mining," in IEEE Access, 2020; vol. 8, pp 67698–67717.
  98. Shaw G, Karami A. “Computational Content Analysis of Negative Tweets for Obesity, Diet, Diabetes, and Exercise”. ASIS&T 2017, Washington, DC.
  99. Karami A, Gangopadhyay A, Zhou B, Kharrazi H. “Fuzzy Approach Topic Discovery in Health and Medical Corpora”, 2017; arXiv:1705.00995v2 [stat.ML]. Accessed on 08/06/2021 09:47.
  100. Jiang K, Feng S, Song Q, Calix RA, Gupta M, Bernard GN. “Identifying tweets of personal health experience through word embedding and LSTM neural network”. BMC Bioinformatics 2018, 19, 210. Accessed on 12/06/2021 07:14. pmid:29897323
  101. Kolajo T, Kolajo JO. “Sentiment Analysis on Twitter Health News”. FUDMA Journal of Science (FJS). 2018, Vol. 2 No. 2, pp 14–20.
  102. Cirqueira D, Almeida F, Cakir G, Jacob A, Lobato F, Bezbradica M, et al. “Explainable Sentiment Analysis Application for Social Media Crisis Management in Retail”. In Proceedings of the 4th International Conference on Computer-Human Interaction Research and Applications (CHIRA 2020), pp 319–328.
  103. Chen H, Ji Y. “Improving the Explainability of Neural Sentiment Classifiers via Data Augmentation”. arXiv:1909.04225v4 [cs.CL]. Accessed on 02/06/2021 12:25.
  104. Rudin C. “Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead”. Nature Machine Intelligence, 1(5), 206; 2019.
  105. Pradhan S, Elhadad N, Chapman W, Manandhar S, Savova G. “SemEval-2014 Task 7: Analysis of Clinical Text”. Proceedings of the 8th International Workshop on Semantic Evaluation, 2014; pp 54–62.