Abstract
Social support, conveyed through a multitude of interactions and platforms such as social media, plays a pivotal role in fostering a sense of belonging, aiding resilience in the face of challenges, and enhancing overall well-being. This paper introduces Social Support Detection (SSD) as a Natural Language Processing (NLP) task aimed at identifying supportive interactions within online communities. We define SSD through three subtasks: (1) binary classification of whether a comment expresses social support or not, (2) binary classification of the intended support target (individual or group), and (3) multiclass classification of the specific group being supported, including Nation, Other, LGBTQ, Black Community, Religion, and Women. We conducted experiments on a manually annotated dataset of 9,998 YouTube comments. Traditional machine learning models were employed using various combinations of linguistic, psycholinguistic, emotional, and sentiment-based features. Additionally, neural network-based models incorporating word embeddings were evaluated to enhance performance across the subtasks. The results indicate a prevalence of group-oriented support in online discourse, highlighting broader societal dynamics. The findings show that integrating psycholinguistic and affective features with unigram representations improves classification performance. The best macro F1-scores achieved across the subtasks range from 0.72 to 0.82.
Citation: Ahani Z, Shahiki Tash M, Balouchzahi F, Ramos L, Sidorov G, Gelbukh A, et al. (2026) Social support detection from social media texts. PLoS One 21(3): e0337476. https://doi.org/10.1371/journal.pone.0337476
Editor: Michael Flor, Educational Testing Service: ETS, UNITED STATES OF AMERICA
Received: October 30, 2024; Accepted: November 7, 2025; Published: March 25, 2026
Copyright: © 2026 Ahani et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data underlying the results presented in this study cannot be shared publicly at this time because it will be used in an upcoming shared task being organized by our research team. Public release prior to the event would compromise the fairness of the competition. After the shared task has been conducted, the dataset will be available upon reasonable request from the research team at the following email address: nlp@cic.ipn.mx.
Funding: This work was partially supported by the Mexican Government through CONACYT (grant A1-S-47854 to AG) and Secretaría de Investigación y Posgrado of the Instituto Politécnico Nacional, Mexico (grant 20254236 to GS and 20254341 to AG). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Social support is the provision of behaviors, communication, and interactions that convey care and value to individuals, fostering a sense of belonging and aiding in coping with life’s challenges [1]. Social support manifests in diverse ways, ranging from expressions of care and encouragement to practical assistance or guidance. Recognizing the presence of supportive individuals who offer various forms of aid can serve as a buffer against stress and safeguard both emotional and physical health. The support patients receive from shared content plays a crucial role in enhancing their self-care practices and overall health results [2]. Specifically, individuals coping with chronic illnesses, disabilities, or cancer often find social media platforms invaluable, as they offer opportunities to connect with peers or professionals for guidance in managing their long-term conditions effectively [3].
Although previous research on hope speech detection provides useful methodologies, hope speech and social support are distinct concepts. Hope speech generally refers to speech that promotes optimism and counters hate speech [4], while social support encompasses broader communicative behaviors, including empathy, encouragement, guidance, and tangible assistance.
In this context, hope speech is not treated as a subtype of social support. Instead, both are considered complementary forms of prosocial communication, with different scopes and objectives. While hope speech often directly responds to hateful content, social support emphasizes providing comfort and help to those in need.
In recent years, a heightened awareness of the detrimental impacts of hate speech, abusive language, and misogyny on social media platforms has led to a surge in research efforts focused on their detection through Natural Language Processing (NLP) [5].
While social media platforms offer users the freedom and anonymity to express their opinions and engage in instant feedback, this liberty also fosters an environment where individuals may exploit the platform to propagate discriminatory or harmful views targeting specific demographics [6]. Consequently, developing tools and techniques for detecting and mitigating such content has become imperative in creating safer digital environments and promoting respectful online discourse.
However, some argue that this approach can infringe on users’ freedom of expression [6,7]. Instead of solely focusing on identifying and removing negative content, an alternative strategy could involve promoting positive interactions and supporting content that contributes to social good. By encouraging and amplifying constructive and respectful communication, social media platforms can foster a more positive online environment while still respecting users’ rights to freely express their opinions. This dual approach not only mitigates the spread of harmful content but also actively contributes to a more supportive and inclusive digital community [8].
Despite the importance of promoting positive and supportive content, relatively little work has been done in this research area. In response to these challenges, our proposed approach offers an alternative but under-explored strategy to combat the negative atmosphere on social media platforms by promoting social support comments. Rather than solely focusing on identifying and filtering out negative content, our approach seeks to cultivate a more positive and supportive online environment by encouraging users to provide emotional comfort, encouragement, and advice to those facing challenges.
Online social support encompasses the assistance and emotional comfort offered via digital platforms such as social media, forums, and messaging apps. This type of support is crucial for individuals and groups facing various challenges, such as victims of wars or individuals from historically marginalized communities who may experience social, economic, or political disadvantages. Through these digital channels, individuals can connect with others who share similar experiences, access valuable resources, and receive empathy and encouragement. The anonymity and accessibility of online support networks often make them a vital lifeline for those who might not have access to traditional forms of support. Additionally, these platforms can provide real-time assistance, foster a sense of community, and help reduce feelings of isolation and loneliness. A detailed definition of social support is presented in the Definitions section.
Previous research has shown that users in online communities both seek and provide support while occupying emergent social roles, such as support seekers or providers.
These roles arise through interaction, and early behaviors can influence long-term engagement. Understanding these dynamics helps analyze how social support is exchanged and received in online environments [9,10].
This work contributes to the understanding of supportive communication in online environments, with potential implications for reducing stress, enhancing coping mechanisms, and fostering inclusivity in digital communities.
Inspired by tasks in hate speech detection [11,12], an opposite task, social support detection (SSD) from text, is proposed. This task is modeled as a three-step classification process: (1) identifying whether a comment expresses support or not, (2) categorizing the type of support (e.g., emotional, informational), and (3) detecting the target group or individual receiving the support.
Data was collected from YouTube by first identifying videos containing supportive content. After preprocessing, a final random subset of 9,998 comments was selected for manual annotation. This resulted in 2,236 supportive comments and 7,762 non-supportive comments. Following data preparation, experiments were conducted using traditional machine learning models, including Logistic Regression (LR), Support Vector Machine with a radial basis function kernel (SVM (RBF)), Support Vector Machine with a linear kernel (SVM (linear)), Decision Trees (DT), and Random Forest Classifier (RFC). Three different feature sets were used: (1) LIWC + emotion + sentiment features — where LIWC (Linguistic Inquiry and Word Count) is a psycholinguistic tool that analyzes text by counting words in various psychologically relevant categories, combined with additional emotion and sentiment indicators; (2) TF-IDF-based unigrams — frequency-based vector representation of words; and (3) a combination of LIWC, emotion, sentiment features, and TF-IDF unigrams. Results showed that combining psycholinguistic, emotional, and sentiment features with TF-IDF yielded the best performance, achieving F1-macro scores of 0.78 and 0.80 in Tasks 1 and 2, respectively. Additionally, different word embeddings (GloVe and FastText) and model architectures (CNN and BiLSTM) were used to generate predictions and compare them with traditional models.
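The combination of TF-IDF unigrams with additional feature sets can be sketched as below. This is a minimal illustration, not the authors' exact pipeline: the comments and labels are toy data, and the psycholinguistic/affective features are random placeholders standing in for LIWC, emotion, and sentiment scores.

```python
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy comments with binary labels (1 = social support, 0 = not)
comments = [
    "stay strong, we are all with you",
    "this video is boring",
    "I believe in you, keep going",
    "nothing interesting here",
]
labels = [1, 0, 1, 0]

# TF-IDF unigram representation (feature set 2 in the paper)
vec = TfidfVectorizer(ngram_range=(1, 1))
X_tfidf = vec.fit_transform(comments)

# Placeholder for psycholinguistic/affective features; in the paper
# these come from LIWC plus emotion and sentiment lexicons
rng = np.random.default_rng(0)
X_extra = csr_matrix(rng.random((len(comments), 5)))

# Feature set 3: concatenate both representations column-wise
X = hstack([X_tfidf, X_extra])

clf = LogisticRegression(max_iter=1000).fit(X, labels)
preds = clf.predict(X)
```

The same concatenated matrix can be fed to any of the other traditional classifiers (SVM, DT, RFC) mentioned above.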
The main contributions of this study are listed below:
- Study of social support for social good as a novel task in NLP,
- Developing annotation guidelines and generating the first specific social support detection dataset in English,
- Study of psycholinguistic features of social support for different levels: group, individual, and target groups,
- Providing benchmark experiments using traditional machine learning models and linguistic and psycholinguistic features.
Definitions
Albrecht et al. [13] define social support as verbal and nonverbal communication between recipients and providers that reduces uncertainty about the situation, the self, the other, or the relationship, and functions to enhance the perception of personal control in one’s experience.
Although social support is helpful during stressful situations, [14] pointed out that the exchange of support does not only manifest during the crisis but is also an everyday occurrence in personal relationships.
The current research aligns with the definition of social support in [14] and views the exchange of comments and feedback between users and audiences as a form of social support occurring within their communication. Social support refers to "information leading the subject to believe that he is cared for and loved, esteemed, and a member of a network of mutual obligations" [15]. It is formed by the exchange of resources (i.e., verbal and nonverbal messages) between two or more individuals [16].
Studies have demonstrated that social support offers advantages to patients, such as dealing with challenging life circumstances [17], enhancing compliance with recommended treatment plans [18], and fostering better mental health [19]. In the management of chronic illnesses, social support plays a critical role in encouraging healthy behaviors and attaining favorable health results for patients. For instance, [18] discovered that social support enhances individuals’ quality of life and diminishes psychological distress among those facing severe mental health issues.
Hence, social support is defined as "the emotional, informational, or practical assistance offered by others, including peers or community members". This aid can be extended to individuals or groups, such as women, religious communities, or racial minorities such as the Black community, aiding them in navigating challenges, enhancing their overall well-being, and fostering resilience.
Related work
Despite the critical importance of promoting positive and supportive content, research in this area remains relatively sparse and often serves as a counterpoint to hate speech. While there is no directly comparable work focused solely on positive content in NLP, related research, such as hope speech detection [20], offers some insights. Recent studies on hate speech have effectively used psycholinguistic features—particularly those from LIWC—to enhance detection. For instance, ElSherief et al. [21] and Silva et al. [22] demonstrated that LIWC-based emotional and cognitive cues improve classification performance. Mathew et al. [8] used LIWC to analyze counter speech (non-hateful replies to hate speech meant to refute or counter it), revealing its linguistic contrasts with hate speech. Building on this foundation, our work applies similar features to the underexplored domain of supportive communication, offering a complementary perspective.
The phenomenon of hate speech and violent communication online is commonly referred to as cyberhate [23]. It involves the use of electronic communication technologies to propagate discriminatory or extremist messages, targeting not only individuals but also entire communities [24]. Hate speech encompasses various linguistic styles and actions, including insults, provocation, and aggression [25]. It can be categorized into different types, such as gendered hate speech, which targets specific genders or promotes misogyny, religious hate speech, which discriminates against various religious groups, and racist hate speech, which involves racial discrimination and prejudice against particular ethnicities or regions [26].
The feasibility of utilizing domain-specific word embeddings as features and a bidirectional LSTM-based deep model as a classifier for the automatic detection of hate speech was studied by [27]. Three datasets were used, with a total collection comprising 21,514 non-hate and 27,085 hate instances. To ensure balanced data, 16,260 instances for each label were used. This approach facilitated the detection of coded language by appropriately ascribing negative connotations to words. Additionally, the applicability of the transformer-based transfer learning language model (BERT) to the hate speech classification task was investigated, given its high-performance results across various NLP tasks. Experimental findings indicated that the combination of domain-specific word embeddings with the bidirectional LSTM-based deep model achieved an F1 score of 93%, while BERT achieved an F1 score of 96% when applied to a combined balanced dataset sourced from existing hate speech datasets.
Balouchzahi et al. [7] introduced PolyHope, the first multiclass hope speech detection dataset in English. The dataset creation process involved the collection of approximately 100,000 English tweets, which were preprocessed to yield around 23,000 tweets. A random subset of 10,000 tweets was subsequently selected for annotation, resulting in final statistics of Hope = 4175 and Not-Hope = 4081 post-annotation. They further fine-grained the type of hope into General, Realistic, and Unrealistic hopes. To assess the dataset’s performance, various baseline models were evaluated using diverse learning approaches, including traditional machine learning, deep learning, and transformer-based methods. The top-performing models for each learning approach demonstrated the average macro F1 scores for both binary and multiclass classification tasks on the PolyHope dataset, with transformers achieving better results, scoring 0.85 for binary classification and 0.72 for multiclass classification.
Palakodety et al. [28] analyzed an unfolding international crisis using a substantial corpus of YouTube comments, consisting of 921,235 English comments posted by 392,460 users, drawn from a total of 2.04 million comments by 791,289 users across 2,890 videos. Three primary contributions were highlighted. Firstly, the effectiveness of polyglot word embeddings in revealing precise language clusters was emphasized, leading to the development of a document language identification technique requiring minimal annotation. Its applicability and usefulness across various datasets involving multiple low-resource languages were showcased. Secondly, temporal trends in pro-peace and pro-war sentiment were examined, noting that during periods of heightened tension between the two nations, pro-peace sentiment in the corpus reached its peak. Lastly, in the context of politically charged discussions during a volatile situation, the study explored the potential of automatically identifying user-generated web content that might contribute to reducing hostile discourse. While practical applications remained limited, the task of hope-speech detection was introduced as a step toward better understanding such dynamics, with the best performance reported using n-grams (F1 score: 78.51%).
Dataset development
Data collection and processing
This research focuses on analyzing data collected from YouTube comments across 15 videos spanning various categories such as national identity, Black Community, women, religion, LGBTQ+, and others. The videos were manually selected based on their topical relevance to themes likely to elicit social support or emotional discourse. These included socially or emotionally charged subjects such as the war between Israel and Palestine, events involving the Black Community, public reactions to Cristiano Ronaldo, LGBTQ+ issues, and women’s rights. The selection was guided by the prominence of these topics in public discourse, their potential to generate supportive interactions, and the availability of open, high-engagement comment sections. The inclusion of the Cristiano Ronaldo video was motivated by the significant public response following personal events in his life (e.g., family matters and mental health), which sparked an outpouring of supportive comments.
All comments were posted between April 12, 2016, and February 13, 2024. This range represents the oldest and newest timestamps in our dataset. The full list of selected video URLs is provided in S1 Appendix for transparency and reproducibility.
Initially, we amassed 66,272 comments. After filtering out duplicate and non-English entries (with duplicates defined as exact text matches across or within videos), the dataset was refined to 42,695 comments.
Since we lacked prior knowledge of how many comments were truly supportive, we implemented a keyword-based sampling strategy to enhance the likelihood of selecting supportive content. We then selected 5,000 comments containing predefined supportive keywords and an additional 5,000 comments randomly. The keywords included phrases such as “support,” “stay strong,” “I’m here to help,” “I believe in you,” “inspiring,” and related terms. These were chosen based on prior literature [29] and manual inspection for their semantic connection to support-related language. They were not intended to serve as definitive labels of support, but rather as heuristic tools to guide sampling. Importantly, these keywords were used to identify general expressions of support and were not designed to capture any specific category of social support, such as informational or emotional support, or support given versus sought.
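The keyword-based sampling strategy can be sketched as follows. This is an illustrative simplification, not the authors' exact script: the keyword list is truncated to a few of the phrases named above, the comment pool is toy data, and the sample sizes are small stand-ins for the 5,000 + 5,000 split.

```python
import random

# A few of the supportive keywords named in the paper (the full list
# is in S2 Appendix); lowercased for case-insensitive matching
SUPPORT_KEYWORDS = ["support", "stay strong", "i believe in you", "inspiring"]

def keyword_sample(comments, n_keyword, n_random, seed=42):
    """Split the pool into keyword matches and the remainder, then draw
    n_keyword comments from the matches and n_random from the rest."""
    rng = random.Random(seed)
    matches = [c for c in comments if any(k in c.lower() for k in SUPPORT_KEYWORDS)]
    rest = [c for c in comments if c not in matches]
    return (rng.sample(matches, min(n_keyword, len(matches)))
            + rng.sample(rest, min(n_random, len(rest))))

pool = ["Stay strong everyone", "lol", "so inspiring!", "first", "great editing"]
sampled = keyword_sample(pool, n_keyword=2, n_random=2)
```

As the paper notes, the keywords act only as a heuristic to raise the odds of including supportive content; the labels themselves come from manual annotation.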
The full list of supportive keywords used is provided in S2 Appendix.
Annotator selection
For annotator selection, three annotators were involved: two males and one female. Two of them (one male and one female) hold master’s degrees in computer science and possess proficient English language skills, while the third annotator, one of the authors of this paper, is a Ph.D. student in natural language processing with advanced English proficiency and a comprehensive understanding of the annotation schema and task objectives. Although the first two annotators were not native English speakers, both had completed their graduate education in English and had prior experience with academic reading and writing. This background ensured that all annotators were capable of understanding the nuanced meaning of online comments written in English. Initially, each annotator was provided with 100 sample comments and detailed annotation guidelines. These samples were used to help them familiarize themselves with the task. Individual meetings and interviews were then conducted to address any confusion and ensure a thorough understanding of the annotation procedure.
The full annotation set consisted of 9,998 comments. Each item was independently annotated by all three annotators, resulting in three annotations per comment. The comments were randomly shuffled and split into five batches of 2,000 items each for ease of assignment. Each annotator received all five batches over time and annotated every comment, with a 20-day deadline per batch. This setup allowed full triple-annotation coverage while maintaining manageable workload and scheduling.
The total annotation process spanned approximately 100 days, with each annotator annotating all 9,998 comments over five consecutive 20-day rounds.
Annotation guidelines
The SSD task was structured as a three-step classification process. First, supportive comments were identified. Next, it was determined whether these supportive comments were directed toward an individual, a group, or a community. Finally, if the supportive comment was identified as being directed toward a group, the specific group was further identified. The guidelines for this process are described below.
- Subtask 1 – Binary social support detection: In this subtask, a given text is classified as either supportive or non-supportive:
- – Social Support (label = SS): Supportive statements promote understanding, empathy, and positive action. Therefore, a supportive comment is a statement or message that offers support, encouragement, admiration, or assistance to individuals or groups that are encountering difficulties or have accomplished something noteworthy. These comments aim to provide emotional support, boost morale, or acknowledge the achievements of others.
- – Not Social Support (label = NSS): The text does not convey any form of support, admiration, or encouragement.
- Subtask 2 – Individual vs. Group: In this subtask, each supportive comment identified in Subtask 1 is further classified as support for an individual or support for a group:
- – Individual: If the text expresses support for a specific person or individual (e.g., Cristiano Ronaldo, Trump), it is labeled as Support for Individual.
- – Group: If the text expresses support for a group of people, community, tribe, nation, etc. (e.g., Muslims, Real Madrid, Black nations, LGBTQ), it is labeled as Support for Group.
- Subtask 3 – Multiclass SS for Groups: In this subtask, we aim to identify which community or group of people are targeted for social support by classifying the group supportive comments identified in Subtask 2 into the following categories:
- – Women: The text expresses support for women and promotes women’s rights and feminism.
- – Black community: The text expresses support for the black community and promotes black community rights.
- – LGBTQ: The text expresses support for the LGBTQ community and promotes LGBTQ community rights.
- – Religion: The text expresses support for a religion and its rights.
- – Other: The text expresses support for a community other than those listed above.
- – Nation: The text expresses support for a Nation and its rights.
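The three-step decision process described by these guidelines can be sketched as a hierarchical routing function. This is a toy illustration of the label structure only: the three classifier arguments are hypothetical stand-ins (here trivial lambdas), not the trained models used in the experiments.

```python
def classify(comment, is_support, support_target, group_category):
    """Route a comment through the three SSD subtasks hierarchically:
    Subtask 2 runs only on supportive comments, and Subtask 3 only on
    comments whose support targets a group."""
    if not is_support(comment):          # Subtask 1
        return {"subtask1": "NSS"}
    label = {"subtask1": "SS"}
    target = support_target(comment)     # Subtask 2: Individual vs. Group
    label["subtask2"] = target
    if target == "Group":
        label["subtask3"] = group_category(comment)  # Subtask 3
    return label

# Toy rule-based stand-ins for the three classifiers
result = classify(
    "We stand with the women of the world",
    is_support=lambda c: "stand with" in c.lower(),
    support_target=lambda c: "Group",
    group_category=lambda c: "Women",
)
```

This mirrors the dataset design: Subtask 2 labels exist only for supportive comments, and Subtask 3 labels only for group-targeted support.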
Annotation procedure
Detailed annotation guidelines and sample data were provided to the three chosen annotators to facilitate the creation of the proposed dataset. The annotators followed a structured process, as illustrated in Fig 1. Initially, they determined whether comments expressed support, concern, or care. As a substep of this first decision, annotators also assessed whether the identified support promoted violence or negativity; comments that did so were excluded from the dataset to ensure that only constructive and non-violent expressions of support were retained. If the comment was supportive and non-violent, annotators proceeded to a second level of analysis, distinguishing whether the support was directed towards an individual or a group. In cases where it pertained to a group, annotators further specified the group’s affiliation, such as Nation, Religion, Black Community, Women, LGBTQ, or Other. Conversely, if the comment did not exhibit support, annotators labelled it as Non-supportive. This process aimed to support comprehensive annotation and maintain dataset quality.
Inter-annotator agreement
Inter-annotator agreement (IAA) measures the consistency among annotators while accounting for chance agreement. Fleiss’ Kappa coefficients, suitable for multiple annotators, were calculated for all tasks, resulting in scores of 0.711 for Task 1 (Binary Support), 0.899 for Task 2 (Individual vs Group), and 0.886 for Task 3 (Targeted Group), indicating a generally high level of reliability.
All 9,998 comments were independently annotated by three annotators for Task 1. For Task 2 and Task 3, annotation was performed on subsets following the hierarchical annotation design: Task 2 annotations were applied only to comments labeled supportive in Task 1, and Task 3 annotations only to comments labeled as ‘Group’ in Task 2. This naturally led to varying sample sizes per task.
Disagreements were resolved by majority voting per task. When two annotators agreed and one disagreed on a label, the majority label was chosen. Fleiss’ Kappa was calculated only on the subset of comments annotated by all three annotators for each task to ensure valid agreement measurement.
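Fleiss' Kappa for multiple annotators can be computed as below. This is a minimal sketch on toy counts, not the paper's data: each row gives, for one comment, how many of the three annotators chose each label (SS, NSS).

```python
import numpy as np

def fleiss_kappa(ratings):
    """Fleiss' Kappa for an (n_items, n_categories) matrix of per-item
    rater counts; each row must sum to the number of raters."""
    ratings = np.asarray(ratings, dtype=float)
    n_items, _ = ratings.shape
    n_raters = ratings[0].sum()
    # Overall proportion of ratings assigned to each category
    p_cat = ratings.sum(axis=0) / (n_items * n_raters)
    # Per-item observed agreement
    P_i = ((ratings ** 2).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    P_bar = P_i.mean()              # mean observed agreement
    P_e = (p_cat ** 2).sum()        # expected agreement by chance
    return (P_bar - P_e) / (1 - P_e)

# Toy data: 5 comments, 3 annotators, 2 labels (SS, NSS)
counts = [[3, 0], [0, 3], [2, 1], [3, 0], [1, 2]]
kappa = fleiss_kappa(counts)
```

The same function applied per task, restricted to comments with all three annotations, yields agreement scores like the 0.711/0.899/0.886 reported above.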
Distinguishing between individual and group support (Task 2) was challenging, particularly when comments lacked explicit target mentions or contained ambiguous support references. Annotators were instructed to select the most salient or explicit target in cases of mixed support. In ambiguous cases, majority voting determined the final label.
Statistics of the dataset
Table 2 showcases the SSD dataset, which is organized into three hierarchical annotation levels with varying sample sizes. In Task 1, the number of Non-Social Support samples (7,762) greatly exceeds Social Support samples (2,236), indicating a higher prevalence of non-supportive comments. Task 2 shows more comments related to groups (1,813) than individuals (423); however, this distribution may reflect the specific nature of the selected videos rather than a broader tendency in user discussions. Task 3 reveals a wide distribution across specific categories, with Nation having the most samples (982) and Religion the fewest (20). The higher number of comments about Nation can be attributed to the trending topic at the time of data collection, specifically the Israel-Palestine conflict. Other also has a significant number of samples (520) because it encompasses various additional categories within supportive comments.
The comments in the dataset vary widely in length, with a minimum of 3 characters, a maximum of 942 characters, and an average length of approximately 125 characters.
These differences likely stem from the inherent interest and sensitivity of the topics, the nature of the videos, and the sampling methods used. While this results in some class imbalance—particularly in Task 3—we retained these natural proportions to reflect the distribution of real-world discourse. To account for this imbalance during evaluation, we used macro-F1 scores, which weigh each class equally regardless of size. However, we acknowledge that some categories, especially those with very few instances (e.g., 24 or 19 cases), may not be large enough to support strong generalizations or robust statistical conclusions. Table 2 presents the statistics of the dataset. Table 1 further illustrates representative annotated examples from each task level to provide qualitative insight into the dataset.
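The macro-F1 metric used to handle this imbalance can be illustrated as below. The labels here are toy predictions over three of the Task 3 categories, not results from the paper; the point is that each class's F1 contributes equally to the average regardless of class size.

```python
from sklearn.metrics import f1_score

# Toy gold labels and predictions over three Task 3 categories
y_true = ["Nation", "Nation", "Women", "LGBTQ", "Nation", "Women"]
y_pred = ["Nation", "Women",  "Women", "LGBTQ", "Nation", "Women"]

# average="macro" computes F1 per class, then takes the unweighted
# mean, so the single-instance LGBTQ class counts as much as Nation
macro = f1_score(y_true, y_pred, average="macro")
```

Per-class F1 here is 0.8 for Nation, 0.8 for Women, and 1.0 for LGBTQ, so the macro average is their unweighted mean, about 0.867.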
Feature analysis
This section delves into the complex interplay of psycholinguistic, emotional [30], and sentiment features utilized in our research. It not only outlines these features but also explores how they relate to various forms of social support. By understanding these dynamics, we gain a nuanced understanding of how language and emotions intersect with mechanisms of social support. Psycholinguistic attributes involve linguistic cues intertwined with psychological processes, extracted using LIWC software from social supportive comments. Emotional features [31] refer to the expression of specific emotions (e.g., joy, sadness, anger), along with their intensity and valence within communication, shedding light on the emotional dynamics of supportive interactions. Additionally, sentiment features reveal the overall sentiment conveyed in the text—whether positive, negative, or neutral—offering insights into the prevailing tone and attitude in supportive discourse.
LIWC
The LIWC model has transformed psychological research by making language data analysis more robust, accessible, and scientifically rigorous. LIWC-22, the latest version, evaluates over 100 textual dimensions validated by esteemed research institutions worldwide. With over 20,000 scientific publications utilizing LIWC, it has become a widely recognized and trusted tool, enabling novel analytical approaches.
LIWC is a widely used computerized text analysis tool that enables researchers to examine the emotional, cognitive, structural, and process-related aspects of language. It works by performing frequency analysis of words in written text or speech [32]. The advantages of LIWC include its user-friendliness, ability to quantify complex psychological constructs, and strong empirical validation across various research fields. Recent studies have shown a growing trend in combining LIWC with machine learning techniques, especially for diagnosing mental disorders and investigating psychological traits. Such research often analyzes large volumes of text to identify relationships between everyday language use and characteristics like personality, social behavior, and cognitive patterns [32].
Despite these advantages, LIWC has limitations. One major issue is its reliance on predefined linguistic categories, which may not capture the nuances and variations of natural language [33,34]. Additionally, LIWC can struggle with accurately interpreting sarcasm, irony, and other subtle forms of language, potentially leading to misinterpretations.
In this study, we aim to employ machine learning and LIWC to detect social support.
We used a comprehensive set of LIWC-derived features covering multiple categories, including Summary Variables (overall language metrics such as word count), Linguistic Dimensions (e.g., pronouns, articles), and psychological dimensions such as Drives, Cognition, Affect, Social Processes, Culture, Lifestyle, Physical States, Motives, Perception, and aspects of Conversation. The average statistical relationships between these LIWC features and different categories of social support and non-supportive comments are detailed in Table 3.
The values presented in Table 3 represent the average percentages of words in each LIWC category relative to the total word count per comment, except for “Word Count,” which indicates the average number of words per comment. All other categories, such as “Function Words,” reflect the average percentage of words belonging to that category within the comments.
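The percentage-of-words computation behind these LIWC values can be sketched as follows. The category dictionaries below are tiny illustrative word lists, not the proprietary LIWC-22 lexicon, and the tokenizer is a plain whitespace split.

```python
def category_percentages(comment, dictionaries):
    """Return, for each category, the percentage of the comment's tokens
    that appear in that category's word list (LIWC-style scoring)."""
    tokens = comment.lower().split()
    total = len(tokens)
    return {cat: 100.0 * sum(t in words for t in tokens) / total
            for cat, words in dictionaries.items()}

# Toy stand-ins for two LIWC categories
dicts = {
    "function": {"the", "a", "an", "to", "of", "and", "you", "we"},
    "affect":   {"love", "strong", "happy", "sad", "hate"},
}
pcts = category_percentages("we love you stay strong", dicts)
```

Averaging such per-comment percentages within each class (e.g., supportive vs. non-supportive) produces the values reported in Table 3.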
LIWC feature analysis
To aid interpretation, we briefly describe the main LIWC categories used in this analysis. LIWC (Linguistic Inquiry and Word Count) assigns words in texts to psychologically meaningful categories based on validated dictionaries. Each category reports the proportion of words in a text belonging to that dimension. The following are the relevant LIWC categories analyzed in this study:
- Word Count (WC): Total number of words per comment.
- Function Words: Articles, pronouns, prepositions, conjunctions, auxiliary verbs, and other structural words.
- Drives: Motivational categories including achievement, affiliation, power, reward, and risk.
- Cognition: Words related to thinking processes such as insight, causation, certainty, and discrepancy.
- Affect: Emotion-related words, including positive and negative emotions, anxiety, anger, and sadness.
- Social Processes: Words referring to human interactions such as family, friends, and communication.
- Culture: Words related to politics, ethnicity, religion, and technology.
- Lifestyle: Terms about work, school, and home life.
- Physical: Health-related words including illness, wellness, mental health, and substances.
- States: Words referring to temporary internal conditions like tiredness or hunger.
- Motives: Words reflecting internal motivational states or goals.
- Perception: Words about seeing, hearing, feeling, movement, space, and attention.
- Conversation: Informal language features such as assent, fillers, disfluencies, and internet slang.
Word Count (WC) refers to the average number of words used per comment. In our dataset, supportive comments exhibit slightly higher word counts on average compared to non-supportive ones, indicating that supportive messages tend to be more elaborated.
Function Words in LIWC include pronouns, impersonal pronouns, articles, prepositions, auxiliary verbs, common adverbs, conjunctions, and negations. These words play a structural role in language and are often used to explore communication style and psychological state. In our dataset, the average proportion of function words is slightly lower in supportive comments than in non-supportive ones. Individual-targeted comments also show a lower proportion compared to group-targeted comments. Comments related to LGBTQ topics contain a relatively higher proportion of function words.
Drives in LIWC represent motivational dimensions such as affiliation, achievement, power, reward, and risk. Supportive comments show a slightly higher average proportion of Drive-related words compared to non-supportive ones. The difference between group- and individual-targeted comments is small, suggesting similar usage patterns across these categories. Within the group support subcategories, comments labeled as “Other” display a higher proportion of Drive-related language.
Cognition in LIWC includes words related to cognitive processes such as causation, insight, certainty, and discrepancy. In our dataset, non-supportive comments contain slightly higher percentages of cognition-related words than supportive ones. Group-targeted comments also show somewhat higher values than individual-targeted ones. Comments related to LGBTQ topics show elevated cognitive word use.
Affect in LIWC captures emotion-related language, including both positive and negative emotions, as well as specific states like anxiety, anger, and sadness. Supportive comments show a higher average proportion of affective words than non-supportive ones. Individual-targeted comments also exhibit higher affective word use than group-targeted ones.
Social Processes covers words related to human interactions, such as family, friends, and communication. Supportive comments show a slightly higher average proportion of social process words than non-supportive comments. Individual comments tend to use more social process language compared to group comments.
Culture covers words related to politics, ethnicity, religion, and technology. Supportive comments tend to have a slightly higher proportion of cultural terms than non-supportive comments. Group discussions show higher usage of culture-related words than individual comments. The “religion” subgroup within group discussions shows a notably higher value.
Lifestyle includes words about work, school, home life, and employment. Supportive comments exhibit a higher average proportion of lifestyle-related words than non-supportive ones. Individual comments show slightly more engagement with lifestyle terms compared to group comments. The “religion” subgroup is elevated in this category.
Physical refers to health-related words including illness, wellness, mental health, and substances. Supportive comments show slightly higher usage of physical terms than non-supportive comments. Group comments tend to use more physical words than individual comments. The “religion” subgroup shows lower proportions.
States include words about temporary internal conditions such as tiredness or hunger. Non-supportive comments and group interactions show slightly higher average proportions of state-related words. The “Women” subgroup displays elevated values in this category.
Motives reflect internal motivational states or goals. Supportive comments, individual comments, and comments related to national identity show higher average proportions of motive-related words.
Perception includes words related to seeing, hearing, feeling, movement, space, and attention. Non-supportive comments, group comments, and those relating to Black individuals show higher average proportions of perception words.
Conversation includes informal language features such as assent, fillers, disfluencies, and internet slang. Non-supportive comments, individual comments, and LGBTQ-related comments exhibit higher average proportions of conversational markers.
All observations described above are descriptive and no statistical tests were performed to assess the significance of differences across comment categories.
Emotions
This study employed the NRC Emotion Lexicon (version 1.0) [35] to analyze emotional content in comments associated with different types of social support. The NRC Emotion Lexicon is a manually curated resource that maps English words to eight basic emotions: anger, anticipation, disgust, fear, joy, sadness, surprise, and trust.
To compute emotion scores for each comment, we matched every word to the NRC Emotion Lexicon. For each emotion, we retrieved the corresponding emotion-intensity score for every matched word and summed these values across the comment. Then, for each support category (e.g., Social Support, Individual, Group, Nation, etc.), we averaged these per-comment sums across all comments within that class. Consequently, the values in Table 4 represent average cumulative emotion intensities; they can exceed 1 because they reflect total emotion strength per comment rather than proportions normalized by word count.
The emotion scores therefore quantify the aggregate strength of words associated with each of the eight emotions within a comment, computed as the sum of lexicon-assigned emotion weights over all matched words; higher scores indicate a greater presence or intensity of the respective emotion.
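The scoring step can be illustrated with a tiny hypothetical slice of the lexicon; the words and intensity weights below are invented for the example, not taken from the actual NRC resource.

```python
# Sketch of the per-comment emotion scoring: sum lexicon intensities over all
# matched words. The mini-lexicon (word -> {emotion: intensity}) is invented.
nrc = {
    "love":  {"joy": 0.9, "trust": 0.6},
    "hope":  {"joy": 0.5, "anticipation": 0.7},
    "fight": {"anger": 0.8},
}

def comment_emotion_scores(text):
    """Cumulative (unnormalized) emotion intensities for one comment;
    totals can exceed 1, as with the values reported in Table 4."""
    scores = {}
    for word in text.lower().split():
        for emotion, weight in nrc.get(word, {}).items():
            scores[emotion] = scores.get(emotion, 0.0) + weight
    return scores

s = comment_emotion_scores("love and hope love")
# "love" matched twice plus "hope": joy accumulates to 0.9 + 0.5 + 0.9
```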
Table 4 shows the average emotion scores for different categories of comments, separated into supportive and non-supportive groups, as well as subgroups such as individual vs. group comments and demographic themes.
The data show that positive emotions such as joy and trust tend to have higher average scores in supportive comments compared to non-supportive ones across most categories. For example, joy is notably higher in individual and LGBTQ-related supportive comments. Negative emotions such as anger and fear display more nuanced patterns, with variations across categories but no clear overall trend. Emotional expression varies across demographic groups, reflecting the complex interplay of emotions and social dynamics in different contexts.
It is important to note that these emotion scores are descriptive and based on lexical matching; no statistical significance tests have been conducted to assess differences.
Therefore, these findings should be interpreted cautiously as preliminary insights into emotional patterns in supportive discourse.
Sentiment analysis
The Social Support dataset was analyzed for sentiment using the VADER (Valence Aware Dictionary and sEntiment Reasoner) sentiment analysis tool [36]. VADER is a rule-based sentiment analysis software specifically tuned for social media text. It uses a combination of a sentiment lexicon and heuristics to assign polarity scores to a given piece of text. Each text receives a compound score ranging from −1 (extremely negative) to +1 (extremely positive), along with proportions for negative, neutral, and positive sentiment.
To assign sentiment labels, we used VADER’s compound score: texts with scores above 0.05 were labeled as positive, below −0.05 as negative, and those in between as neutral.
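The thresholding rule above can be written as a small helper. The compound score itself would come from VADER (its `polarity_scores` output); only the labeling step is sketched here.

```python
# VADER's compound score ranges from -1 to +1; we map it to a ternary label
# using the 0.05 thresholds described above.
def label_from_compound(compound):
    if compound > 0.05:
        return "positive"
    if compound < -0.05:
        return "negative"
    return "neutral"

# Example inputs (illustrative compound scores, not from real texts):
labels = [label_from_compound(c) for c in (0.62, 0.0, -0.4)]
```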
Table 5 presents the proportion of texts exhibiting negative, neutral, or positive sentiment within different categories. Across most categories, neutral sentiment appears to be the most prevalent, followed by either positive or negative sentiment, though the balance between the two varies. Non-supportive contexts generally exhibit slightly higher proportions of negative sentiment compared to supportive contexts, where neutral sentiment tends to dominate. Individual and group contexts show a relatively balanced distribution between neutral and positive sentiments. There are also noticeable differences in sentiment distribution across demographic groups, with variations particularly evident among the Black community and women, where negative sentiment appears more pronounced.
Note: The numbers in Table 5 represent the proportion of comments assigned each sentiment label, relative to all comments in the corresponding category.
These findings suggest that sentiment expression may be shaped by the social role of the speaker and demographic context. However, no statistical significance tests were conducted, so these observations should be interpreted as preliminary trends.
Experiments
This section presents our experimental setup for detecting social support in online comments. We evaluate both traditional machine learning and deep learning models to classify supportive content, identify support types, and determine the target (individual or group). The experiments are based on a manually annotated YouTube dataset and aim to establish strong baselines using various feature sets, including psycholinguistic, emotional, sentiment, and TF-IDF-based representations. By comparing model performance across different inputs and architectures, we assess the effectiveness of linguistic and neural approaches for this novel NLP task.
Traditional machine learning models
We utilize five traditional machine learning classifiers for detecting social support: Logistic Regression (LR), Support Vector Machine (SVM) with both radial basis function (RBF) and linear kernels, Decision Tree (DT), and Random Forest Classifier (RFC), all implemented using the scikit-learn library [37]. To enhance model robustness and potentially improve predictive performance, we explored ensemble methods based on both hard and soft voting strategies. Hard voting determines the final predicted class by majority vote, where each classifier casts one vote and the class with the most votes is selected. Soft voting, on the other hand, averages the predicted class probabilities from all classifiers and selects the class with the highest average probability, which can better capture classifier confidence. All five classifiers were included in the ensembles and combined using scikit-learn’s VotingClassifier with default parameters. The classifiers were trained using default hyperparameters to provide a fair baseline comparison rather than tuning individual models. The input features consist of psycholinguistic, emotional, and sentiment attributes extracted at the individual comment level, not aggregated across groups. Specifically, LIWC was used to compute numerical scores for each comment across categories such as function words, cognitive processes, affect, and social processes. Emotion features were obtained from the NRC Emotion Lexicon, which assigns scores for eight emotions (e.g., anger, joy, trust) based on detected emotion words in each comment. Sentiment features were generated using the VADER tool, providing negative, neutral, and positive sentiment probabilities per comment. These features form the input vectors for the classifiers. The aggregated feature statistics presented in Tables 3–5 serve only for descriptive analysis and are not directly used in model training. 
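The classifier and voting setup described above can be sketched with scikit-learn. The features below are synthetic stand-ins generated with `make_classification`; in the actual experiments the input vectors are the per-comment LIWC, NRC, and VADER scores.

```python
# Sketch of the five base classifiers and the hard/soft voting ensembles,
# trained on synthetic stand-in features.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=20, random_state=42)

base = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("svm_lin", SVC(kernel="linear", probability=True)),  # probability=True
    ("svm_rbf", SVC(kernel="rbf", probability=True)),     # enables soft voting
    ("dt", DecisionTreeClassifier(random_state=42)),
    ("rfc", RandomForestClassifier(random_state=42)),
]

# Hard voting: majority vote over predicted labels.
hard = VotingClassifier(estimators=base, voting="hard").fit(X, y)
# Soft voting: average predicted class probabilities, pick the argmax.
soft = VotingClassifier(estimators=base, voting="soft").fit(X, y)

preds = soft.predict(X)
```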
Additionally, pandas was used for data handling, scipy for sparse matrix operations, nltk for text preprocessing, and numpy for numerical computations throughout the experiments.
Preprocessing
Initially, the data preprocessing involved removing duplicate comments and selecting only English comments. For general text standardization, tokenization, lowercasing, punctuation removal, stopword elimination, and stemming or lemmatization were performed. Emojis and emoticons were converted into textual representations using the emotion library [35]. Abbreviations were expanded to their full forms using a predefined dictionary, and additional punctuation and stopwords were removed to refine the text.
However, different feature extraction tools required tailored preprocessing: for LIWC, VADER, and the NRC lexicon, stemming, lemmatization, and stopword removal were not applied, as these steps could reduce the accuracy of lexicon matching. Instead, analyses with these tools were performed on texts after lowercasing and minimal cleaning (e.g., emoji conversion), preserving word forms and stopwords. In contrast, TF-IDF and word embedding-based models utilized fully preprocessed texts, including stemming/lemmatization and stopword removal, to optimize vocabulary representation and reduce noise.
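The two preprocessing paths can be sketched as follows. The stopword set and the suffix-stripping "stemmer" are deliberately simplified stand-ins (the actual pipeline used nltk); the point is only the split between minimal cleaning for lexicon tools and full preprocessing for TF-IDF/embeddings.

```python
import re
import string

STOPWORDS = {"the", "a", "an", "is", "are", "i", "you"}  # illustrative subset

def minimal_clean(text):
    """For LIWC/VADER/NRC: lowercase only, preserving word forms and
    stopwords so lexicon matching stays accurate."""
    return text.lower().strip()

def full_preprocess(text):
    """For TF-IDF/embedding models: lowercase, strip punctuation, drop
    stopwords, and apply a naive suffix-stripping stemmer (stand-in for a
    real stemmer/lemmatizer)."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return [re.sub(r"(ing|ed|s)$", "", t) for t in tokens]

a = minimal_clean("You are Loved!")   # word forms kept for lexicon tools
b = full_preprocess("You are Loved!") # reduced vocabulary for TF-IDF
```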
Feature extraction
We extracted multiple types of features to comprehensively represent the textual content of each comment for classification.
Psycholinguistic features: Using the Linguistic Inquiry and Word Count (LIWC) tool [38], we obtained numerical scores for categories such as function words, cognitive processes, affective states, social processes, culture, lifestyle, motives, perception, and more. Each category quantifies the prevalence of psychologically relevant word classes within a comment.
Emotional features: The NRC Emotion Lexicon [35] provided eight emotion intensity scores per comment, covering anger, anticipation, disgust, fear, joy, sadness, surprise, and trust. These capture the emotional tone present in the text.
Sentiment features: VADER sentiment analysis [36] was used to compute normalized scores representing negative, neutral, and positive sentiment proportions for each comment. VADER is optimized for social media text and captures subtle sentiment nuances.
Lexical features: We extracted TF-IDF weighted word unigrams to capture the frequency and importance of lexical patterns in the dataset. These features were used as the primary textual representation for the machine learning classifiers.
Model training and predictions
In all experiments, we utilized a 5-fold cross-validation approach for both training and evaluating the ML models. Evaluation and comparison were conducted based on the average weighted and macro scores across all folds. Comprehensive results are elaborated upon in the Results section.
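The evaluation protocol can be sketched with scikit-learn's `cross_validate`, again on synthetic stand-in data; the real experiments use the feature vectors described earlier.

```python
# 5-fold cross-validation reporting both macro and weighted F1, as used for
# model comparison in this study (data here is synthetic).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

scores = cross_validate(
    LogisticRegression(max_iter=1000), X, y,
    cv=5, scoring=["f1_macro", "f1_weighted"],
)
mean_macro = scores["test_f1_macro"].mean()
mean_weighted = scores["test_f1_weighted"].mean()
```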
Deep learning
Two deep learning models, namely Convolutional Neural Network (CNN) and Bidirectional Long Short-Term Memory (BiLSTM), were trained separately using Global Vectors for Word Representation (GloVe) and FastText embeddings. The models were implemented using the Keras library with TensorFlow backend [39]. A Keras tokenizer was fitted on the dataset texts to convert all texts into sequences. The maximum sequence length was set to the length of the longest comment, and all sequences were padded to this length. Vectors were obtained from the word embedding matrix for each comment, after which the input sequences were created and fed to the deep learning models. The hyperparameters used for both models are detailed in Table 6. Notably, both models share identical parameter settings as they were optimized for fair baseline comparison. Each model was trained for 50 epochs per fold using 5-fold cross-validation.
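The sequence preparation step (fit a tokenizer, convert texts to id sequences, pad to the longest comment) can be mirrored without Keras in a dependency-free sketch; the Keras tokenizer and `pad_sequences` perform the equivalent operations, with details such as padding side configurable.

```python
# Dependency-free sketch of tokenization and padding: build a word index,
# map texts to id sequences, pad every sequence to the longest comment.
texts = ["stay strong", "we are all with you today"]  # illustrative comments

vocab = {}
for t in texts:
    for w in t.split():
        vocab.setdefault(w, len(vocab) + 1)  # id 0 is reserved for padding

sequences = [[vocab[w] for w in t.split()] for t in texts]
max_len = max(len(s) for s in sequences)          # length of longest comment
padded = [s + [0] * (max_len - len(s)) for s in sequences]
# Each row of `padded` indexes into the embedding matrix (GloVe/FastText)
# to build the input fed to the CNN or BiLSTM.
```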
Results
The machine learning models were evaluated across three distinct classification steps, as presented in Table 2. Importantly, each level of our experiment was augmented with different feature combinations: LIWC, emotion, and sentiment features only; TF-IDF only; and a combination of all features. This approach allowed us to systematically investigate the impact of different feature sets on the performance of various models across different classification tasks. Through this comprehensive analysis, we aimed to identify the most effective model-feature combinations for accurate and reliable social support detection. We conducted experiments using CNN and BiLSTM models with GloVe and FastText embeddings across three subtasks, and present the results in Table 10.
Social support detection with LIWC, emotion, and sentiment features
Table 7 presents the classification results using psycholinguistic features extracted from LIWC, alongside emotional and sentiment-based features. These handcrafted features were used to train several traditional machine learning classifiers to evaluate their effectiveness across the three subtasks.
In Subtask 1, the Logistic Regression (LR) model achieved the highest macro F1-score of 0.7061, slightly outperforming other models such as SVM with linear and RBF kernels, Decision Tree (DT), and Random Forest Classifier (RFC). While ensemble models like Soft Voting and Hard Voting showed competitive performance, the margin of improvement was minimal. This trend was also observed in Subtask 2, where LR again obtained the highest macro F1-score of 0.7751. In Subtask 3, although overall performance decreased due to the increased complexity of the task, LR remained the top-performing model with a macro F1-score of 0.5666.
Interestingly, in all three subtasks, the performance differences between LR and other models such as SVM and RFC were relatively small, suggesting that the selected features offer a comparable level of informativeness across models. This indicates that Logistic Regression, a relatively simple yet interpretable model, is a strong candidate for social support classification when relying on LIWC, emotion, and sentiment features.
Social support detection using TF-IDF word unigrams
In this section, we report the results of our experiment using TF-IDF weighted word unigram features. Unigrams refer to individual tokens (words) extracted from the raw text, without applying stemming or lemmatization. We used the TfidfVectorizer from the scikit-learn library to convert the text into a numerical feature space, where each feature represents the Term Frequency–Inverse Document Frequency (TF-IDF) score of a unigram in the document. The TF-IDF values were computed based solely on the training data to avoid information leakage.
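The leakage-free setup can be sketched directly with scikit-learn's `TfidfVectorizer`; the texts below are invented placeholders.

```python
# TF-IDF word unigrams: fit the vocabulary and IDF weights on training data
# only, then transform the held-out texts with the fitted vectorizer.
from sklearn.feature_extraction.text import TfidfVectorizer

train_texts = ["you are so brave", "we stand with you", "this is nonsense"]
test_texts = ["stay brave and strong"]

vectorizer = TfidfVectorizer(ngram_range=(1, 1))   # word unigrams
X_train = vectorizer.fit_transform(train_texts)    # fit on training data only
X_test = vectorizer.transform(test_texts)          # no information leakage
```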
Table 8 presents the model performance across different classification tasks. For subtask 1, the Soft Voting ensemble achieved the highest overall performance, outperforming individual classifiers. A similar trend was observed in subtasks 2 and 3, where Soft Voting continued to yield superior results. This ensemble method combines predictions from multiple base classifiers using their predicted probabilities rather than hard labels, effectively balancing diverse decision boundaries and improving generalization.
Notably, Logistic Regression and linear SVM models also demonstrated strong individual performance. However, their combination through Soft Voting produced more balanced and stable results across all evaluation metrics. These findings indicate that TF-IDF word unigrams alone provide a strong and reliable signal for social support classification in English-language social media comments, highlighting the effectiveness of simple lexical features in capturing supportive communication patterns.
Social support detection with the combination of all features
In this section, we integrated a combination of all features, including LIWC, emotion, and sentiment scores, and unigrams with TF-IDF values. Table 9 presents the outcomes of our experiments across different classification tasks. In the first subtask, SVM (Linear) emerges as the top-performing model, surpassing others in terms of performance metrics. Moving to the second subtask, Soft Voting stands out with the highest performance among the models considered. Transitioning to the third subtask, Hard Voting demonstrates superior results compared to other models. These findings highlight the varied strengths of different models when leveraging a comprehensive feature set, underscoring the importance of selecting models tailored to the specific task requirements.
The integration of all feature types provided a richer and more discriminative representation of the input text, allowing models to better capture the multifaceted nature of social support. Notably, while linear SVM excelled in Subtasks 1 and 2 due to its ability to effectively separate high-dimensional data, the ensemble-based approaches (Soft and Hard Voting) outperformed in Subtask 3, which is inherently more challenging. This suggests that combining the outputs of diverse classifiers helps mitigate individual model biases and enhances robustness, especially in complex multi-class scenarios.
Overall, the use of hybrid feature sets and ensemble learning significantly boosts performance across tasks, reflecting the complementary contributions of linguistic, affective, and statistical text representations.
Comparing results across the three levels of SSD and the three feature configurations reveals the relative performance of the models. In the first subtask, the combination of LIWC, emotion, and sentiment features with unigram TF-IDF values, paired with the SVM (linear) model, demonstrates superior performance. In the second subtask, the same combined feature set with the Soft Voting model yields the highest value. Finally, in the third subtask, the TF-IDF features alone with the Soft Voting model exhibit the highest performance. These findings underscore the influence of both the feature set and the model choice on the effectiveness of SSD across different subtasks.
Deep learning
Table 10 provides an overview of model performance across various tasks, with a focus on the macro F1-score, which is considered a robust metric for evaluating model performance in tasks with imbalanced classes. Across different word embeddings (GloVe and FastText) and model architectures (CNN and BiLSTM), the configurations yielding the highest macro F1-scores are of particular interest. For example, in Task 1, using GloVe embeddings with the BiLSTM model resulted in a macro F1-score of 0.7611, with precision and recall scores of 0.7751 and 0.7587, respectively. These values indicate a reasonably balanced trade-off between the two metrics, justifying the reliability of the F1-score in this case. Similarly, in Task 2 and Task 3, certain configurations achieved macro F1-scores of 0.8184 and 0.7235 respectively, reflecting strong generalization across all classes, particularly when both precision and recall are consistently high.
Error analysis
In this section, we analyze the performance of the best-performing model for each subtask.
Table 11 presents the classwise scores for the best-performing models. For Subtask 1 and Subtask 2, the best performance was achieved by the CNN with GloVe embeddings. In Subtask 3, the best performance was achieved through soft voting of models using only unigrams with TF-IDF values.
An additional column showing the sample sizes for each category (from Table 2) has been included in Table 11 to facilitate clearer comparison. The results indicate that although categories with larger sample sizes often achieve higher F1-scores, this trend is not consistent across all classes. For example, Nation (982 samples) obtained an F1-score of 0.8755, while LGBT (150 samples) achieved a higher F1-score of 0.9012. Similarly, the Black Community class, with a relatively small number of samples (114), reached an F1-score of 0.85. In contrast, categories such as Women and Religion showed both lower sample sizes and lower F1-scores. These observations indicate that while class imbalance has an impact on performance, certain minority classes can still be effectively recognized depending on their linguistic distinctiveness and internal consistency.
We also provided the confusion matrices for these models in Figs 2–4. The analysis of these confusion matrices reveals several key patterns of misclassification.
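The tallying behind such confusion matrices is simple to sketch; the gold labels and predictions below are illustrative, not taken from the actual model outputs.

```python
# Counting (gold, predicted) pairs to build a confusion matrix. The entry
# ("support", "non_support") counts supportive comments misclassified as
# non-supportive -- the dominant error pattern observed in Subtask 1.
from collections import Counter

gold = ["support", "support", "non_support", "non_support", "support"]
pred = ["non_support", "support", "non_support", "non_support", "non_support"]

confusion = Counter(zip(gold, pred))
```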
For Subtask 1, supportive comments were frequently misclassified as non-supportive, whereas non-supportive comments were generally identified correctly. This asymmetry suggests that the model performs well at recognizing the absence of support but struggles to detect supportive expressions, likely due to their subtle linguistic or contextual cues.
In subtask 2, there was a notable confusion between support for individuals and support for groups. This suggests that the model finds it difficult to differentiate when the support is directed towards a single person versus a collective group, possibly because the expressions of support can be quite similar in both scenarios.
In subtask 3, when predicting targeted groups, the model often misclassified religious groups as nations. This specific confusion was more prominent than any other misclassification within this subtask. It indicates a particular challenge in distinguishing between religion-based and nation-based supportive comments, which may stem from overlapping cultural or contextual cues in the data.
Additionally, across all subtasks, there was significant misclassification into the "Other" category. This category encompasses instances that may target multiple groups or do not fit neatly into predefined categories, making it a frequent source of confusion for the model.
Discussion
The dataset and experiments proposed in this paper have several characteristics and limitations that are briefly discussed in the following:
- The dataset utilized in this paper was exclusively gathered from YouTube comments within a specific timeframe and from targeted videos, which may introduce biases. To enhance the dataset’s diversity and richness, incorporating posts from other social media platforms, such as X and Reddit, over an open timeframe would be beneficial.
- The dataset is relatively small and imbalanced in terms of supportive comments, which has affected the performance of the machine learning models. In future work, this issue could be addressed by increasing the dataset size, particularly for supportive comments. We also observed that the class imbalance influenced model accuracy, as simple baselines using the majority class already achieved relatively high accuracy scores (e.g., 0.77 for Subtask 1 and 0.81 for Subtask 2). This highlights the importance of evaluating models beyond accuracy and focusing on metrics such as F1-score, especially in imbalanced settings.
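The majority-class baseline mentioned above is trivial to compute: always predict the most frequent label. The 77/23 split below is illustrative (chosen to reproduce the ~0.77 accuracy quoted for Subtask 1), not the exact dataset distribution.

```python
# Accuracy of a baseline that always predicts the majority class.
from collections import Counter

labels = ["majority"] * 77 + ["minority"] * 23  # illustrative class split
majority_label, majority_count = Counter(labels).most_common(1)[0]
baseline_accuracy = majority_count / len(labels)
# High accuracy despite learning nothing -- hence the focus on F1-scores.
```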
- The current study reveals certain patterns in support expression within the analyzed dataset. However, since the videos and comments were not sampled to be representative of YouTube as a whole, we refrain from generalizing these observations to all YouTube content. Additionally, users show support for different nations without being heavily influenced by religious affiliations. The data indicates that recent support has not been predominantly directed towards any specific religion. Instead, people are more concerned with nations and communities, such as LGBTQ+ individuals and Black people. This trend highlights a broader social focus on national and community identities over religious considerations.
- This study serves as a foundational step in introducing the task of social support detection aimed at fostering support and positivity as an alternative to merely filtering out hate speech. Consequently, the paper primarily focuses on the introduction of this concept, assessing the feasibility of the task, and examining it from a psychological perspective. As such, experiments involving state-of-the-art models like transformers and large language models have been deferred to future works.
- It is important to note that the machine learning experiments for each subtask were conducted strictly on human-annotated subsets. For instance, only comments labeled as Social Support in Subtask 1 were used for further classification in Subtask 2, and likewise for Subtask 3. At no point were the predictions from previous subtasks used as inputs to subsequent ones. This design choice avoids the risk of cumulative error propagation across subtasks. While building a complete pipeline where predictions from one task feed into the next is a compelling direction for future research, such an experiment was not within the scope of the present study.
Conclusion and future work
This study explores the dynamics of social support within online communities by introducing a structured classification framework and providing a labeled dataset to support further research. By introducing the task of SSD and conducting experiments on a dataset of YouTube comments, we identified patterns in supportive interactions, such as the relevance of psycholinguistic features and the effectiveness of combining linguistic and machine learning techniques for classification.
Our findings reveal that YouTube users predominantly express support for groups of people, with less emphasis on individual support and religious affiliations. This highlights the importance of considering broader societal contexts when analyzing social support interactions online.
While our experiments have provided promising results, there are notable limitations to address. The dataset used is relatively small and imbalanced, limiting the generalizability of our findings. Additionally, biases inherent in the dataset, stemming from its exclusive focus on YouTube comments within specific parameters, need to be addressed through the diversification of data sources.
Looking ahead, future research in SSD should focus on expanding and diversifying datasets by incorporating data from multiple platforms, such as Reddit and X (formerly Twitter), to improve the dataset’s breadth and generalizability. Additionally, future work should explore advanced modeling techniques, such as n-grams, Transformer-based models, and large language models (LLMs), and also incorporate hyperparameter optimization for traditional machine learning models to enhance the performance and accuracy of SSD tasks. Another important direction for future work is investigating how insights from SSD can inform the design of interventions aimed at promoting positive interactions within online communities. Furthermore, we plan to address the current class imbalance by balancing the dataset using methods such as paraphrasing comments with GPT models. By carefully addressing these challenges and grounding future research in robust empirical evidence, we can deepen our understanding of social support dynamics online and contribute to fostering more supportive and inclusive digital spaces.
Limitations
A key limitation of our study is the reliance on data collected from a single platform, YouTube, which may limit the broader applicability of our findings to other online environments. Additionally, the dataset exhibited class imbalance, with certain labels having a disproportionately low number of comments. This imbalance may have influenced the model’s performance, potentially skewing results towards the majority classes. Another limitation arises from our use of specific keywords to select 5,000 comments, which may have inadvertently excluded relevant comments containing different terms, thus limiting the diversity of the data.
Furthermore, the annotation process itself introduced challenges, particularly in distinguishing between individual and group support. For instance, a comment like “You are a true Italian!” may express personal (individual) support, national identity (group) support, or both. This ambiguity posed difficulties for annotators. Our annotation guidelines instructed annotators to prioritize explicit cues: if the comment clearly referenced a social group (e.g., nationality, gender, religion), it was labeled as Group; otherwise, as Individual. In cases of uncertainty, we applied majority voting to determine the final label. These cases highlight the subjective nature of social support labeling and underscore the complexity of annotation in nuanced social contexts.
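The majority-voting step can be expressed in a few lines. This is a generic sketch, not the annotation tooling actually used:

```python
from collections import Counter

def majority_label(annotations):
    """Resolve one comment's final label by majority vote among annotators.
    `annotations` is a list such as ["Group", "Group", "Individual"].
    Counter.most_common orders equal counts by first occurrence, so ties
    fall back to the label encountered first."""
    return Counter(annotations).most_common(1)[0][0]
```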
Finally, this study did not include spell-checking or correction during preprocessing. Misspellings and orthographic variations common in YouTube comments may have affected the accuracy of lexicon-based feature extraction. Future work could incorporate robust spell-correction methods to address this limitation.
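As one possible direction, a lightweight spell-correction pass could map near-miss tokens onto a known lexicon before feature extraction. The sketch below uses Python's `difflib` with a toy vocabulary and is illustrative only:

```python
import difflib

# Toy in-domain vocabulary; in practice this would be the lexicon used for
# feature extraction (e.g., LIWC or NRC entries).
VOCAB = {"support", "strong", "community", "proud", "together"}

def correct_token(token, vocab=VOCAB, cutoff=0.8):
    """Return the closest vocabulary word if the token is a near-miss
    (similarity >= cutoff); otherwise return the token unchanged."""
    if token in vocab:
        return token
    matches = difflib.get_close_matches(token, vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else token

def correct_comment(comment):
    """Lowercase, split on whitespace, and correct each token."""
    return " ".join(correct_token(t) for t in comment.lower().split())
```

A higher `cutoff` makes the pass more conservative, which matters for short social-media tokens where aggressive correction can distort meaning.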
Supporting information
S1 Appendix. YouTube videos and comment date ranges.
This appendix provides the list of YouTube videos used and their corresponding comment date ranges.
https://doi.org/10.1371/journal.pone.0337476.s001
(DOCX)
S2 Appendix. Keywords used for supportive content sampling.
This appendix contains the list of keywords and phrases used to identify potentially supportive comments.
https://doi.org/10.1371/journal.pone.0337476.s002
(DOCX)
References
- 1. Ko H-C, Wang L-L, Xu Y-T. Understanding the different types of social support offered by audience to A-list diary-like and informative bloggers. Cyberpsychol Behav Soc Netw. 2013;16(3):194–9. pmid:23363225
- 2. Jadad AR, Haynes RB, Hunt D, Browman GP. The Internet and evidence-based decision-making: a needed synergy for efficient knowledge management in health care. CMAJ. 2000;162(3):362–5. pmid:10693595
- 3. Merolli M, Gray K, Martin-Sanchez F. Health outcomes and related effects of using social media in chronic disease management: a literature review and analysis of affordances. J Biomed Inform. 2013;46(6):957–69. pmid:23702104
- 4. Arif M, Shahiki Tash M, Jamshidi A, Ullah F, Ameer I, Kalita J, Balouchzahi F. Analyzing hope speech from psycholinguistic and emotional perspectives. Scientific Reports. 2024;14(1):23548.
- 5. Balouchzahi F, Shashirekha HL. LAs for HASOC—Learning Approaches for Hate Speech and Offensive Content Identification. In: Working Notes of FIRE 2020 – Forum for Information Retrieval Evaluation, Hyderabad, India, 2020. 145–51. https://ceur-ws.org/Vol-2826/
- 6. Chakravarthi BR. HopeEDI: A Multilingual Hope Speech Detection Dataset for Equality, Diversity, and Inclusion. In: Proceedings of the Third Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media, Barcelona, Spain (Online), 2020. 41–53. https://aclanthology.org/2020.peoples-1.5/
- 7. Balouchzahi F, Sidorov G, Gelbukh A. PolyHope: Two-level hope speech detection from tweets. Expert Systems with Applications. 2023;225:120078.
- 8. Mathew B, Saha P, Tharad H, Rajgaria S, Singhania P, Maity SK, et al. Thou Shalt Not Hate: Countering Online Hate Speech. ICWSM. 2019;13:369–80.
- 9. Yang D, Kraut RE, Smith T, Mayfield E, Jurafsky D. Seekers, providers, welcomers, and storytellers: Modeling social roles in online health communities. In: Proceedings of the 2019 CHI conference on human factors in computing systems, 2019:1–14.
- 10. Yang D, Kraut R, Levine JM. Commitment of Newcomers and Old-timers to Online Health Support Communities. Proc SIGCHI Conf Hum Factor Comput Syst. 2017;2017:6363–75. pmid:31423492
- 11. Madhu H, Satapara S, Pandya P, Shah N, Mandl T, Modha S. Overview of the HASOC Subtrack at FIRE 2023: Identification of Conversational Hate-speech. FIRE (Working Notes). 2023:351–9.
- 12. Madhu H, Satapara S, Modha S, Mandl T, Majumder P. Detecting offensive speech in conversational code-mixed dialogue on social media: A contextual dataset and benchmark experiments. Expert Systems with Applications. 2023;215:119342.
- 13. Albrecht TL, Goldsmith DJ. Social support, social networks, and health. In: The Routledge handbook of health communication. Routledge; 2003. p. 277–98.
- 14. Barnes MK, Duck S. Everyday communicative contexts for social support. In: Communication of social support: Messages, interactions, relationships, and community. Thousand Oaks, CA: Sage Publications, Inc.; 1994. p. 175–94.
- 15. Cobb S. Social support as a moderator of life stress. Psychosomatic Medicine. 1976;38:300–14.
- 16. Shumaker SA, Brownell A. Toward a Theory of Social Support: Closing Conceptual Gaps. Journal of Social Issues. 1984;40(4):11–36.
- 17. Thoits PA. Conceptual, methodological, and theoretical problems in studying social support as a buffer against life stress. J Health Soc Behav. 1982;23(2):145–59. pmid:7108180
- 18. McCorkle BH, Rogers ES, Dunn EC, Lyass A, Wan YM. Increasing social support for individuals with serious mental illness: evaluating the compeer model of intentional friendship. Community Ment Health J. 2008;44(5):359–66. pmid:18481176
- 19. Cohen S, Wills TA. Stress, social support, and the buffering hypothesis. Psychol Bull. 1985;98(2):310–57. pmid:3901065
- 20. Arif M, Tash MS, Jamshidi A, Ameer I, Ullah F, Kalita J, et al. Exploring multidimensional aspects of hope speech computationally: A psycholinguistic and emotional perspective. 2024. https://www.researchsquare.com/article/rs-4378757/v1
- 21. ElSherief M, Kulkarni V, Nguyen D, Yang Wang W, Belding E. Hate Lingo: A Target-Based Linguistic Analysis of Hate Speech in Social Media. ICWSM. 2018;12(1).
- 22. Caetano da Silva S, Castro Ferreira T, Silva Ramos RM, Paraboni I. Data Driven and Psycholinguistics Motivated Approaches to Hate Speech Detection. CyS. 2020;24(3).
- 23. Miró-Llinares F, Rodríguez-Sala JJ. Cyber hate speech on twitter: Analyzing disruptive events from social media to build a violent communication and hate speech taxonomy. Int J Design Nature Ecodyn. 2016;11:406–15.
- 24. Blaya C. Cyberhate: A review and content analysis of intervention strategies. Aggression and Violent Behavior. 2019;45:163–72.
- 25. Anis MY. Hatespeech in Arabic language. In: Proceedings of the International Conference on Media Studies, 2017. 724.
- 26. Chetty N, Alathur S. Hate speech review in the context of online social networks. Aggression and Violent Behavior. 2018;40:108–18.
- 27. Saleh H, Alhothali A, Moria K. Detection of hate speech using BERT and hate speech word embedding with deep model. Appl Artif Intell. 2023;37:2166719.
- 28. Palakodety S, KhudaBukhsh AR, Carbonell JG. Hope speech detection: A computational analysis of the voice of peace. In: Proceedings of the 24th International Conference on Applications of Natural Language to Information Systems (NLDB 2020), 2020:51–64.
- 29. De Choudhury M, Kıcıman E. The Language of Social Support in Social Media and its Effect on Suicidal Ideation Risk. Proc Int AAAI Conf Weblogs Soc Media. 2017;2017:32–41. pmid:28840079
- 30. Shahiki Tash M, Ahani Z, Tash M, Kolesnikova O, Sidorov G. Analyzing Emotional Trends from X Platform Using SenticNet: A Comparative Analysis with Cryptocurrency Price. Cogn Comput. 2024;16(6):3168–85.
- 31. Tash MS, Kolesnikova O, Ahani Z, Sidorov G. Psycholinguistic and emotion analysis of cryptocurrency discourse on X platform. Sci Rep. 2024;14(1):8585. pmid:38615123
- 32. Boyd RL, Ashokkumar A, Seraj S, Pennebaker JW. The development and psychometric properties of LIWC-22. Austin, TX: University of Texas at Austin; 2022.
- 33. Bahar S, Ülker SV. A review of LIWC and machine learning approaches on mental health diagnosis. Social Review of Technology and Change. 2023;1:71–92.
- 34. Thompson AD, Hartwig M. The language of high‐stakes truths and lies: Linguistic analysis of true and deceptive statements made during sexual homicide interrogations. Legal Criminol Psychol. 2022;28(1):34–44.
- 35. Mohammad SM, Turney PD. Crowdsourcing A Word–emotion Association Lexicon. Computational Intelligence. 2012;29(3):436–65.
- 36. Hutto C, Gilbert E. VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text. ICWSM. 2014;8(1):216–25.
- 37. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research. 2011;12:2825–30.
- 38. Tausczik YR, Pennebaker JW. The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods. Journal of Language and Social Psychology. 2009;29(1):24–54.
- 39. Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, et al. TensorFlow: Large-scale machine learning on heterogeneous systems. 2015. https://www.tensorflow.org/