Multimodal mental health analysis in social media

Depression is a major public health concern in the U.S. and globally. While successful early identification and treatment can lead to many positive health and behavioral outcomes, depression remains undiagnosed, untreated, or undertreated for several reasons, including denial of the illness and cultural and social stigma. With the ubiquity of social media platforms, millions of people now share their online persona by expressing their thoughts, moods, emotions, and even their daily struggles with mental health on social media. Unlike traditional observational cohort studies conducted through questionnaires and self-reported surveys, we explore the reliable detection of depressive symptoms from tweets obtained unobtrusively. In particular, we examine and exploit multimodal big (social) data to discern depressive behaviors using a wide variety of features, including individual-level demographics. By developing a multimodal framework and employing statistical techniques to fuse heterogeneous sets of features obtained by processing visual, textual, and user interaction data, we significantly enhance the current state-of-the-art approaches for identifying depressed individuals on Twitter (improving the average F1-score by 5 percent) and facilitate demographic inferences from social media. Besides providing insights into the relationship between demographics and mental health, our research assists in the design of a new breed of demographic-aware health interventions.


Introduction
PLOS ONE | https://doi.org/10.1371/journal.pone.0226248 | April 10, 2020
Depression is a highly prevalent public health concern and a major cause of disability worldwide. Depression affects 6.7% of Americans (about 16 million people) each year [1]. According to the World Mental Health Survey, conducted in 17 countries, about 5% of people reported having at least one depressive episode in 2011 [2]. Untreated or undertreated depressive … twice as often as men [22], and a national psychiatric morbidity survey in the UK has shown a higher risk of depression in women [23]. On the other hand, suicide rates for men are three to five times higher than those for women [24]. Women are more likely to socialize and express their dysphoria, while men tend to express their anger and show negative behaviors such as alcohol abuse and drug dependency [25]. Although depression can affect anyone at any age, the signs and risk factors for depression vary across age groups [26]. Depression triggers for children include domestic violence and the loss of a pet or family member. For adolescents, depression may arise from hormonal imbalances [27].
Late-life depression has caused the suicide rate among people aged 80 to 84 to be more than twice that of the general population [28]. Depression in the elderly often co-occurs with other persistent medical conditions, which can increase the risk of death. Therefore, inferring demographic information while studying depressive behavior from passively sensed social data can shed light on the population-level epidemiology of depression.
Recent advances in deep neural networks, specifically for image analysis tasks, make it possible to detect demographic features such as age and gender [29]. We aim to show that by determining and integrating a heterogeneous set of features from different modalities, namely aesthetic features from posted images (colorfulness, hue variance, sharpness, brightness, blurriness, naturalness), choice of profile picture (for gender, age, and facial expression), screen name, language features from both textual content and profile descriptions (n-grams, emotion, sentiment), sociability from the ego-network, and user engagement, we can identify individuals who are more likely to be depressed in a dataset of 8,770 human-annotated Twitter users.
We address the following research questions: 1) How well does the content of posted images (colors, aesthetics, and facial presentation) reflect depressive symptoms? 2) Does the choice of profile picture reveal any psychological traits corresponding to a depressed online persona? 3) Are profile pictures reliable enough to represent demographic information such as age and gender, and can they be used for community-level management of depression? 4) Are there underlying themes among depressed individuals, derived from multimodal content, that can be used to reliably detect depression?
Our contributions include:
• Analysis of the content of posted images in terms of colors, aesthetics, and facial presentation, and of their associations with depressive symptoms;
• Uncovering the underlying relationships between visual and contextual content of likely depressed profiles, obtained through a demographic inference process that can facilitate community-level management of depression; and
• Testing the performance of our interpretable heterogeneous feature set for predicting depressive symptoms.

Related work
We have divided the related work into four subsections. First, we discuss state-of-the-art approaches for studying depressive behavior using social data. Second, we review studies that have inferred demographic information from social media data. Then, we discuss the association between color sensitivity and mental health disorders. Finally, we cover state-of-the-art studies that have used visual imagery to study individuals' behavior.

Mental health analysis using social media
Several efforts have attempted to automatically detect depression from social media content using machine learning, deep learning, and natural language processing approaches. Through a retrospective study of tweets, De Choudhury et al. (2013) characterized depression based on factors such as language, emotion, style, ego-network, and user engagement. They built a classifier to predict the likelihood of depression from a written post [30] or an individual's profile [31]. Moreover, there have been significant advances due to the shared task [32] [13]. The role of visual imagery as a mechanism of self-disclosure was highlighted by [14], which related visual attributes to mental health disclosures on Instagram and used individual Instagram profiles to build a prediction framework for identifying markers of depression. The importance of data modality for understanding user behavior on social media has been highlighted by [40]. More recently, a deep neural network sequence modeling approach that combines audio and text modalities to analyze question-answer style interviews between an individual and an agent has been developed to study mental health [40]. Similarly, a multimodal depressive dictionary learning process was proposed to detect depressed users on Twitter [41]. It provides sparse user representations by defining a feature set consisting of social network, user profile, visual, emotional, topic-level, and domain-specific features. In particular, our choice to develop a multimodal prediction framework is intended to improve upon previous work involving the use of images in multimodal depression analysis [41] and prior work on Instagram photos [15].

Demographic information inference on social media
Social media has emerged as a critical channel for answering diverse research questions, offering a wealth of data for public health research [42][43][44].
It can also assist in better understanding the relationship between behavioral changes and population health [45]. However, the lack of demographic indicators (e.g., age, gender, race) within the data is a major limitation for gaining deeper insights. Several research efforts have attempted to automate the detection of social media users' demographic information, as summarized below. For gender inference, several studies have analyzed users' tweets to detect gender differences reflected in linguistic patterns [ name, full name, profile description, and content on external resources (e.g., personal blog). Another supervised model was built to predict the user's age group using features including emoticons, acronyms, slang words and phrases, punctuation, capitalization, sentence length, and included links/images, along with online behaviors such as number of friends, post time, and commenting activity [52]. To infer the age of Dutch Twitter users, a model was built that uses the life stage of users, such as secondary school student, college student, or employee [53]. Similarly, a novel model was introduced for extracting the age of Twitter users from their profile descriptions using a set of rules and patterns [54]. The authors also parse descriptions for occupation by consulting the SOC2010 list of occupations [55] and validate it through social surveys. Another age inference model relies on homophily, combining interaction information and content to predict the age of Twitter users [56]. The intuition is that people within the same age group share similar content and befriend contemporaries. Through an extensive set of experiments, the authors show that their model outperforms other state-of-the-art age inference models by leveraging online interaction and content information simultaneously. The limitations of textual content for predicting age and gender were highlighted by [57].
They distinguish language use based on social gender, age identity, biological sex, and chronological age by collecting crowdsourced signals from a game in which players (crowd) guess the biological sex and age of a user based only on their tweets. Their findings indicate how linguistic markers can be misleading (e.g., a heart represented as <3 can be misinterpreted as feminine when the writer is male). Estimating age and gender from facial images by training convolutional neural networks (CNN) for face recognition is another active line of research [58].

Color sensitivity and depressive behavior
Several studies have highlighted strong associations between color sensitivity and mood [59]. In earlier research, a strong correlation between specific color selections, such as yellow, and depressive behavior was reported by [60]. With respect to color discrimination, findings based on a sample of 20 male patients aged between 18 and 45 years with schizophrenia and manic-depressive psychosis indicated that when their right hemisphere was depressed, the identification of color by saturation, shade, and color tone was impaired [61]. More recently, the association of color vision with bipolar disorder was explored [62]. The general findings suggest that people suffering from depression are likely to reveal their mood through their choice of colors (such as a preference for darker shades) in everyday situations [63]. In this study, we leveraged the visual content shared on Twitter to study such signals.

Social media and image analysis
The recent emergence of photo-sharing platforms such as Instagram provides a unique opportunity to study people's behavior through the emotions they express [37], with broader applications in personality prediction [64] and demographic inference. Using these platforms for population-level analysis helps address public health concerns [39] such as obesity [65], substance use [66], depression, and anxiety [67].
With regard to personality prediction, early efforts showed that bag-of-visual-words features and Facebook profile images could predict users' personality [68]. Various sets of features were extracted from the images of 11,736 Facebook users to build a computational model that has more predictive power than human raters for predicting similar personality traits [69].

Dataset
This study focuses on obtaining community-level insights about signs of depression and depressive behavior. Although we analyzed individuals' behavioral health information, which is considered sensitive, we used anonymized user data as per the approved Institutional Review Board (IRB) protocol. The study and its informed consent process were approved by the Wright State University Institutional Review Board (SC#6258 4.1.3).
Self-disclosure refers to revealing personal and intimate information about oneself to others, which can be therapeutic for psychological well-being [70]. Previous efforts highlight diverse modes of mental health self-disclosure on social media [12]. Self-disclosure clues have been used extensively to create ground-truth data for numerous social media analytics studies, such as predicting users' demographics [54] and depressive behavior [8]. For instance, vulnerable individuals may use depressive-indicative terms in their Twitter profile descriptions. Other individuals may share their age and gender, e.g., "16 year old suicidal girl". We employed a large dataset of 45,000 Twitter users with self-reported depressive symptoms, initially introduced in [8]. All information was obtained using the Twitter advanced search API [71].
To seed the search, we created a lexicon of 1,500 depressive-indicative terms with the help of clinical psychologists and used it to collect the Twitter profiles of individuals with self-declared depressive symptoms [72]. More specifically, the dataset provides users' profile information, including screen name, profile description, follower/followee counts, profile image, and tweet content, which can express various depression-relevant characteristics and indicate whether a user exhibits any depressive behavior. Three human judges from the Department of Psychology at Wright State University assisted us in creating this annotated dataset. As reported in [8], the inter-rater agreement was κ = 0.74 (Cohen's kappa). To create a robust gold-standard dataset, we discarded instances in which at least two (out of three) of our annotators did not agree about the depressive symptoms. Our final dataset contains 8,770 users: 3,981 depressed users and 4,789 control users who do not express any depressive symptoms in their Twitter data. This dataset, U_t, contains each user's metadata values, such as profile description, followers_count, created_at, and profile_image_url. Table 1 illustrates a sample of depressive-indicative phrases that appear in tweets from likely vulnerable users.
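Cohen's kappa corrects raw agreement between two raters for the agreement expected by chance. A minimal sketch for two raters follows; the function name and the toy labels (D = depressed, C = control) are illustrative assumptions, and with three annotators a common convention is to average the pairwise values (Light's kappa).

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the chance agreement implied by each rater's label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty list of items")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum((freq_a[label] / n) * (freq_b[label] / n)
              for label in set(freq_a) | set(freq_b))
    if p_e == 1.0:
        # Both raters used one identical label for every item: perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy example: 75% raw agreement, 50% expected by chance -> kappa = 0.5
print(cohens_kappa(["D", "D", "C", "C"], ["D", "C", "C", "C"]))
```

Raw agreement alone would report 0.75 here; kappa halves it because half of that agreement is expected by chance given the raters' label frequencies.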

Table 1. Clinical depression symptoms and sample depressive-indicative phrases in tweets.
Clinical depression symptom | Depressive-indicative phrases in tweets
Feeling down | "People hate me," "I am Ugly," "I am depressed"
Sleep disorder | "we will never sleep," "we're fuxx dead," "I'm that tired," "why can't I sleep"
Lassitude | "0 energy to do anything," "cba with work," "I just want to snuggle up all day in bed"
Obsessed with weight | "Must not.eat," "must.be.thin," "94lbs, urgh I disgust myself," "Obssessed with my weight," "I just want be skinny"
Feeling bad about yourself | "I feel like a failure," "Im a piece of shix"
Suicidal thought | "I just don't want to wake up tomorrow morning," "all my blades are so fuxx blunt," "Thinking hanging myself," "I've never been so sure about suicide," "how much blood can bleed from a cut into a vain"
https://doi.org/10.1371/journal.pone.0226248.t001

To further assess the robustness of our dataset, we conducted another experiment by obtaining additional annotations from our colleagues in the Department of Psychiatry at Weill Cornell Medical College. We computed a statistically reliable sample size using the following formula:

$$n = \frac{\dfrac{Z^2 \, p(1-p)}{e^2}}{1 + \dfrac{Z^2 \, p(1-p)}{e^2 N}}$$

where N is the population size, Z is the z-score, e denotes the margin of error, and p is the estimated population proportion (p = 0.5 gives the most conservative estimate). Specifically, using our dataset of 8,770 users as the population, a 95% confidence level, and a 5% margin of error, we obtained a concrete sample size of 400 users. We then randomly selected 400 of the 8,770 users to be evaluated by two additional human judges (from the Department of Psychiatry at Weill Cornell Medical College), who manually annotated whether each user's content reflected depressive behavior. The average inter-rater agreement was 85% (Cohen's κ = 0.77), which denotes substantial agreement and supports the robustness of our dataset.
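The sample-size computation can be checked numerically. The sketch below assumes the conventional choices z = 1.96 for 95% confidence and p = 0.5 (which maximizes p(1 - p)); under those assumptions the finite-population formula gives a minimum of about 369 users for N = 8,770, so the 400 users annotated here sit above that minimum. The function name is hypothetical.

```python
import math


def finite_population_sample_size(N, z=1.96, e=0.05, p=0.5):
    """Minimum sample size with a finite-population correction.

    Standard form: n = (z^2 p (1 - p) / e^2) / (1 + z^2 p (1 - p) / (e^2 N)),
    computed here in two steps:
      n0 = z^2 * p * (1 - p) / e^2   (sample size for an infinite population)
      n  = n0 / (1 + n0 / N)         (corrected for a population of size N)
    """
    n0 = z ** 2 * p * (1 - p) / e ** 2
    return n0 / (1 + n0 / N)


n = finite_population_sample_size(8770)
print(f"{n:.1f} -> at least {math.ceil(n)} users")
```

Without the correction the same inputs give n0 ≈ 384; the correction matters little here because the sample is small relative to the population.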

Age enabled ground-truth dataset
We extracted a user's age by applying regular expression patterns to profile descriptions (such as "17 years old, self-harm, anxiety, depression") [54]. We compiled "age prefixes" and "age suffixes" and used three age-extraction rules: (1) "I am X years old", (2) "Born in X", and (3) "X years old", where X is a date or an age (e.g., 1994). We selected the subset of 1,061 users in U_t who disclosed their age as the gold-standard dataset U_a. Of these 1,061 users, 822 belonged to the depressed class and 239 to the control class. Among the 3,981 depressed users, 20.6% disclosed their age, in contrast with only about 5% (239/4,789) of the control group, suggesting that self-disclosure of age is more prevalent among vulnerable users. Fig 1 depicts the age distribution in U_a. The general trend, consistent with the results in [56,73], is biased toward younger individuals. Indeed, according to the Pew Research Center, 47% of Twitter users are 30 years old or younger [74]. Similar data collection procedures with comparable distributions have been used previously [56]. We discuss our approach to mitigating the impact of this bias in Section 3. The median age is 17 for the depressed class versus 19 for the control class. This suggests either that the depressed-user population is younger or that depressed adolescents are more likely to disclose their age in order to connect with peers (social homophily) [75].
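The three age-extraction rules can be approximated with regular expressions. This is a minimal sketch under stated assumptions: the study's exact prefixes, suffixes, and patterns are not listed, so the patterns, the function name `extract_age`, and the plausible-age bounds below are hypothetical. Birth years are converted to ages relative to a reference year, which should be the year the profile was collected.

```python
import re
from datetime import date

# Hypothetical patterns approximating the three rules in the text:
#   1. "I am X years old"   2. "Born in X"   3. "X years old"
AGE_PATTERNS = [
    re.compile(r"\bi\s*(?:am|'m)\s+(\d{1,2})\s*(?:years?|yrs?)\s*old\b", re.IGNORECASE),
    re.compile(r"\bborn\s+in\s+((?:19|20)\d{2})\b", re.IGNORECASE),
    re.compile(r"\b(\d{1,2})\s*(?:years?|yrs?)\s*old\b", re.IGNORECASE),
]


def extract_age(description, reference_year=None, min_age=11, max_age=80):
    """Return the first plausible age found in a profile description, else None."""
    if reference_year is None:
        reference_year = date.today().year
    for pattern in AGE_PATTERNS:
        match = pattern.search(description)
        if match is None:
            continue
        value = int(match.group(1))
        # A four-digit value is a birth year; anything else is already an age.
        age = reference_year - value if value >= 1900 else value
        if min_age <= age <= max_age:
            return age
    return None


print(extract_age("17 years old, self-harm, anxiety, depression"))
print(extract_age("born in 1994", reference_year=2013))
```

Rule 3 also matches phrases such as "my brother is 20 years old", so ages extracted this way still need manual verification or stricter prefixes.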

Gender enabled ground-truth dataset
We selected a subset of 1,464 users, U_g, from U_t who disclosed their gender in their profile descriptions. Of these 1,464 users, 64% belonged to the depressed group and the remaining 36% to the control group. 23% of the likely depressed users disclosed their gender, considerably more than the 12% observed in the control class. Once again, the disclosure rate differs between the two classes. To test statistical significance, we performed a chi-square test (null hypothesis: gender and depression are independent variables). Our findings indicate a strong association (χ² = 32.75, p = 1.04e-08) between female gender and the expression of depressive symptoms on Twitter. These observations are consistent with the current literature, which has shown that more women than men are diagnosed with depression [76]. In particular, the female-to-male ratio is 2:1 and 1:9 for major depressive disorder and dysthymic disorder, respectively.
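The independence test can be sketched as a Pearson chi-square test on a 2x2 contingency table of gender (rows) by class (columns). The counts in the example are made up for illustration and are not the study's data. The sketch omits Yates' continuity correction, so it should agree with `scipy.stats.chi2_contingency(table, correction=False)` rather than with SciPy's default.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 contingency table.

    No continuity correction is applied. Returns (statistic, p_value), df = 1.
    """
    (a, b), (c, d) = table
    total = a + b + c + d
    row_sums = (a + b, c + d)
    col_sums = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            stat += (observed - expected) ** 2 / expected
    # For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: rows = female, male; columns = depressed, control
print(chi_square_2x2([[10, 20], [20, 10]]))
```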

Data modality analysis
We now provide an in-depth analysis of visual and textual content of vulnerable users.

Visual content analysis
We show that the visual content of posted images and profile images provides valuable psychological cues for understanding a user's depression status. Profile images and posted images can surface self-stigmatization [77]. Unlike a typical computer vision framework for object recognition, which relies on thousands of predetermined low-level features, we focus on cues that are important when assessing a user's online behavior: emotions reflected in facial expressions, attributes contributing to computational aesthetics, and the sentimental quotes users may subscribe to.
The following sections present an in-depth analysis of visual content for both the depressed class and the control class with respect to three aspects: facial presence, facial expressions, and general image features.

Facial presence.
To capture facial presence, we employed the model introduced in [78], a multilevel convolutional coarse-to-fine network cascade developed to tackle the facial landmark localization problem. We identified facial presentation, emotion from facial expressions, and demographic features from profile images and posted images [79]. Table 2 illustrates facial presentation differences in both profile and posted images (media) for depressed and control users in U_t. For the control class, facial presence was significantly higher in both profile images and shared media (by 8% and 9%, respectively) than for the depressed class. In contrast with age and gender disclosure, vulnerable users were less likely to disclose their facial identity, possibly due to a lack of confidence or fear of stigma.

Facial expression.
Following the approach of [20], we adopted Ekman's model [80] of six emotions (anger, disgust, fear, joy, sadness, and surprise) and used the Face++ API [79] to automatically capture these emotions from the shared images. Joy and surprise were treated as positive emotions, and anger, disgust, fear, and sadness as negative emotions. For each user u in U_t, we processed the profile images and shared images of both the depressed and control groups that contained at least one face (Table 3). For images that contained multiple faces, we performed mean pooling over the faces to obtain the expected emotional features.
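Mean pooling over the faces in an image, followed by grouping into positive and negative emotions, can be sketched as follows. The per-face score dictionaries are a hypothetical input format (they are not the Face++ response schema), and summing the pooled scores into positive and negative totals is an illustrative choice rather than the study's documented aggregation.

```python
from statistics import fmean

# Ekman's six emotions and their polarity grouping, as described in the text.
EMOTIONS = ("anger", "disgust", "fear", "joy", "sadness", "surprise")
POSITIVE = ("joy", "surprise")
NEGATIVE = ("anger", "disgust", "fear", "sadness")


def pool_faces(faces):
    """Mean-pool per-face emotion scores into one image-level score per emotion.

    `faces` is a non-empty list of dicts mapping emotion name -> score;
    emotions missing from a face are treated as 0.0 (hypothetical input format).
    """
    if not faces:
        raise ValueError("image has no detected faces")
    return {emotion: fmean(face.get(emotion, 0.0) for face in faces)
            for emotion in EMOTIONS}


def polarity(pooled):
    """Return (positive, negative) totals of the pooled emotion scores."""
    return (sum(pooled[e] for e in POSITIVE), sum(pooled[e] for e in NEGATIVE))


image_faces = [{"joy": 0.8, "sadness": 0.2}, {"joy": 0.4, "sadness": 0.6}]
pooled = pool_faces(image_faces)
print(pooled["joy"], pooled["sadness"], polarity(pooled))
```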

General image features.
The importance of interpretable computational aesthetic features for studying users' online behavior has been highlighted by several efforts [81]. Color, as a pillar of the human visual system, has a strong association with conceptual ideas such as emotion [82]. We measured the normalized red, green, and blue channels, the mean of the original colors, brightness, and contrast relative to variations of luminance. We represented images in the Hue-Saturation-Value (HSV) color space, which is intuitive for humans, and measured the mean and variance of saturation and hue. Saturation is defined as the difference in intensity between the different light wavelengths that compose a color. Although hue is not directly interpretable, high saturation indicates vividness and chromatic purity, which are more appealing to the human eye [20]. Colorfulness is measured as the difference against a gray background [83]. Naturalness is a measure of the correspondence between images and human perception of reality [83]; in color reproduction, it is measured against the mental recollection of the colors of familiar objects. Additionally, vulnerable users tend to share sentimental quotes bearing negative emotions, so we performed optical character recognition (OCR) with Python-tesseract [84] to extract the embedded text and computed its sentiment score [85]. As illustrated in Table 4, vulnerable users tend to use less colorful (more grayscale) profile images and shared images to convey their negative feelings, and they also share images that are less natural. In general, control users identified darker, grayer colors with negative mood and generally preferred brighter, more vivid colors. By contrast, vulnerable users were found to prefer darker, grayer, and bluer colors. We found a strong positive correlation between self-declared depression and a tendency to perceive one's surroundings as gray or lacking in color.
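A few of the color features above can be sketched with the standard library. This assumes pixels arrive as (R, G, B) tuples in 0-255 (for example, read through an image library). Colorfulness here uses the Hasler and Süsstrunk opponent-color metric, a common formulation that may differ in detail from the one in [83]; brightness is taken as the mean HSV value channel, which is one of several possible definitions. Hue is circular, so its plain mean and variance are only a rough summary.

```python
import colorsys
import math
from statistics import fmean, pstdev, pvariance


def colorfulness(pixels):
    """Hasler-Suesstrunk colorfulness for a non-empty list of (R, G, B) in 0-255."""
    rg = [r - g for r, g, b in pixels]              # red-green opponent channel
    yb = [0.5 * (r + g) - b for r, g, b in pixels]  # yellow-blue opponent channel
    std_root = math.hypot(pstdev(rg), pstdev(yb))
    mean_root = math.hypot(fmean(rg), fmean(yb))
    return std_root + 0.3 * mean_root


def hsv_stats(pixels):
    """Mean/variance of hue and saturation, plus mean value as 'brightness'."""
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    hue, sat, val = zip(*hsv)
    return {
        "hue_mean": fmean(hue), "hue_var": pvariance(hue),
        "sat_mean": fmean(sat), "sat_var": pvariance(sat),
        "brightness": fmean(val),
    }


gray = [(128, 128, 128)] * 4
red = [(255, 0, 0)] * 4
print(colorfulness(gray), round(colorfulness(red), 1), hsv_stats(red)["sat_mean"])
```

A uniformly gray image scores zero on both colorfulness and saturation, which matches the intuition that grayscale content reads as "lacking in color".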
With respect to the aesthetic quality of images (saturation, brightness, and hue), there is a significant difference between the two classes, with depressed users more frequently sharing images that are less appealing to the human eye.
We employed an independent-samples t-test and adopted the Bonferroni correction as a conservative approach to adjust the significance level. Overall, we had 223 features, giving a Bonferroni-corrected alpha level of 0.05/223 = 2.24e-4 (*** p < alpha, ** p < 0.05).
Fig 3. (a) Depressed users; (b) control users. Pairs without statistically significant correlation are crossed (p-value < 0.05). https://doi.org/10.1371/journal.pone.0226248.g003
Vulnerable users shared images that are less aesthetically pleasing, with lower sharpness, and that either contain no faces or contain only one face. On the other hand, control users tended to share sharper images with multiple faces. Additionally, vulnerable users shared images with more text content, often containing depressive quotes and negative sentiment.
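The Bonferroni-corrected threshold and the significance markers can be sketched as below. The sketch uses the pooled-variance (Student) t statistic; the two-sided p-value comes from a normal approximation to the t distribution, an assumption that is reasonable only for large samples such as the thousands of users here. `scipy.stats.ttest_ind` gives exact t-distribution p-values if SciPy is available.

```python
import math
from statistics import fmean, variance

ALPHA = 0.05
N_FEATURES = 223
BONFERRONI_ALPHA = ALPHA / N_FEATURES  # 0.05 / 223 ~= 2.24e-4


def student_t(a, b):
    """Independent-samples t statistic with pooled variance (equal-variance assumption)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (fmean(a) - fmean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))


def approx_two_sided_p(t):
    """Two-sided p-value from a normal approximation (large-sample assumption)."""
    return math.erfc(abs(t) / math.sqrt(2))


def significance_marker(p):
    """'***' if significant after Bonferroni, '**' if only p < 0.05, else ''."""
    if p < BONFERRONI_ALPHA:
        return "***"
    if p < ALPHA:
        return "**"
    return ""


t = student_t([1, 2, 3], [4, 5, 6])
print(round(t, 4), significance_marker(approx_two_sided_p(t)))
```

Bonferroni divides alpha by the number of tests, so a feature must clear a threshold roughly 223 times stricter than 0.05 to earn the "***" marker.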
Differences in the desire to socialize and connect with others also manifest in the visual imagery of vulnerable users. The images they share tend to contain a single face (belonging to the user), rather than the user surrounded by friends and family. This further indicates a focus on the self, which is one of the most consistent markers of a mental disorder. It is also associated with extensive use of first person singular pronouns, another reliable marker in content analysis of depressive behavior.

Demographics inference & language cues
LIWC [86] has been used extensively to examine the latent dimensions of self-expression for analyzing personality [87], depressive behavior, demographic differences [53, 57], and more. Several studies have shown that females use more first person singular pronouns [88] and deictic language (context-dependent words) [89], while males tend to use more articles [90], which characterize concrete thinking, and more formal, informational, and affirmative words [91]. For age, the salient findings show that older individuals use more future tense verbs [88], suggesting a shift in focus with aging. They also express more positive emotions [92], use fewer self-references (i.e., "I", "me"), and use more first person plural pronouns [88]. Depressed users use first person pronouns more frequently [93] and repeatedly use negative emotion and anger words. We analyzed psycholinguistic cues and language style to study the association between depressive behavior and demographics. Specifically, we adopted Levinson's adult development grouping [94], which partitions users in U_a into 5 age groups: (14,19], (19,23], … summarizes textual content in terms of language variables such as analytical thinking, clout, authenticity, and emotional tone. It also measures other linguistic dimensions such as descriptor categories (e.g., the percentage of target words captured by the dictionary, or words longer than six letters, Sixltr), informal language markers (e.g., swear words, netspeak), and other linguistic aspects (e.g., first person singular pronouns).
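LIWC reports most word categories as the percentage of a text's words that match a category dictionary, where an entry ending in an asterisk matches any word beginning with that stem. The real LIWC dictionaries are proprietary, so the sketch below uses a tiny hypothetical dictionary purely to illustrate the computation; its category lists are not LIWC's, and summary dimensions such as analytical thinking and clout are derived by separate algorithms that are not reproduced here.

```python
import re

# Tiny hypothetical stand-in for LIWC categories (not the real dictionary).
TOY_DICTIONARY = {
    "i": ["i", "me", "my", "mine", "myself"],
    "negemo": ["sad", "hate", "hurt*", "cry*"],
    "swear": ["damn*"],
}


def category_percentages(text, dictionary=TOY_DICTIONARY):
    """Percentage of words in each category, plus Sixltr and word count (WC)."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    result = {}
    for category, entries in dictionary.items():
        exact = {e for e in entries if not e.endswith("*")}
        stems = tuple(e[:-1] for e in entries if e.endswith("*"))
        hits = sum(1 for w in words if w in exact or (stems and w.startswith(stems)))
        result[category] = 100.0 * hits / total if total else 0.0
    # Sixltr: share of words longer than six letters.
    result["Sixltr"] = 100.0 * sum(len(w) > 6 for w in words) / total if total else 0.0
    result["WC"] = total
    return result


print(category_percentages("I hate my hurting life"))
```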
Thinking Style: The words we use to communicate can reveal our style of thinking. There are two common approaches for measuring an individual's thinking style. First, analytical thinking captures one's natural way of trying to understand, analyze, and organize complex events, and is strongly associated with formal and logical reasoning: LIWC relates higher analytic scores to more formal and logical reasoning, whereas lower values indicate a focus on narratives. Second, cognitive processing, which measures problem solving in the mind, is captured through words such as "think," "believe," "realize," and "know," and demonstrates "certainty" in communication. High values of analytical thinking imply clarity of thought.
Critical thinking ability is related to education [95] and is shaped by the stages of cognitive development at different ages [96]. It has been shown that older people communicate with greater cognitive complexity, comprehending nuances and subtle differences [95]. These findings are consistent with our results (Table 5).
We observed notable differences in raw intelligence and the ability to think analytically between depressed and control users across age groups (see Fig 4A and 4F and Table 5). Overall, younger vulnerable users have lower analytical and cognitive processing scores, suggesting less logical reasoning. We also observe that the differences between age groups above 35 tend to become smaller [97].
Authenticity: Authenticity measures the degree of honesty. It is often assessed by measuring present tense verbs and first person singular pronouns (e.g., I, me, my), and by examining the linguistic manifestations of false stories [98]. People who lie use fewer self-references and fewer complex words. Psychologists often regard a child's first successful lie as a developmental milestone [99]. Authenticity shows a decreasing trend with age (see Fig 4B). Authenticity for depressed adolescents is strikingly higher than that of their control peers and decreases with age (Fig 4B).
Clout: People with high clout speak more confidently and with certainty, employing more social words and fewer negations (e.g., no, not) and swear words. In general, midlife is relatively stable with respect to relationships and work. A recent study has shown that self-esteem peaks at age 60 [100], as people take on managerial roles at work and maintain satisfying relationships with their spouses. We see the same pattern in our data (see Fig 4C and Table 5). Unsurprisingly, lack of confidence (the 6th PHQ-9 [101] symptom) is a distinguishing characteristic of vulnerable users, leading to their lower clout scores, especially among depressed users younger than 34 years old.
Self-references: First person singular words often indicate interpersonal involvement, and their heavy usage is associated with negative affective states such as nervousness and depression [92]. Consistent with prior studies, depressed users use first person singular words significantly more frequently than the control class. Consistent with [92], adolescents tend to use more first person (e.g., I) and second person singular (e.g., you) pronouns (Fig 4G). This is reflected in a significantly higher frequency of self-references among depressed adolescents. As with the control class, the downward trend suggests that as depressed individuals age, they make more distinctions and psychologically distance themselves from their topics.

Informal Language Markers: Swear, Netspeak
Swear: The swear lexicon includes terms such as "fu**", "dam*", and "shi*". Several studies have highlighted that the use of profanity by young adults has increased significantly over the last decade [102]. We observed the same pattern in both the depressed and the control classes (Table 5), with a higher rate for depressed users [10]. Psychologists have also shown that swearing may indicate that an individual is not a fragmented member of society [103]. Depressed adolescents, who show a higher rate of interpersonal involvement and relationships, have a higher rate of cursing (Fig 4E). Netspeak: The Netspeak lexicon measures the frequency of terms such as "lol" and "thx". Although the rate is higher for the depressed class, we did not find any pattern related to adult development.
Sexual, Body: The sexual lexicon contains terms such as "horny", "love", and "incest", and the body lexicon contains terms such as "ache", "heart", and "cough". Both start at a higher rate for depressed users and decrease gradually with age, possibly due to changes in sexual desire with age [104] (Fig 4H and 4I and Table 5).

Quantitative language analysis.
We employed a one-way ANOVA to compare the impact of the various factors and validate the findings above. Table 5 presents the results, with degrees of freedom (df) of 1055. The null hypothesis is that the mean of each LIWC feature is equal across age groups.
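The one-way ANOVA compares between-group and within-group variation of a feature across age groups. A minimal sketch of the F statistic follows; the toy groups are illustrative, and p-values would come from the F distribution (for example via `scipy.stats.f_oneway`), which is not reimplemented here.

```python
from statistics import fmean


def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the given groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = fmean(x for g in groups for x in g)
    group_means = [fmean(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy feature values for three age groups
print(one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```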

Demographic prediction
We leveraged both the visual and textual content for predicting age and gender.

Prediction with textual content.
We employed [105]'s weighted lexicon of terms, built from a dataset of 75,394 Facebook users who shared their status updates, age, and gender. The predictive power of these lexica was evaluated on Twitter and Facebook, showing promising results [105]. Using these two weighted lexica, we predict the demographic information (age or gender) of user i, denoted Demo_i, with the following equation:

$$\mathrm{Demo}_i = \sum_{\mathrm{term} \in \mathrm{lex}} \mathrm{Weight}_{\mathrm{lex}}(\mathrm{term}) \times \frac{\mathrm{Freq}(\mathrm{term}, \mathrm{doc})_i}{\mathrm{WC}(\mathrm{doc})_i}$$

where Weight_lex(term) is the lexicon weight of the term, Freq(term, doc)_i represents the frequency of the term in the user-generated document doc_i, and WC(doc)_i is the total word count of doc_i. As our data are biased toward younger individuals, we report age prediction performance for each age group separately (Table 6). Moreover, to measure the average accuracy of this model, we built a balanced dataset by keeping all users above 23 (416 users) and randomly sampling the same number of users from each of the age ranges (11,19] and (19,23]. The average accuracy of this model was 0.63 for depressed users and 0.64 for the control class. Table 8 illustrates the performance of gender prediction for each class; the average accuracy was 0.82 on the U_g ground-truth dataset.
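The weighted-lexicon prediction can be sketched as a weighted, length-normalized term count: each term's weight is multiplied by its frequency divided by the document's word count, and the products are summed. The weights below are made-up placeholders, not values from the published lexica; the optional `intercept` argument is an assumption (some weighted lexica ship with one) and defaults to 0 so the score matches the sum as described in the text.

```python
import re
from collections import Counter


def lexicon_score(text, weights, intercept=0.0):
    """sum over terms of Weight_lex(term) * Freq(term, doc) / WC(doc), plus an optional intercept."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return intercept
    counts = Counter(words)
    word_count = len(words)
    return intercept + sum(weight * counts[term] / word_count
                           for term, weight in weights.items() if term in counts)


# Hypothetical weights, for illustration only
toy_age_weights = {"school": 2.0, "work": -1.0}
print(lexicon_score("school school work home", toy_age_weights))
```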

Prediction with visual imagery.
Inspired by the approach of [78] for facial landmark localization, we used their pre-trained CNN, which consists of convolutional, locally connected (unshared), and fully connected layers, to predict gender and age from both the profile and shared images. We evaluated the performance of gender prediction on U_g and of age prediction on U_a, as shown in Table 6.

Demographic prediction analysis.
We delved deeper into the benefits and drawbacks of each data modality for predicting demographic information. This is crucial because the differences in language cues between age groups above 35 tend to become smaller (see Fig 4A, 4B and 4C), making prediction harder for older individuals [97]. In this case, the other data modality (e.g., visual content) plays an integral role as a complementary source for age inference. For gender prediction, on average, the profile image-based predictor was more accurate for both the depressed and the control class (0.92 and 0.90, respectively) than the content-based predictor (0.82). For age prediction (see Table 6), the textual content-based predictor (0.60 on average) outperformed both visual-based predictors (on average, profile: 0.51; media: 0.53). However, not every user revealed their facial identity on their account (see Table 2). We studied facial presentation for each age group to examine any association between age group, facial presentation, and depressive behavior (see Table 7). Young users in both the depressed and control classes are unlikely to show their face in their profile image: fewer than 3% of vulnerable users aged 11-19 revealed their facial identity. Although the content-based gender predictor was not as accurate as the image-based predictor, it is adequate for population-level analysis (see Table 8).

Multi-modal prediction framework
We used the above findings for predicting depressive behaviors. Our model uses an early fusion [40] technique in feature space, which requires modeling each user u in U_t as a single vector that combines the user's heterogeneous (visual, textual, and user interaction) features. Next, we adopted an ensemble learning method that integrates the predictive power of multiple learners, with two main advantages: a high degree of interpretability with respect to the contribution of each feature, and high predictive power. For prediction, we have

$$y'_i = \sum_{t=1}^{m} f_t(u_i),$$

where f_t is a weak learner and y'_i denotes the final prediction. In particular, we optimized the loss function

$$\mathcal{L} = \sum_{i} l(y_i, y'_i) + \sum_{t=1}^{m} \varphi(f_t),$$

where l measures the difference between the label y_i and the prediction y'_i, and φ incorporates L1 and L2 regularization. In each iteration, the new f_t(u_i) is obtained by fitting the weak learner to the negative gradient of the loss function. In particular, the loss at iteration t is approximated with a second-order Taylor expansion:

$$\mathcal{L}^{(t)} \approx \sum_{i} \left[ l\left(y_i, y'^{(t-1)}_i\right) + g_i f_t(u_i) + \frac{1}{2} h_i f_t^2(u_i) \right] + \varphi(f_t),$$

where the first term is constant, and g_i and h_i in the second and third terms are the first- and second-order derivatives of the loss with respect to y'^{(t-1)}_i.
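Early fusion and the per-user derivatives g_i and h_i can be sketched as below. The text does not name the loss l, so this sketch assumes the binary log loss (labels 1 = depressed, 0 = control) with raw scores on the log-odds scale; the helper names `early_fusion` and `logistic_grad_hess` and the feature values are hypothetical.

```python
import math


def early_fusion(*modality_features):
    """Concatenate per-modality feature lists into a single user vector u_i."""
    fused = []
    for features in modality_features:
        fused.extend(features)
    return fused


def logistic_grad_hess(y, raw_score):
    """g_i and h_i of the binary log loss with respect to the raw score y'_i.

    Assumes labels y in {0, 1} (1 = depressed) and a score on the log-odds scale.
    """
    p = 1.0 / (1.0 + math.exp(-raw_score))  # predicted probability of class 1
    g = p - y          # first-order derivative
    h = p * (1.0 - p)  # second-order derivative
    return g, h


# Hypothetical visual, textual, and network features for one user.
u_i = early_fusion([0.5, 0.2], [0.7], [3])
print(u_i)                         # [0.5, 0.2, 0.7, 3]
print(logistic_grad_hess(1, 0.0))  # (-0.5, 0.25)
```

Concatenation is the simplest form of early fusion: a single learner then sees all modalities at once, which is what allows the per-feature contributions to be compared later.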
To explore the weak learners, assume f_t has k leaf nodes, let I_j be the subset of users from U_t that belong to node j, and let w_j denote the prediction for node j. Then, for each user i belonging to I_j, f_t(u_i) = w_j and

$$\varphi(f_t) = \frac{1}{2}\lambda \sum_{j=1}^{k} w_j^2.$$

Dropping the constant term, the approximated loss can be grouped by leaf:

$$\mathcal{L}^{(t)} \approx \sum_{j=1}^{k} \left[ \Big(\sum_{i \in I_j} g_i\Big) w_j + \frac{1}{2}\Big(\sum_{i \in I_j} h_i + \lambda\Big) w_j^2 \right].$$

Next, for each leaf node j, setting the derivative with respect to w_j to zero gives

$$w_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda},$$

and substituting these weights gives

$$\mathcal{L}^{(t)} = -\frac{1}{2} \sum_{j=1}^{k} \frac{\left(\sum_{i \in I_j} g_i\right)^2}{\sum_{i \in I_j} h_i + \lambda},$$

which represents the loss of a fixed weak learner with k nodes. The trees are built sequentially, so that each subsequent tree aims to reduce the errors of its predecessors. Although the individual weak learners have high bias, the ensemble produces a strong learner that effectively integrates them by reducing bias and variance (the ultimate goal of supervised models) [108,109].

Table 9 illustrates how our multimodal framework outperforms the baselines for identifying depressed users in terms of average specificity, sensitivity, F-Measure, and accuracy in a 10-fold cross-validation setting on the U_t dataset. Fig 6 shows how the likelihood of being classified into the depressed class varies with each feature added to the model for a sample user in the dataset. The prediction bar (the black bar) shows that the log-odds of the prediction is 0.31; that is, the probability of this person being a depressed user is approximately 57.7% (1/(1 + exp(-0.31))). The figure also sheds light on the impact of each contributing feature: the waterfall charts show how the probability of being depressed changes as each feature is added. For example, this user's "Analytic thinking" score, measured by LIWC from the tweet content, is 48.43, a high value for our dataset (median: 36.95, mean: 40.18), and it decreases the log-odds of the user being classified into the depressed group by 1.41. This is because depressed users have significantly lower "Analytic thinking" scores than the control class.
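The closed-form leaf weights and the loss of a fixed tree can be checked numerically. This sketch assumes the L2-only regularizer φ(f_t) = ½λΣw_j² used in the derivation; the function names and the toy gradients and Hessians are hypothetical.

```python
def leaf_weight(g, h, lam):
    """Optimal weight of one leaf: w_j* = -sum(g_i) / (sum(h_i) + lambda)."""
    return -sum(g) / (sum(h) + lam)


def leaf_objective(g, h, lam, w):
    """Per-leaf approximated loss: (sum g_i) * w + 1/2 * (sum h_i + lambda) * w^2."""
    return sum(g) * w + 0.5 * (sum(h) + lam) * w ** 2


def tree_loss(leaves, lam):
    """Loss of a fixed tree: -1/2 * sum_j (sum g_i)^2 / (sum h_i + lambda).

    `leaves` holds one (g_list, h_list) pair per leaf node I_j.
    """
    return -0.5 * sum(sum(g) ** 2 / (sum(h) + lam) for g, h in leaves)


# Toy gradients/Hessians for a two-leaf tree (values are illustrative only).
leaves = [([-0.5, -0.5], [0.25, 0.25]), ([0.5], [0.25])]
lam = 1.0
weights = [leaf_weight(g, h, lam) for g, h in leaves]
# Plugging w_j* back into the per-leaf objective reproduces the closed form.
direct = sum(leaf_objective(g, h, lam, w) for (g, h), w in zip(leaves, weights))
print(weights, tree_loss(leaves, lam), direct)
```

Because the tree loss depends only on per-leaf sums of g_i and h_i, candidate splits can be scored cheaply by comparing this quantity before and after splitting a leaf.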
Moreover, this user's "Clout" score of 40.46 is low (median: 62.22, mean: 57.17) and increases the chance of being classified as a depressed user. This is consistent with the well-established association between low self-esteem and risk for depression. With respect to the visual features, the mean and the median of "shared colorfulness" are 112.03 and 113, respectively, so this user's value of 136.71 is high and decreases the log-odds of being depressed by 0.54. As mentioned earlier, depressed users preferred darker and grayer colors. A "profile naturalness" score of 0.46 is high compared to 0.36 (the mean for the depressed class), which explains why it lowers the log-odds by 0.25. Among the network features, for instance, the size of the two-hop neighborhood is smaller for depressed users (mean: 84) than for control users (mean: 154), which is reflected in this feature lowering the log-odds by 0.27.
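The waterfall reading of Fig 6 amounts to adding per-feature contributions in log-odds space and converting the running total to a probability. This is a minimal sketch: the four contributions are the ones quoted above, but the starting value of 0.0 is hypothetical, and Fig 6 also includes features not listed here, so the running totals do not reproduce the figure.

```python
import math


def sigmoid(log_odds):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def waterfall(base_log_odds, contributions):
    """Add per-feature log-odds contributions one at a time, as in a waterfall chart.

    Returns (feature, running log-odds, running probability) after each step.
    """
    running = base_log_odds
    steps = []
    for feature, delta in contributions:
        running += delta
        steps.append((feature, running, sigmoid(running)))
    return steps


print(round(sigmoid(0.31), 3))  # 0.577, the final prediction discussed for Fig 6

# Contributions quoted in the text; the base value of 0.0 is hypothetical.
quoted = [("Analytic thinking", -1.41), ("shared colorfulness", -0.54),
          ("profile naturalness", -0.25), ("two-hop neighborhood", -0.27)]
for feature, log_odds, prob in waterfall(0.0, quoted):
    print(f"{feature:22s} {log_odds:+.2f} {prob:.3f}")
```

Working in log-odds keeps the contributions additive; only the final conversion through the sigmoid is nonlinear, which is why the same log-odds shift changes the probability by different amounts depending on where the running total sits.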

Baselines
To test the efficacy of our multi-modal framework for detecting depressed users, we compared it against existing content, content-network, and image-based models (based on the aforementioned general image features, facial presence, and facial expressions).

Content-based models.
Patterns of language use in social media posts can be a good indicator of emotional state. Fig 7 shows word clouds that distinguish the word usage of likely-depressed and non-depressed profiles. They suggest that depressed users often care more about their appearance, as indicated by their use of terms such as "pretty" and "beautiful." They also tend to talk about family and relationships, using words such as family, hugs, parents, daddy, mums, sigh, grandma, maam, friendless, love, friend, mommy, people, boyf, and gf. In contrast, control users usually talk about daily events and news, such as "hurricane" and "Trump". Such differences highlight that user-generated words can be distinguishing features for detecting depressed user profiles. See Table 9 for the comparative performance of our prediction framework against state-of-the-art methods for predicting depressive behaviors, many of which employed the same feature sets and hyperparameter settings (see Models I-V). Since several prior efforts have demonstrated that word embedding models can reliably enhance short-text classification [115], Model VI employs pre-trained word embeddings trained on 400 million tweets [116] and represents a user by retrieving the word vectors for all the words the user used in tweets and the profile description. We average these word vectors and feed the result as input to an SVM classifier with a linear kernel. In Model VII, we employed the dataset of [8], containing 45,000 self-reported depressed users, and trained a Skip-gram model with negative sampling to learn word representations. We chose this model because it generates robust word embeddings even when the training data are sparse [117]. We set the dimensionality to 300 and the number of negative samples to 10, settings that have shown promising results with medium-sized datasets [117].
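The user representation of Model VI (the mean of the word vectors of all words a user wrote) can be sketched as follows. This assumes tokenized words and an in-memory embedding table; the function name `mean_user_vector`, the skipping of out-of-vocabulary words, and the 2-dimensional toy vectors are illustrative assumptions, not details from the text.

```python
def mean_user_vector(words, embeddings):
    """Average the vectors of a user's words (tweets + profile description).

    Words missing from the embedding table are skipped; returns None if none match.
    """
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


# Hypothetical 2-dimensional embeddings; real pre-trained vectors are much larger.
toy_embeddings = {"sad": [1.0, 0.0], "tired": [0.0, 1.0]}
print(mean_user_vector(["sad", "tired", "lol"], toy_embeddings))  # [0.5, 0.5]
```

The resulting fixed-length vector could then be passed to a linear SVM, for example scikit-learn's `LinearSVC` or `SVC(kernel="linear")`. For Model VII, one possible way to match the stated settings would be gensim's `Word2Vec(sentences, vector_size=300, sg=1, negative=10)` (gensim 4.x parameter names); both library choices are assumptions, not the authors' stated tooling.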
We also observed that many vulnerable users choose specific account names, such as "Suicidal_Thoughxxx" and "younganxietyyxxx", which are good indicators of their depressive behavior. We used the Levenshtein distance between the depression-indicative terms in the depression lexicon of [8] and the screen name to capture their degree of string similarity [118].
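The Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The sketch below computes it with the standard dynamic-programming recurrence and takes the closest term from a lexicon; the helper `closest_lexicon_term`, the lowercasing step, and the three toy terms are hypothetical and do not reproduce the lexicon of [8].

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def closest_lexicon_term(screen_name, lexicon):
    """Return (distance, term) for the lexicon term closest to the screen name."""
    name = screen_name.lower()
    return min((levenshtein(term, name), term) for term in lexicon)


toy_lexicon = ["suicidal", "anxiety", "depressed"]  # hypothetical terms
print(levenshtein("kitten", "sitting"))                # 3
print(closest_lexicon_term("Suicidal", toy_lexicon))  # (0, 'suicidal')
```

Plain edit distance grows with the length difference between the term and the name, so a long screen name that merely contains a term still scores a large distance; a length-normalized or substring-based variant may be preferable, though the text does not specify which form was used.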

Image-based models.
We employed the aforementioned visual content features, including facial presence, aesthetic features, and facial expression, for depression prediction. We used three different models: Logistic Regression (Model IX), SVM (Model X), and Random Forest (Model XI). The poor performance of the image-based models suggests that relying on a single modality is not sufficient for building a robust model, owing to the complexity of the prediction task.

Network-based models.
Network-based features indicate a user's desire to socialize and connect with others. There are notable differences in the number of friends, followers, favorites, and statuses between depressed and control users (see Table 4). To build baseline Model VIII, we obtained egocentric network measures for each user based on the network formed by @-reply interactions among users. The egocentric social graph of a user u is an undirected graph of the nodes in u's two-hop neighborhood in our U_a dataset, where an edge between nodes u and v indicates at least one @-reply exchange. Network-based features, including Reciprocity, Prestige Ratio, Graph Density, Clustering Coefficient, Embeddedness, Ego components, and Size of two-hop neighborhood, were extracted from each user's network [10] to capture user context for depression prediction.
High values for three metrics (clustering coefficient, embeddedness, and number of ego components) suggest that depressed users tend to build a close network of trusted people with whom to share their mental health issues. For both graph density and size of the two-hop neighborhood, a lower value indicates fewer interactions.
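Three of the egocentric measures (size of the two-hop neighborhood, graph density, and clustering coefficient) can be sketched on a toy @-reply graph. This uses the standard textbook definitions as an assumption; the exact definitions in [10] may differ, and the user names and reply pairs are hypothetical.

```python
from itertools import combinations


def build_graph(reply_pairs):
    """Undirected graph: an edge joins two users with at least one @-reply exchange."""
    adj = {}
    for u, v in reply_pairs:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def two_hop_neighborhood(adj, u):
    """Nodes within two hops of u, excluding u itself."""
    one_hop = adj.get(u, set())
    hood = set(one_hop)
    for v in one_hop:
        hood |= adj[v]
    hood.discard(u)
    return hood


def density(adj, nodes):
    """Edges among `nodes` divided by the number of possible undirected edges."""
    nodes = list(nodes)
    if len(nodes) < 2:
        return 0.0
    edges = sum(1 for a, b in combinations(nodes, 2) if b in adj.get(a, set()))
    return edges / (len(nodes) * (len(nodes) - 1) / 2)


def clustering_coefficient(adj, u):
    """Fraction of pairs of u's neighbors that are themselves connected."""
    neighbors = list(adj.get(u, set()))
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


adj = build_graph([("u", "a"), ("u", "b"), ("a", "b"), ("b", "c"), ("c", "d")])
hood = two_hop_neighborhood(adj, "u")
ego_nodes = {"u"} | hood
print(sorted(hood))                         # ['a', 'b', 'c']
print(round(density(adj, ego_nodes), 3))    # 0.667
print(clustering_coefficient(adj, "u"))     # 1.0
```

Density here is computed on the subgraph induced by u and its two-hop neighbors, so a tight circle of mutual contacts yields a high value even when the circle is small.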

Conclusion and future work
We presented an in-depth analysis of the visual and contextual content of likely depressed profiles on Twitter and used these cues for demographic (age and gender) inference. We developed a multimodal framework that employs statistical techniques to fuse heterogeneous sets of features obtained by processing visual, textual, and user interaction data. Through an extensive set of experiments, we assessed the predictive power of our multimodal framework against state-of-the-art approaches for identifying depressed users on Twitter. The empirical evaluation shows that our multimodal framework outperforms these approaches, improving the average F1-score by 5 percent. These results indicate that visual cues gleaned from content and profile images shared on social media can further augment inferences from textual content for the reliable determination of depression indicators and diagnoses. In the future, we plan to apply our approach to various data sources, such as longitudinal electronic health record (EHR) systems and private insurance reimbursement and claims data, to develop a robust "big data" platform for detecting clinical depressive behavior at the community level.
Supporting information
S1 File. Informed consent for this study, approved by the Wright State University Institutional Review Board (SC#6258). (PDF)