Abstract
Assessing well-being with social media text data is a promising method, but besides hedonic well-being, little is known about whether additional well-being dimensions, such as psychological richness and eudaimonic well-being, can be predicted from such data. We compare the predictive accuracy for hedonic well-being, eudaimonic well-being, and the recently proposed construct of psychological richness in a large sample of Facebook users (n = 2,644), and find that the inclusion of language features incrementally improved model prediction accuracy beyond demographic features for psychological richness, but not for hedonic or eudaimonic well-being. Psychological richness had the lowest overall prediction accuracy (r = .21) followed by hedonic well-being (r = .27) and eudaimonic well-being (r = .29). The linguistic features associated with psychological richness were face valid, and in many instances the content and direction of the associations were unique to psychological richness, which provides discriminant validity evidence.
Citation: Bonner CV, Cho Y-M, Zhang F, Tay L, Ungar L, Chandra Guntuku S (2026) The assessment of psychological richness, meaning, and happiness with social media text data: Predictive accuracy and distinct behavioral correlates. PLoS One 21(1): e0337649. https://doi.org/10.1371/journal.pone.0337649
Editor: Angelina Wilson Fadiji, De Montfort University, UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND
Received: March 16, 2025; Accepted: November 11, 2025; Published: January 7, 2026
Copyright: © 2026 Bonner et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Survey data and the accompanying R code for the survey data have been archived on OSF (https://osf.io/rz9cw/). The data used in the text analyses cannot be archived publicly on OSF because the data contain potentially identifying information, which has not been approved by the Purdue University Institutional Review Board (irb@purdue.edu).
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
The health, performance, and social consequences of hedonic well-being (also known as subjective well-being [SWB]; e.g., [1,2]) have inspired efforts to monitor well-being at the national and international levels [3]. However, traditional representative survey approaches are limited by low response rates, perceived intrusiveness, high costs, and their inability to comprehensively monitor well-being over time [4]. Recent attempts to automatically predict well-being and other individual differences based on language features extracted from social media platforms have demonstrated that language-based assessments (LBAs) are a promising alternative to traditional surveys. Language features extracted from social media platforms such as Facebook and Twitter accurately predict self-reported SWB [5–7]. Language features also predict geographical [8–10] and temporal [11] variation in components of hedonic well-being such as positive and negative affect, and satisfaction with life.
In addition to state-of-the-art accuracy, LBAs appear to be face-valid, display reasonable test-retest reliability, and provide novel, theoretically relevant information about actual behaviors (e.g., personality traits [12]; well-being [13]), which is an important way to advance psychological science [14]. The accumulated evidence suggests that language is a behavioral manifestation of individual differences, including hedonic well-being, which can therefore be used to assess the latent variable of interest (but see [15,16] for a discussion of validity issues). Studies employing the Linguistic Inquiry and Word Count program (LIWC; [17]) have observed small but consistent associations between subjective well-being and language features (e.g., [5,18]). Building on the initial success of the LIWC’s “closed” vocabulary approach, recent “open-vocabulary” studies have identified novel associations between naturally occurring clusters of topics and individual differences [10]. In addition to evidencing greater predictive validity compared to the LIWC, topic clusters created through Latent Dirichlet Allocation (LDA; [19]) can be interpreted as behavioral expressions of individual difference constructs in the context of social media and can therefore suggest new hypotheses about the behavioral indicators of the construct.
Schwartz et al. [13] found that open-vocabulary LDA topic clusters offered significant incremental validity to the LIWC for predicting hedonic well-being. Additionally, significantly associated topic clusters referred to theoretically relevant topics and behaviors, such as references to professional employment, positive relationships, communal engagement, and positive engagement with activities. These results suggest that LBA methods are promising for the prediction of well-being constructs, as well as explaining their manifestation in social media contexts. However, LBA methods have primarily focused on hedonic well-being, and we consequently know little about whether other dimensions of well-being, such as eudaimonic well-being and psychological richness, can be assessed with alternative methods.
Hedonic well-being, eudaimonic well-being, and psychological richness
Traditionally, theories of the good life have focused on eudaimonic well-being, which broadly refers to having purpose and meaning in life [20]. Eudaimonic well-being stems from an Aristotelian tradition regarding the optimal functioning of an individual, and is conceptually and empirically distinct from hedonic well-being. In contrast, hedonic well-being emphasizes a utilitarian approach to the degree of pleasure one experiences. More recent research suggests that the psychologically rich life, characterized by heterogeneous, novel, and interesting experiences [21], can be considered a distinct dimension of the good life [22]. A substantial minority of people across multiple cultures implicitly and explicitly idealize a psychologically rich life as their preferred form of the good life, compared to hedonic and eudaimonic well-being [23]. Psychological richness is psychometrically distinct from hedonic and eudaimonic well-being in both self-report and informant-report data [22], and the Psychologically Rich Life Questionnaire displays good discriminant and test-retest validity [24]. Additionally, informant reports of psychological richness were reliably associated with self-reports across two measurement periods, and the self-other agreement was comparable to the self-other agreement for constructs such as positive affect, negative affect, life satisfaction, and meaning in life [24], which indicates that trait-relevant cues about psychological richness are accessible to informants. This suggests that trait-relevant cues for psychological richness are likely to be present and retrievable from social media text data.
Evidence for a three-factor structure has been reported in comparisons of the Psychologically Rich Life Questionnaire with other well-validated measures of hedonic and eudaimonic well-being [24], as well as in the Good Life Scale (GLS), which is intended to simultaneously measure the three dimensions of well-being [22]. While psychological richness is positively associated with hedonic and eudaimonic well-being, novel experiences like studying abroad in another country are associated with group-level increases in psychological richness but not with increases in other well-being constructs ([25]; Studies 3 and 4). This evidence suggests that while many life experiences and outcomes have previously displayed null or negative relationships with hedonic or eudaimonic well-being, they might still be associated with unmeasured dimensions of well-being, such as psychological richness. Additionally, experience sampling data show that activities like taking short trips, engaging in artistic activities, or reporting a busy day were predictors of a psychologically rich day for college students, suggesting that the configuration of daily experiences associated with psychological richness is distinct from hedonic and eudaimonic well-being ([25]; Study 1).
The present study
Because psychological richness is a novel well-being construct, there is no prior research investigating how well psychological richness can be predicted from social media text data, and how this precision compares to eudaimonic and hedonic well-being. In the present study, we first report on the internal structure validity, discriminant validity, and convergent validity of the Good Life Scale (GLS; [22]), which is intended to measure psychological richness, hedonic well-being, and eudaimonic well-being. We then test whether psychological richness, hedonic well-being, and eudaimonic well-being can be predicted from a large sample of Facebook text data by comparing the predictive validity of different approaches, such as using demographic data, LIWC topics, and LDA topics. We test the incremental predictive validity of linguistic features above and beyond demographic features for each of the GLS constructs, and compare the relative prediction accuracies between the GLS constructs to test whether psychological richness and eudaimonic well-being can be predicted as well as hedonic well-being.
Finally, the construct validity of psychological richness is still being established and evaluated. Because prior social media text mining investigations have found the LIWC and LDA topics associated with subjective well-being variables to be generally face valid [13], we also aim to describe and interpret the LIWC and LDA topics that are associated with self-reports of psychological richness, hedonic well-being, and eudaimonic well-being. Our descriptive approach aims to identify language features that evidence the discriminant and face validity of psychological richness as a meaningful dimension of the good life. By describing the language features associated with psychological richness, we aim to identify novel linguistic-behavioral information about psychological richness that can inform future research into the nature of the construct.
Method
Ethics statement
The study procedures were approved by the Institutional Review Board of Purdue University (#IRB-2020–301); we obtained written informed consent.
Participants and procedure
We paid for a targeted sample of 3,000 participants from the Qualtrics online survey panel. Data collection commenced on May 11th, 2020 and ended on June 15th, 2020. Participants had to be employed full-time, reside in the United States, be over the age of 18, and have a Facebook account with at least 500 words across all posts. To ensure that participants had sufficient words for language analysis, we obtained written informed consent to access their Facebook posts, collected their posts using the Facebook Graph API, and screened out participants who had fewer than 500 words across all posts on their accounts (see [26] for the rationale behind this exclusion criterion). We obtained an initial sample of 3,215 participants who completed the survey. We retained an analytic sample of 2,644 participants after excluding cases for not having enough words, failing attention check questions, not completing the full GLS, and for not reporting data for gender, age, income, and education (69% female; median age = 43, range = 18–65). The 105 participants who selected “65 or older” were grouped in with the 27 participants who identified themselves as 65 in the initial survey. Additionally, two cases shared an identical Qualtrics participant ID, so we randomly removed one of the cases.
Measures
Demographics.
We measured gender as a binary variable (male = 1, female = 2). We operationalized education as an ordinal variable (1 = less than high school, 2 = high school diploma or equivalent; 3 = vocational or technical training, 4 = bachelor’s degree or equivalent; 5 = master’s degree or equivalent, 6 = doctoral degree). Income was operationalized as a seven-point ordinal variable with varied increments (1 = less than $10,000, 2 = $10,000–19,000, 3 = $20,000-$34,999, 4 = $35,000-$49,999, 5 = $50,000-$99,999, 6 = $100,000-$149,999, 7 = More than $150,000).
Good Life Scale.
We used the Good Life Scale (GLS; [22]) to measure psychological richness alongside the well-established constructs of eudaimonic and hedonic well-being. Participants were asked to indicate the extent to which they agreed with 15 adjectives describing their lives (“My life has been…”) on a seven-point scale (1 = strongly disagree; 7 = strongly agree). The subscales are referred to as psychological richness (interesting, dramatic, psychologically rich, uneventful, and monotonous; α = .711), meaning (meaningful, fulfilling, purposeful, meaningless, and disorganized; α = .85), and happiness (happy, enjoyable, comfortable, unstable, and sad; α = .861) for brevity throughout the rest of the paper.
Big five.
We measured the Big Five trait domains of Openness (M = 5.1, SD = 1.21, α = .410), Conscientiousness (M = 5.54, SD = 1.22, α = .557), Extraversion (M = 3.88, SD = 1.58, α = .658), Agreeableness (M = 5.24, SD = 1.17, α = .386) and Neuroticism (M = 3.22, SD = 1.44, α = .704) with the Ten-Item Personality Inventory [27]. Items were presented on a seven-point scale (1 = strongly disagree; 7 = strongly agree). The internal reliability estimates are low because each scale comprises just two items, and the two items for each Ten-Item Personality Inventory domain were intentionally selected to represent the breadth and heterogeneity of the domain.
Analytic strategy
The design of the study and analyses were not pre-registered. These analyses are intended to be descriptive and exploratory, and are not designed to deductively test hypotheses.
We first conducted a three-factor and a bi-factor confirmatory factor analysis of the GLS data with the lavaan [28] package, and used the semTools [29] package to compute reliability statistics.
We incorporated both open-vocabulary and closed-vocabulary features, namely Latent Dirichlet Allocation (LDA; [19]) and Linguistic Inquiry and Word Count (LIWC; [30]), to predict the Good Life Scale constructs. To represent users’ language, we employed LDA as the primary feature and generated 2000 topics using the DLATK package [31]. To ensure sufficient coverage and diversity, we used topics that were generated on larger datasets, which is a common practice in natural language processing. Specifically, we utilized 2000 English topics that were produced from a corpus of approximately 18 million Facebook updates [10]. To limit the number of topics per document, we set alpha to 0.30. Each topic is represented as a set of words with probabilities. We evaluated each post’s probability of containing each of the 2000 topics (p(topic | post)) by combining the probability of the post containing each word (p(word | post)) with the probability of that word belonging to a given topic (p(topic | word)).
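The topic-scoring step can be illustrated with a minimal sketch. It assumes that the post-level score is the sum, over the words in a post, of p(topic | word) × p(word | post), which is our reading of the description rather than a confirmed detail of the pipeline. The function name `topic_probabilities` and the toy weights are hypothetical and are not taken from DLATK or from the 2000-topic model.

```python
from collections import Counter

def topic_probabilities(post_tokens, topic_given_word):
    """Return {topic: p(topic | post)} for one tokenized post.

    topic_given_word maps word -> {topic: p(topic | word)}.
    Assumption: p(topic | post) = sum over words of
    p(topic | word) * p(word | post).
    """
    counts = Counter(post_tokens)
    total = sum(counts.values())
    scores = {}
    if total == 0:
        return scores
    for word, n in counts.items():
        p_word = n / total  # p(word | post): relative frequency in the post
        for topic, p_topic in topic_given_word.get(word, {}).items():
            scores[topic] = scores.get(topic, 0.0) + p_topic * p_word
    return scores

# Hypothetical toy weights, for illustration only.
toy_weights = {
    "church": {"religion": 0.9, "family": 0.1},
    "son": {"family": 1.0},
}
example = topic_probabilities(["church", "son", "today", "son"], toy_weights)
```

Words that carry no weight for any topic (here, "today") still count toward the post length, so they lower every topic score proportionally.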
Furthermore, 73 English categories were procured from LIWC [30], which encompassed psychological, social, and syntactic categories. Relative frequency was computed for each dictionary, which was determined from the total number of times a word written by the user matched a word in each dictionary, divided by the user’s overall word count.
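The relative-frequency computation can be sketched as follows. The dictionaries below are hypothetical stand-ins, since the LIWC dictionaries are proprietary, and exact token matching is a simplifying assumption; the function name `relative_frequencies` is ours.

```python
def relative_frequencies(tokens, dictionaries):
    """Return {category: share of the user's tokens matching that category}."""
    total = len(tokens)
    if total == 0:
        return {name: 0.0 for name in dictionaries}
    return {
        name: sum(1 for t in tokens if t in words) / total
        for name, words in dictionaries.items()
    }

# Hypothetical categories and word lists, for illustration only.
toy_dictionaries = {
    "family": {"son", "mom", "family"},
    "negation": {"not", "never"},
}
user_tokens = "my son and my mom never argue".split()
liwc_like = relative_frequencies(user_tokens, toy_dictionaries)
```

In this toy case, two of the seven tokens match the "family" list and one matches the "negation" list, so the scores are 2/7 and 1/7.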
We assessed the Good Life Scale constructs with a variety of predictive models that relied on varying combinations of demographic variables, LDA topic data, and LIWC data as predictors. We used 10-fold cross-validation, a resampling technique that involves partitioning the dataset into complementary subsets, where a subset is used for testing while the remaining subsets are used for model training on multiple iterations. We tested incremental predictive validity between models with paired t-tests with the SciPy Python library [32]. To determine whether the accuracy of predictive models was significantly different between sub-scales, we conducted two-tailed z-tests for the difference between dependent correlations with an overlapping variable (in this case, the shared demographic and/or language feature predictors) with the cocor R package [33]. To examine the correlations between individual LDA topics and the GLS scale scores, we controlled for age and gender and applied the Benjamini–Hochberg correction for multiple testing [34].
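The cross-validation logic can be sketched in a few lines. This is an illustration only: the single-predictor least-squares model and the synthetic data are stand-ins, not the models or features used in the study, and all function names are hypothetical.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k near-equal test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def cross_validated_r(xs, ys, k=10, seed=0):
    """Correlate out-of-fold predictions with the observed scores."""
    preds = [0.0] * len(xs)
    for test in k_fold_indices(len(xs), k, seed):
        held_out = set(test)
        train = [i for i in range(len(xs)) if i not in held_out]
        b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = b0 + b1 * xs[i]  # model never saw case i during fitting
    return pearson_r(preds, ys)

# Synthetic data, for illustration only.
rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(200)]
ys = [0.5 * x + rng.gauss(0, 1) for x in xs]
folds = k_fold_indices(len(xs))
r_cv = cross_validated_r(xs, ys)
```

Because every case is predicted by a model fitted without it, the resulting correlation estimates out-of-sample accuracy rather than in-sample fit.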
Statistical power and effect size interpretation.
While a formal power analysis was not conducted a priori, a post-hoc power analysis with G*Power 3 ([35]; procedure: “exact test: correlation, bivariate normal model”) indicated that with our final analytic sample, we had 99% power to detect an effect size of r = .10 at α = .01 for a two-tailed test. A sensitivity power analysis indicated that we had 80% power to detect a correlation as low as r = .06 at α = .01 for a two-tailed test. We interpret an effect size of r ≥ .20 as “medium”, r ≥ .10 as “small”, and r ≥ .05 as “very small” based on the effect size guidelines suggested by Funder and Ozer [36] for psychological research. We do not interpret effects smaller than this, as they are unlikely to be replicable or meaningful in this context. Therefore, we had high (99%) power to detect small effects, and adequate (80%) power to detect very small effects down to r = .06. Significant associations between individual LDA topics and responses to self-report measures of well-being are typically very small to small in prior research (e.g., [37]), and we are therefore adequately powered to observe associations between LDA topics and the GLS scores. One small effect in isolation may be the result of random noise (see [38]); however, a consistent and replicable pattern of small effects can be informative about the natural language used by people who perceive their lives to be happy, meaningful, or psychologically rich.
We interpret individual associations between LDA topics and GLS scores when the effect size is greater than or equal to the Funder and Ozer [36] “very small” criterion of r = .05 and has a (Benjamini–Hochberg corrected) p-value of <.01, as this threshold is tied to the replicability of an effect [39].
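The interpretation rule combines an effect-size floor with a corrected significance threshold. A minimal sketch of that rule, using a standard Benjamini–Hochberg adjustment, is given here; the function names and the toy correlations and p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def interpretable(r_values, p_values, min_abs_r=0.05, alpha=0.01):
    """Flag associations with |r| >= min_abs_r and adjusted p < alpha."""
    adj = benjamini_hochberg(p_values)
    return [abs(r) >= min_abs_r and p < alpha for r, p in zip(r_values, adj)]

# Hypothetical correlations and raw p-values for four topics.
toy_r = [0.12, -0.07, 0.04, 0.06]
toy_p = [0.0001, 0.004, 0.03, 0.02]
flags = interpretable(toy_r, toy_p)
```

In this toy example only the first two topics pass both criteria: the third falls below the effect-size floor, and the fourth has an adjusted p-value above .01.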
Following the suggestion of a reviewer, we also conducted a sensitivity analysis for the paired model comparison t-tests. We had 80% power to detect a difference as small as dz = .054 with our analytic sample of 2,644 at α = .05 for a two-tailed test. Therefore, we were adequately powered to detect very small differences between the models.
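The order of magnitude of this sensitivity estimate can be checked with a normal approximation to the two-tailed paired t-test. The approximation and the function name `min_detectable_dz` are our assumptions; the software used for the reported sensitivity analysis is not specified here.

```python
from math import sqrt
from statistics import NormalDist

def min_detectable_dz(n, alpha=0.05, power=0.80):
    """Smallest paired-samples effect size dz detectable with the given power.

    Normal approximation to the two-tailed paired t-test:
    dz = (z_{1 - alpha/2} + z_{power}) / sqrt(n).
    """
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / sqrt(n)

# Analytic sample size reported in the text.
dz = min_detectable_dz(2644)
```

With n = 2,644 the approximation gives a dz of roughly .054, in line with the reported value.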
Results
Validity properties of the Good Life Scale
A confirmatory factor analysis of the Good Life Scale indicated that an orthogonal, three-factor solution was not a good fit for the data (χ2(90) = 6730.636, p < .001; CFI = .726; TLI = .680; RMSEA = .167; SRMR = .274) according to conventional standards [40]. The unidimensional reliability coefficients from the orthogonal CFA for psychological richness (ω = .72), meaning (ω = .85), and happiness (ω = .84) were similar to the α reliability coefficients mentioned in the Methods section. All ω reliability results are based on the omega2 estimator of ω in the semTools package, which uses the model-implied variance instead of the observed sample variance (see [41] for discussion of the differences between estimators). A bi-factor model that included a “general positivity” factor to account for the shared variance between the items moderately improved model fit (χ2(75) = 2926.320, p < .001; CFI = .882; TLI = .835; RMSEA = .120; SRMR = .078), but not to acceptable cutoffs apart from the SRMR. Factor loadings for the items in these models are included in Tables SA1 and SA2 of S1 File (https://osf.io/35nd2?view_only=020baa24c13f49ccb097405ea5b92cdf). Hierarchical ω reliability for the general factor was .80. There was essentially no variance in the meaning subscale that was due to the specific factor above the general factor (ω = .0004). Happiness had slightly more variance due to the specific factor (ω = .27), and psychological richness had the most (ω = .51).
We report the CFA results for transparency and context, but chose to refrain from modifying the scale further because the initial item pool was small, and we did not have a hold-out sample on which to test a modified model. Though the bi-factor model did indicate incrementally better fit, it also undermines the interpretability of the GLS for the LDA and prediction analyses that are the focus of the present study, so we use the unmodified GLS in the present research and encourage future scale development research to improve the psychometric properties of the GLS.
Psychological richness, meaning, and happiness all had significant, positive bivariate correlations with age, income, and education (Table 1). The positive association between psychological richness and income and education replicated prior meta-analytic estimates (Oishi & Westgate, 2021). Psychological richness and happiness were not significantly associated with gender, while meaning had a very small association (r = .06, p < .01).
Finally, we assessed the discriminant and convergent validity of the Good Life Scale by examining the associations between each sub-scale and a brief measure of the Big Five. Psychological richness had the strongest association with Openness (r = .37, p < .01) relative to the meaning (r = .29, p < .01) and happiness (r = .17, p < .01) scales, which replicates prior research (Oishi & Westgate, 2021) and speaks to the convergent and discriminant validity of the psychological richness subscale. Psychological richness was also positively associated with Conscientiousness, Extraversion, and Agreeableness, and negatively associated with Neuroticism (see Table 2), which replicates prior meta-analytic findings [22]. However, the low reliability of the brief, two-item personality scales limits the precision of these results.
Prediction of the Good Life Scale scores with demographic and language features
Predictive models for the GLS scales using seven combinations of language and demographic data are presented in Table 3. Prediction accuracy from demographic information alone was significantly higher for meaning (z = −6.0768, p < .001) and happiness (z = −3.9961, p < .001) compared to psychological richness, while prediction accuracy did not differ between meaning and happiness (z = −1.6846, p = .0921).
Prediction accuracy from linguistic features alone (Model 6) was not significantly different between psychological richness and happiness (z = −1.2164, p = 0.2238), but meaning was more accurate than psychological richness (z = −3.7424, p = 0.0002) and happiness (z = −3.1088, p = .0019). In the model that combined language features and demographics, the psychological richness model was still just as accurate as the happiness model (z = −1.5939, p = .111), but meaning was more accurate than psychological richness (z = −4.0897, p < .001) and happiness (z = −2.9569, p = .0031).
The addition of language features (i.e., LIWC and LDA) in Model 7 incrementally improved predictive accuracy over just using baseline demographic predictors (age, gender, income, and education in Model 3) for the prediction of psychological richness (t(2643) = −2.148, p = 0.016), but not for meaning (t(2643) = −1.530, p = 0.063) and happiness (t(2643) = −1.337, p = 0.091). Compared head-to-head, language features alone (Model 6) did not produce a significantly more accurate model compared to demographic features (Model 3) for psychological richness (t(2643) = −1.606, p = 0.054), happiness (t(2643) = −0.343, p = 0.366) and meaning (t(2643) = −0.637, p = 0.262). It should be noted that the most accurate model for happiness and meaning only used LIWC features, while the model that combined all language and demographic features was the most accurate for psychological richness.
Language topic correlates of the Good Life Scale
Two of the authors (CVB and FZ) sorted the individual LDA topics that correlated at p < .01 with each of the factors into higher-order themes. All LDA associations that met the p-value threshold also had |r| > .05 and were therefore at least very small. Negative associations ranged from r = −.057 to −.210; positive associations ranged from r = .057 to .149. We only present themes with at least three example topics, and limit examples of each theme to the three largest effect sizes in the tables for brevity (see SB4 in S2 File for all topic correlations that correspond to each theme; https://osf.io/qanp3?view_only=020baa24c13f49ccb097405ea5b92cdf). We also present the closed-vocabulary LIWC category correlations, and note when the analogous LIWC categories conceptually replicated the results of the LDA analyses.
Topics positively correlated with the happiness factor (Table 5) were sorted into the labels of Leisure and Entertainment (e.g., concert, shopping, and vacation), Family (e.g., family, son, and kiddos), Celebration and Gathering (e.g., birthday, thanksgiving, and anniversary), Gratitude (e.g., thanks, blessed, and grateful), God and Religion (e.g., church, worship, and god), and Education (e.g., college, degree, and graduation). Positive Affect (e.g., amazing, awesome, fun) was also evident in many of the topics. Topics also referred to Travel (e.g., flight, hotel, trip) and Geography (e.g., London, California, Tokyo). The LIWC categories of Affiliation, Leisure, Positive Emotions, Family, and Religion were positively associated with happiness and conceptually replicate these results (Table 4). The LIWC categories of 1st Person Plural Pronouns, Drives and Needs, Time, Home, Adjectives, and Work were also positively associated with happiness.
Topics negatively associated with happiness were described with the labels of Negative Affect (e.g., anxiety, depressed, and scared), Hostility (e.g., stupid, crap, and hell), Discussion of Emotions/Mixed Emotions (e.g., feels, felt, laughing, crying), Body (e.g., head, eyes, and foot), Health (e.g., sick, flu, and infection), Communication (e.g., talking, asked, and calling), Money (e.g., tax, money, and bills), God and Religion (e.g., god, religion, praying), Slang (e.g., omg, cuz, lol), Questioning (e.g., why, questions, debating), Political and Social Issues (e.g., rights, politics, freedom), and Reflection and Insight (e.g., discovered, learned, lessons). Thirty-six LIWC categories were negatively associated with happiness; for brevity we note that negative associations with Negative Emotion, Anger, Body, Swear, Sad, Anxious, Health, Informal Language, Nonfluencies, and Biological Processes conceptually replicate the LDA results. In terms of grammar, Negations (e.g., haven’t, didn’t, and wasn’t), 2nd Person Pronouns (e.g., you’re, you’ll, and you’ve) and 3rd Person Pronouns (e.g., they, they’re, themselves) were negatively associated with happiness. The LIWC categories for Negations, Total Pronoun usage, and 2nd Person Pronouns were also negatively associated with happiness, but the 3rd Person singular and plural categories were not. There was also a non-trivial number of essentially miscellaneous topics [31] that were not clearly interpretable in terms of a broader category, but were still significantly associated with happiness. We illustrate the content of the miscellaneous topics that had the highest correlations with happiness and meaning for context in Tables 5 and 6.
All of the topics positively associated with the meaning subscale (Table 6) were redundant with the themes identified in the happiness factor: Leisure and Entertainment, Family, Celebration and Gathering, Gratitude, God and Religion, Education, Positive Affect, Travel, and Geography. The LIWC categories that were positively associated with meaning were also redundant with the happiness associations (Table 7).
Like the happiness factor, many of the negative topic correlations with meaning could be labeled under the themes of Negative Affect, Hostility, Health, Body, Slang, Discussion of Emotions/Mixed Emotions, and Negation. Additionally, we categorized topics dealing with Sleep (e.g., slept, bed, and tired), Transportation (e.g., drive, car, and bus), and Communication (e.g., why, wondering, and question). Notably, several Leisure and Entertainment (e.g., comic, disney, and movie) topics were also negatively correlated with meaning. The use of 3rd Person Plural Pronouns and 2nd Person Pronouns was also negatively associated with meaning. The LIWC Total Pronouns and 3rd Person Plural categories were also negatively associated with meaning, but the 2nd Person category was not. All of the LIWC categories negatively associated with meaning were also negatively associated with happiness.
We synthesized the positive topic correlations with psychological richness into the themes of Reflection and Insight (e.g., discovered, learned, and lesson), Political and Social Issues, Negative Affect, God and Religion, Interpersonal Topics (e.g., others, human, and families), and Ideas and Concepts (e.g., intelligent, creative, thoughts, and research) (Table 8). It is notable that topics mentioning Political and Social Issues were positively associated with psychological richness, while the same topics were typically negatively associated with happiness. It is also notable that topics mentioning Negative Affect were positively associated with psychological richness, while the same topics were typically negatively associated with happiness and meaning. The LIWC category of Anxiety was also positively associated with psychological richness (r = .067, p = .006), verifying this observation. Ideas and Concepts was a unique theme that characterized topic correlations that did not occur with the other constructs.
These divergent results provide some ecologically valid support for the notion that the construct of psychological richness may characterize thoughts, feelings, and behaviors that are substantially distinct from the other facets of the good life. The LIWC categories of Conjunctions, Health, Hearing, Total Pronouns, Personal Pronouns, Certainty, and Interrogatives were also positively associated with psychological richness (Table 9).
Among the negative correlations, the only theme that characterized three or more topics was Leisure and Entertainment (e.g., xbox, football, and movie). This was corroborated by a negative correlation (r = −.069, p = .005) with the LIWC Leisure category. The individual Leisure and Entertainment topics for psychological richness had positive or null relationships with happiness, and null relationships with meaning (with the exception of one negative meaning association; see SB4 in S2 File).
Discussion
In the present research, we found that language-based assessments of social media text data only added significant incremental predictive validity above and beyond demographic features for the prediction of psychological richness, but not for hedonic well-being or eudaimonic well-being. At the same time, psychological richness had the smallest relative predictability in all models and had notably fewer LIWC and LDA topic correlations compared to the other constructs. These data suggest that while language features can aid in the prediction of psychological richness, the overall predictability is still relatively low compared to other well-being constructs. One explanation for this result is that psychological richness is relatively low in observability and produces fewer trait-relevant cues that can be utilized by observers [42]. However, because previous research indicates that trait-relevant cues are available to inform accurate informant reports [24], it may also be the case that these cues are less recoverable with social media text data. The use of social media may not provide as many opportunities for people to behave in ways that are reflective of their standing on self-perceived psychological richness compared to hedonic and eudaimonic well-being. General aspects of using social media — or specific aspects of the Facebook platform [15] — may create an environment that encourages people to disclose positive experiences, while sharing more complex and ambiguous experiences is less common. Features of the platform may make psychological richness particularly hard to observe, whereas it may be more observable from text data generated in other environments and contexts.
The prediction coefficients for language-only models were all below the meta-analytic r of .33 for the prediction of individual-level well-being scores from social media text data with similar open and closed vocabulary methods [43]. A number of factors, such as the nature of the sample, the psychometric properties of the Good Life Scale instrument, and the nature of the trait’s observability and expression in the social media platform context, potentially contributed to the prediction performance.
Diener and Seligman [2] emphasized the importance of measuring different components of well-being beyond life satisfaction so that decision-makers can have a fully informed understanding of population-level well-being outcomes. Indeed, more governments and intergovernmental organizations recognize the need to supplement traditional economic indicators with well-being indicators [44]. Yet, the use of population surveys to obtain well-being indices is costly and time-consuming. From a population perspective, the use of social media text mining can potentially provide a real-time approach to assess distinct and established components of well-being in the population. This can serve to guide policy decisions as policymakers seek to incorporate and evaluate policy impact through well-being outcomes [45]. While the prediction coefficients for language-only and demographics-only models were similar, the added value of social media text data was only clear in the case of psychological richness, which was also the least predictable within each model compared to happiness and meaning. Advances in both machine learning and accurate psychological measurement are likely required for large-scale, passive assessments to reach their full potential. It may be that while individual-level prediction is still limited, regional prediction of well-being levels will soon be accurate enough to begin to inform policy decisions (meta-analytic r = .54 [43]).
Another aim of this research was to learn more about the everyday behaviors that are associated with psychological richness, happiness, and meaning. The associations between the LDA topics and the GLS scales provide content and discriminant validity evidence for the language-based assessment of these well-being constructs, which is an essential step in validating a novel technology-based assessment method [46]. The association between psychological richness and the Reflection and Insight, Ideas and Concepts, Political and Social Issues, and Negative Affect topics can be interpreted as indicators of the processing of emotions, experiences, and ideas, which is consistent with the notion that psychological richness involves complex (but not necessarily enjoyable) experiences. Some of the Negative Affect and Political and Social Issues topics were positively associated with psychological richness while being negatively associated with happiness, which provided some discriminant validity evidence based on linguistic behavior. The only pattern among the topics that negatively correlated with psychological richness was that many referenced Leisure and Entertainment. The content in these topics referenced a variety of entertainment and leisure activities that are commonly understood as entertaining or enjoyable, and were positively associated with the happiness scale. Perhaps people who perceive their lives as less psychologically rich and interesting are more likely to be immersed in simple, familiar pleasures like playing video games or watching sports and movies. The effect sizes were very small, however, and further multi-method evidence is needed to verify this observation.
The topics positively correlated with the happiness factor were face valid and in line with previous research on language-based prediction of life satisfaction [13]. The Family and Celebration and Gathering topics can be interpreted in terms of the extensive literature on the positive association between social connection and well-being [47], while the God and Religion topics can be interpreted in reference to the literature on the positive correlation between religiosity/spirituality and well-being [48]. While many of the negative topic correlations with happiness and meaning were face valid (e.g., Negative Affect, Hostility, Health, Body, Discussion of Emotions) in reference to lay definitions of the constructs, there were also themes that were less straightforwardly interpretable (e.g., Time, Sleep, Money, Transportation, Communication, Questioning), or had primarily grammatical content (e.g., Negation, Slang, 2nd Person and 3rd Person Pronouns).
The substantial content overlap between the topics positively associated with the happiness and meaning factors can be most straightforwardly explained by the strong, positive correlation between the scales (r = .71) and the lack of good fit in the confirmatory factor analysis. A major limitation of the present research was the poor model fit of the Good Life Scale. Future research should investigate whether the predictability of distinct well-being constructs from social media text data differs for instruments with better-differentiated multidimensionality. These results highlight the importance of measurement model fit and item content in determining the predictability from language, and interpretability of the associated language features. Future research that operationalizes the constructs of hedonic life satisfaction, eudaimonic well-being, and psychological richness with more comprehensive and distinct measures may replicate some of our results but is likely to differ in meaningful ways.
Conclusion
We found that the recently proposed third component of well-being, psychological richness, could be predicted through language-based assessment methods, and that social media text data incrementally improved the prediction accuracy. However, the measures of happiness and meaning were just as predictable from demographics as they were from social media text data. These results highlight the central role that internal structure validity and item content may play in language-based assessments of psychological tests. Our results also provide additional discriminant validity evidence for the distinctiveness of the psychological richness construct by illustrating how its associations with everyday behaviors differed from the associations for the happiness and meaning scales. Future work should examine the degree to which the accuracy of language-based assessment of distinct components of well-being can be improved. Assessments with improved accuracy and discriminant validity may eventually be useful for understanding how distinct components of well-being are associated with policy changes and socio-economic conditions over time.
Supporting information
S1 File.
Table SA1. Factor Loadings for Orthogonal 3-factor CFA of the Good Life Scale. Table SA2. Factor Loadings for Orthogonal Bifactor CFA of the Good Life Scale.
https://doi.org/10.1371/journal.pone.0337649.s001
(DOCX)
S2 File.
Table SB1. LIWC Correlations. Table SB2. LDA Correlations. Table SB3. T-test Model Comparisons. Table SB4. All LDA Themes. Table SB5. Abbreviated LDA Themes.
https://doi.org/10.1371/journal.pone.0337649.s002
(XLSX)
References
- 1. De Neve JE, Diener E, Tay L, Xuereb C. The Objective Benefits of Subjective Well-Being [Internet]. Rochester, NY: Social Science Research Network. 2013 [cited 2021 Jun 7]. Available from: https://papers.ssrn.com/abstract=2306651
- 2. Diener E, Seligman MEP. Beyond money: toward an economy of well-being. Psychol Sci Public Interest. 2004;5(1):1–31.
- 3. Diener E, Diener M, Diener C. Factors Predicting the Subjective Well-Being of Nations. Culture and Well-Being: The Collected Works of Ed Diener (Social Indicators Research Series). Dordrecht: Springer Netherlands; 2009. pp. 43–70. [cited 2021 Jun 21]. Available from:
- 4. Kohut A, Keeter S, Doherty C, Dimock M, Christian L. Assessing the representativeness of public opinion surveys. Washington, DC: Pew Research Center. 2012.
- 5. Chen L, Gong T, Kosinski M, Stillwell D, Davidson RL. Building a profile of subjective well-being for social media users. PLoS One. 2017;12(11):e0187278. pmid:29135991
- 6. Yang C, Srinivasan P. Life Satisfaction and the Pursuit of Happiness on Twitter. PLoS One. 2016;11(3):e0150881. pmid:26982323
- 7. Youyou W, Kosinski M, Stillwell D. Computer-based personality judgments are more accurate than those made by humans. Proc Natl Acad Sci U S A. 2015;112(4):1036–40. pmid:25583507
- 8. Kramer ADI. An unobtrusive behavioral model of “gross national happiness.” In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems [Internet]. New York, NY, USA: Association for Computing Machinery; 2010. pp. 287–90. [cited 2020 Oct 5]. Available from: http://doi.org/10.1145/1753326.1753369
- 9. Mitchell L, Frank MR, Harris KD, Dodds PS, Danforth CM. The geography of happiness: connecting Twitter sentiment and expression, demographics, and objective characteristics of place. PLoS One. 2013;8(5):e64417.
- 10. Schwartz HA, Eichstaedt JC, Kern ML, Dziurzynski L, Ramones SM, Agrawal M, et al. Personality, gender, and age in the language of social media: the open-vocabulary approach. PLoS One. 2013;8(9):e73791. pmid:24086296
- 11. Dodds PS, Harris KD, Kloumann IM, Bliss CA, Danforth CM. Temporal patterns of happiness and information in a global social network: hedonometrics and Twitter. PLoS One. 2011;6(12):e26752. pmid:22163266
- 12. Park G, Schwartz HA, Eichstaedt JC, Kern ML, Kosinski M, Stillwell DJ, et al. Automatic personality assessment through social media language. J Pers Soc Psychol. 2015;108(6):934–52. pmid:25365036
- 13. Schwartz HA, Sap M, Kern ML, Eichstaedt JC, Kapelner A, Agrawal M, et al. Predicting Individual Well-being Through The Language of Social Media. In: Biocomputing 2016. Kohala Coast, Hawaii, USA: World Scientific; 2016. pp. 516–27. [cited 2020 Oct 28]. Available from: http://www.worldscientific.com/doi/abs/10.1142/9789814749411_0047
- 14. Baumeister RF, Vohs KD, Funder DC. Psychology as the science of self-reports and finger movements: whatever happened to actual behavior? Perspect Psychol Sci. 2007;2(4):396–403. pmid:26151975
- 15. Tay L, Woo SE, Hickman L, Saef RM. Psychometric and validity issues in machine learning approaches to personality assessment: A focus on social media text mining. Eur J Personal. 2020;34(5):826–44.
- 16. Bleidorn W, Hopwood CJ. Using machine learning to advance personality assessment and theory. Pers Soc Psychol Rev. 2019;23(2):190–203. pmid:29792115
- 17. Pennebaker JW, Chung CK, Ireland M, Gonzales A, Booth RJ. The development and psychometric properties of LIWC2007. Austin TX LIWC Net [Internet]. 2007 [cited 2020 Oct 5]; Available from: https://scholars.ttu.edu/en/publications/the-development-and-psychometric-properties-of-liwc2007-5
- 18. Liu P, Tov W, Kosinski M, Stillwell DJ, Qiu L. Do Facebook Status Updates Reflect Subjective Well-Being? Cyberpsychol Behav Soc Netw. 2015;18(7):373–9. pmid:26167835
- 19. Blei DM, Ng AY, Jordan MI. Latent dirichlet allocation. J Mach Learn Res. 2003;3(Jan):993–1022.
- 20. Ryan RM, Deci EL. On happiness and human potentials: a review of research on hedonic and eudaimonic well-being. Annu Rev Psychol. 2001;52:141–66. pmid:11148302
- 21. Besser LL, Oishi S. The psychologically rich life. Philos Psychol. 2020;33(8):1053–71.
- 22. Oishi S, Westgate EC. A psychologically rich life: beyond happiness and meaning. Psychol Rev. 2022;129(4):790–811. pmid:34383524
- 23. Oishi S, Choi H, Koo M, Galinha I, Ishii K, Komiya A, et al. Happiness, meaning, and psychological richness. Affect Sci. 2020;1(2):107–15.
- 24. Oishi S, Choi H, Buttrick N, Heintzelman SJ, Kushlev K, Westgate EC. The psychologically rich life questionnaire. J Res Personal. 2019;81:257–70.
- 25. Oishi S, Choi H, Liu A, Kurtz J. Experiences associated with psychological richness. Eur J Personal. 2020.
- 26. Sap M, Park G, Eichstaedt J, Kern M, Stillwell D, Kosinski M, et al. Developing age and gender predictive lexica over social media. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014. pp. 1146–51.
- 27. Gosling SD, Rentfrow PJ, Swann Jr WB. A very brief measure of the Big-Five personality domains. J Res Personal. 2003;37(6):504–28.
- 28. Rosseel Y. lavaan: An R package for structural equation modeling. J Stat Softw. 2012;48:1–36.
- 29. Jorgensen TD, Pornprasertmanit S, Schoemann AM, Rosseel Y, Miller P, Quick C, et al. semTools: Useful Tools for Structural Equation Modeling [Internet]. 2025 [cited 2025 Oct 14]. Available from: https://cran.r-project.org/web/packages/semTools/index.html
- 30. Pennebaker JW, Boyd RL, Jordan K, Blackburn K. The Development and Psychometric Properties of LIWC2015. 2015 [cited 2020 Oct 5]; Available from: https://repositories.lib.utexas.edu/handle/2152/31333
- 31. Schwartz HA, Giorgi S, Sap M, Crutchley P, Ungar L, Eichstaedt J. DLATK: Differential Language Analysis ToolKit. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations [Internet]. Copenhagen, Denmark: Association for Computational Linguistics; 2017. pp. 55–60. [cited 2023 Mar 6]. Available from: https://aclanthology.org/D17-2010
- 32. Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods. 2020;17(3):261–72. pmid:32015543
- 33. Diedenhofen B, Musch J. cocor: a comprehensive solution for the statistical comparison of correlations. PLoS One. 2015;10(3):e0121945. pmid:25835001
- 34. Ferreira JA, Zwinderman AH. On the Benjamini–Hochberg method. Ann Stat. 2006;34(4):1827–49.
- 35. Faul F, Erdfelder E, Lang A-G, Buchner A. G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav Res Methods. 2007;39(2):175–91. pmid:17695343
- 36. Funder DC, Ozer DJ. Evaluating effect size in psychological research: sense and nonsense. Adv Methods Pract Psychol Sci. 2019;2(2):156–68.
- 37. Pang D, Eichstaedt JC, Buffone A, Slaff B, Ruch W, Ungar LH. The language of character strengths: Predicting morally valued traits on social media. J Pers. 2020;88(2):287–306. pmid:31107975
- 38. Ferguson CJ, Heene M. Providing a lower-bound estimate for psychology’s “crud factor”: The case of aggression. Prof Psychol Res Pract. 2021;52(6):620–6.
- 39. Bogdan PC. One decade into the replication crisis, how have psychological results changed? Adv Methods Pract Psychol Sci. 2025;8(2).
- 40. Hu L, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct Equ Model Multidiscip J. 1999;6(1):1–55.
- 41. Flora DB. Your coefficient alpha is probably wrong, but which coefficient omega is right? A tutorial on using R to obtain better reliability estimates. Adv Methods Pract Psychol Sci. 2020;3(4):484–501.
- 42. Vazire S. Who knows what about a person? The self-other knowledge asymmetry (SOKA) model. J Pers Soc Psychol. 2010;98(2):281–300. pmid:20085401
- 43. Sametoğlu S, Pelt DHM, Eichstaedt JC, Ungar LH, Bartels M. The value of social media language for the assessment of wellbeing: A systematic review and meta-analysis. J Posit Psychol. 2024;19(3):471–89.
- 44. OECD. Improving well-being [Internet]. Vol. 2014. Paris: OECD; 2014. pp. 55–82. [cited 2023 Feb 22]. Available from: https://www.oecd-ilibrary.org/economics/oecd-economic-surveys-united-states-2014/improving-well-being_eco_surveys-usa-2014-5-en
- 45. Odermatt R, Stutzer A. Subjective Well-Being and Public Policy. SSRN Electron J. 2017.
- 46. Liou G, Bonner CV, Tay L. A psychometric view of technology-based assessments. Int J Test. 2022;22(3–4):216–42.
- 47. Tay L, Tan K, Diener E, Gonzalez E. Social relations, health behaviors, and health outcomes: A survey and synthesis. Appl Psychol Health Well-Being. 2013;5(1):28–78.
- 48. Yaden DB, Batz-Barbarich CL, Ng V, Vaziri H, Gladstone JN, Pawelski JO, et al. A Meta-Analysis of Religion/Spirituality and Life Satisfaction. J Happiness Stud. 2022;23(8):4147–63.