
Contribution of social determinants to symptoms of generalized anxiety disorder

Abstract

Symptoms of anxiety are known to be triggered by a range of life context factors including early life trauma, poor sleep quality, infrequent exercise, unemployment and social isolation. Machine learning techniques offer a powerful method for analyzing these factors in combination, enabling the evaluation of aggregate predictive associations rather than causal pathways and the identification of their relative association with anxiety symptoms. However, most studies examining these factors have either been small-scale or included only a small number of factors. Here we applied multiple machine learning approaches (Random Forest, Gradient Boosting, Naïve Bayes, Information Gain, and SHAP) to a cross-sectional data sample of 4,186 individuals to reveal how a broad range of lifestyle and life context factors are associated with the experience of anxiety symptoms, as measured by the Generalized Anxiety Disorder-7 screening questionnaire (GAD-7). The results showed that, in combination, early life trauma, poor sleep quality, infrequent exercise, unemployment, and deterioration of social bonds were substantially associated with anxiety symptoms, particularly for older age groups, with frequency of a good night’s sleep having an outsized impact. For older ages, this was followed by employment status and experience of interpersonal trauma, as well as frequency of in-person socializing. For younger ages (18–34), employment status was less important with interpersonal trauma being a more significant factor. Specifically, poor sleep, rarely socializing in person, not being able to work or being unemployed, bullying by peers, or neglect/abuse by a parent or caregiver had the largest associations with anxiety symptoms. These findings have implications for how we approach both prevention and treatment of anxiety.

Introduction

Feelings of fear and anxiety, evoked by threats, or the anticipation of real or imagined threats, are natural adaptive responses which facilitate survival. However, when these feelings become excessive in frequency, severity or duration, they can become maladaptive and functionally impairing. Although there are multiple disorders associated with maladaptive anxiety (e.g., phobias, panic, social anxiety), Generalized Anxiety Disorder (GAD) is among the most prevalent and clinically significant anxiety-related conditions, and is described by the DSM-5 as ‘excessive worry and apprehensive expectations, occurring more days than not for at least 6 months, about a number of events or activities, such as work or school performance’ [1]. Within a primary care or therapeutic setting, it is often initially screened for using the Generalized Anxiety Disorder-7 questionnaire (GAD-7), which includes 7 questions on worry, fear, restlessness and irritability, and on which respondents rate each item according to how frequently they have been bothered by the problem over the last two weeks [2]. A sum score threshold of 10 on the GAD-7 (moderate anxiety) has been shown to have a sensitivity of 89% and a specificity of 82% for GAD, as determined by telephone interview with a mental health professional [2].

Although there are various treatments available to help reduce or manage symptoms of anxiety (e.g., cognitive behavioural therapy, psychoactive medications), its high and increasing prevalence in the general population [3–6], debilitating life impact [7,8] and low treatment availability [9] mean there is a need to better understand underlying risk factors such that incidence can be preventatively reduced. However, the aetiology and risk profile of anxiety is complex and still poorly understood. Although family studies have revealed a genetic component [10,11], people’s environment, life experiences and lifestyle habits (together, social determinants) all play a prominent role in the onset and trajectory of anxiety symptoms. For example, a range of factors including early life trauma [12–14], poor sleep [15,16], unhealthy diet [17,18], infrequent exercise [19,20], unemployment [21,22] and social isolation [23,24] have all been associated with anxiety symptoms, often bidirectionally.

Machine learning techniques offer a powerful method to analyze these factors in combination to predict outcomes and identify their relative association with anxiety symptoms [see [25–29] for reviews]. However, to date, most machine learning studies in this context have been small scale (typically a few hundred people) [30–34], included only a limited number of factors (e.g., demographics and medical history) [35,36], and/or focused on specific populations (e.g., a specific country, age group or clinical population) [31,34–46]. As a result, it is currently unknown how well anxiety symptoms can be predicted by lifestyle and life circumstance factors in the aggregate, and which ones are more important predictors of anxiety symptoms in the general population. This has implications not only for identifying at-risk populations and directing public health policy, but also for how anxiety symptoms can be prevented and how they should be treated from a therapeutic perspective [47–49].

Here we take advantage of a large-scale cross-sectional data collection effort in which a subsample of respondents from the general population (N = 4,186) completed the GAD-7 screening questionnaire and answered questions describing their lifestyle, life circumstances, and experience of various adversities and interpersonal traumas. We applied multiple machine learning approaches (Random Forest, Gradient Boosting, Naïve Bayes, Information Gain and SHAP) to this data sample to determine the degree to which these lifestyle and life experience factors, in the aggregate, were associated with GAD-7 outcomes in terms of statistical classification of symptom severity (rather than prospective prediction), and to describe the relative importance of these different factors. These findings have implications not only for identifying at-risk populations and directing public health policy, but also for how anxiety can be prevented and how it should be treated from a therapeutic perspective [47–49].

Methods

Data acquisition and data elements

Participants were recruited as part of the ongoing Global Mind Project through online advertisements placed on Facebook and Google that targeted age-sex groups and geographical regions across broad based interests and key words [50]. The recruitment and data quality procedures have been previously described in detail [59]. Respondents were aged 18–85, predominantly from 20 countries, with 45.7% of respondents reporting their biological sex as male (see Table A in S1 Text for age and sex break-up and Table B in S1 Text for percentage of respondents by country). Participants were directed to the survey website (https://sapienlabs.org/mhq/) and completed the GAD-7 questions as part of a larger assessment of mental wellbeing [51,52]. Respondents also answered questions on a broad range of life context factors including lifestyle habits, life circumstances and experience of various adversities and interpersonal traumas (see Table 1).

Table 1. Life context factors queried in the survey.

https://doi.org/10.1371/journal.pmen.0000552.t001

Cross-sectional data was collected between 14/09/2022 and 29/09/2022, during which 4,421 respondents completed the GAD-7. Standard Global Mind Project cleaning criteria were applied to the data. Only respondents who responded ‘Yes’ to the MHQ question ‘Did you find this assessment easy to understand?’, and who had a standard deviation of >0.2 across MHQ rated question responses were included in the analysis, leading to a final sample size of 4,186. This criterion, part of the validated MHQ quality-control framework [51,59], uses within-respondent response variability to identify disengaged or inattentive respondents; SD < 0.2 across 47 MHQ items indicates extremely low variability (e.g., selecting the same response for nearly all items), a pattern empirically associated with poor data quality.
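The two inclusion criteria described above can be sketched in a few lines. This is a minimal illustration and not the Global Mind Project's actual cleaning code: the function name, the toy ratings, and the choice of the population standard deviation (the text does not say which estimator was used) are all assumptions.

```python
from statistics import pstdev

SD_THRESHOLD = 0.2  # threshold stated in the text


def passes_quality_check(understood, ratings):
    """Return True if a respondent meets both inclusion criteria."""
    # Criterion 1: answered 'Yes' to the comprehension question.
    if understood != "Yes":
        return False
    # Criterion 2: within-respondent variability above the threshold.
    # Assumption: population SD; the source does not name the estimator.
    return pstdev(ratings) > SD_THRESHOLD


# Hypothetical respondents with 47 rated items each.
engaged = [1, 5, 3, 7, 2, 9, 4] * 6 + [3, 6, 8, 2, 5]
straight_liner = [5] * 47  # same answer for every item

print(passes_quality_check("Yes", engaged))         # True
print(passes_quality_check("Yes", straight_liner))  # False
print(passes_quality_check("No", engaged))          # False
```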

Participants took part in the online survey voluntarily, anonymously, and without any financial compensation. Participants consented to take part by clicking on a start button after reading a detailed privacy policy. Cases with missing data on predictor variables were excluded from analysis on a listwise basis. All procedures involving human subjects were approved by the Health Media Lab Institutional Review Board (HML IRB; OHRP Institutional Review Board #00001211, Federal Wide Assurance #00001102, IORG #0000850).

Calculation of GAD-7 Sum Scores

No post-stratification weights were applied to the sample, as the study’s primary purpose was model performance evaluation rather than population prevalence estimation. Each GAD-7 item was rated on a frequency scale of 0–3 that reflected how often a symptom had bothered the respondent over the last 2 weeks (0 = Not at all; 1 = Several days; 2 = More than half the days; 3 = Nearly every day). The sum of these ratings, the GAD-7 sum score, was computed for each respondent, and the proportion of respondents within each category (Minimal anxiety = 0–4; Mild anxiety = 5–9; Moderate anxiety = 10–14; Severe anxiety = 15–21) was calculated. GAD-7 sum scores in this data spanned the full range from 0 to 21, with 24.1% having scores of 10 or higher (Fig 1).
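As a worked illustration of the scoring just described, the sketch below sums seven item ratings and maps the total to a severity band, using the standard GAD-7 bands with severe anxiety spanning 15–21. The function names and the example ratings are hypothetical and are not taken from the study's code.

```python
def gad7_sum_score(item_ratings):
    """Sum the seven GAD-7 item ratings (each 0-3) into a 0-21 score."""
    if len(item_ratings) != 7:
        raise ValueError("GAD-7 has exactly 7 items")
    if any(r not in (0, 1, 2, 3) for r in item_ratings):
        raise ValueError("each item must be rated 0, 1, 2 or 3")
    return sum(item_ratings)


def gad7_category(score):
    """Map a GAD-7 sum score to its severity band."""
    if not 0 <= score <= 21:
        raise ValueError("GAD-7 sum scores range from 0 to 21")
    if score <= 4:
        return "Minimal"
    if score <= 9:
        return "Mild"
    if score <= 14:
        return "Moderate"
    return "Severe"  # 15-21


# Hypothetical respondent: 'More than half the days' on 5 items, 'Not at all' on 2.
ratings = [2, 2, 2, 2, 2, 0, 0]
score = gad7_sum_score(ratings)
print(score, gad7_category(score))  # 10 Moderate

# Binary target used for classification in this study: GAD-7 >= 10.
print(int(score >= 10))  # 1
```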

Fig 1. Histogram of GAD-7 sum scores in the sample.

https://doi.org/10.1371/journal.pmen.0000552.g001

Classification models

Multiple classification models were used to identify the model type with the best performance. These included Logistic Regression, Naïve Bayes, and tree-based models such as Random Forest and Gradient Boosting (XGBoost), all run in Orange Data Mining, an open-source machine learning and data visualization toolkit designed for data analysis through visual programming or Python scripting (https://orangedatamining.com/). The Logistic Regression model was implemented with L1 (LASSO) regularization (cost strength C = 12) and class balancing enabled. Table C in S1 Text provides the full model comparison across all classifiers. Models were created for the identification of individuals with GAD-7 scores ≥10 and <10 separately, then combined into a composite Logistic Regression model. This approach trains separate classifiers on minority and majority classes to mitigate imbalanced-data bias and is an overfitting-mitigation strategy rather than a method for imputing missing outcome data. All features were one-hot encoded, such that each answer option was coded as 1 if selected and 0 if not.
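The one-hot encoding step can be illustrated as follows. This is a sketch only: the study used Orange Data Mining, whereas the helper below is plain Python. The question keys (`sleep`, `employment`) and the function name are hypothetical, while the answer options shown are those mentioned in the text.

```python
# Answer options mentioned in the text; the question keys are hypothetical labels.
OPTIONS_BY_QUESTION = {
    "sleep": ["Hardly ever", "Some of the time", "Most of the time", "All of the time"],
    "employment": ["Employed", "Retired", "Studying", "Not able to work",
                   "Homemaker", "Unemployed"],
}


def one_hot_encode(response, options_by_question):
    """Turn one respondent's answers into 0/1 features, one per answer option."""
    encoded = {}
    for question, options in options_by_question.items():
        for option in options:
            # 1 if this option was selected, otherwise 0.
            encoded[f"{question}={option}"] = int(response.get(question) == option)
    return encoded


respondent = {"sleep": "Hardly ever", "employment": "Unemployed"}
features = one_hot_encode(respondent, OPTIONS_BY_QUESTION)
print(features["sleep=Hardly ever"])    # 1
print(features["employment=Retired"])   # 0
print(len(features))                    # 10
```

Each answer option becomes its own binary feature, which is why later analyses (Information Gain, SHAP) can report values per option rather than per question.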

Performance metrics including ROC area under the curve (AUC), accuracy, precision, recall and F1 scores were computed. This was done using all the data together, as well as separating the data by both geography and age. Geographically, models were built separately for all data acquired from western/developed countries (N = 7 countries; 40% of sample) and from non-western/developing countries (N = 13 countries; 60% of sample). Similarly, models were built for each decadal age group, pooling all geographies. Results reported are based on a 5-fold stratified cross-validation.
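To make the per-class metrics and the stratified splitting concrete, here is a minimal standard-library sketch. It is not the Orange workflow used in the study: the function names are hypothetical, the labels and predictions are toy data, and the fold assignment is a simple deterministic round-robin rather than a shuffled split.

```python
def classification_metrics(y_true, y_pred, positive):
    """Accuracy plus precision, recall and F1 for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def stratified_folds(labels, k=5):
    """Assign indices to k folds so each fold keeps the overall class ratio."""
    folds = [[] for _ in range(k)]
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)  # round-robin within each class
    return folds


# Toy imbalanced example: 1 = GAD-7 >= 10 (minority), 0 = GAD-7 < 10 (majority).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
minority = classification_metrics(y_true, y_pred, positive=1)
majority = classification_metrics(y_true, y_pred, positive=0)
print(round(minority["f1"], 3), round(majority["f1"], 3))  # 0.571 0.769

# Toy labels with 25% positives; every fold keeps 5 positives out of 20.
folds = stratified_folds([1] * 25 + [0] * 75, k=5)
print([len(fold) for fold in folds])  # [20, 20, 20, 20, 20]
```

In the toy data the same predictions give a higher F1 for the majority class than for the minority class, which is the reason metrics are worth reporting per class when classes are imbalanced.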

Estimation of contribution of variable categories to model performance

Using Logistic Regression models, the contribution of each category of lifestyle or life experience factor to the model performance in predicting symptoms of moderate to severe anxiety, defined as GAD-7 scores ≥10, was evaluated. These included frequency of exercise, frequency of socializing, employment status, number of interpersonal traumas, number of financial adversities, number of other life adversities and substance use. For each feature category (e.g., all exercise frequencies for the frequency of exercise question), the increase in each performance metric was evaluated at different positions of forward addition, when added first to a base model that included biological sex, and also when added last after all other feature categories/features had been included.
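The first-versus-last comparison can be sketched as below. In the study, each score would come from refitting the Logistic Regression model and computing a cross-validated metric such as AUC; here `toy_score` is a purely hypothetical stand-in in which categories "explain" overlapping sets of abstract units, so that shared information shrinks a category's contribution when it is added last. All names and numbers are illustrative assumptions.

```python
def contribution_first_and_last(category, all_categories, base, score):
    """Gain in `score` when `category` is added first vs. last to the model."""
    others = [c for c in all_categories if c != category]
    added_first = score(base + [category]) - score(base)
    added_last = score(base + others + [category]) - score(base + others)
    return added_first, added_last


# Hypothetical overlap structure: shared letters mean shared information.
EXPLAINED = {
    "biological sex": {"a"},
    "sleep": {"b", "c", "d"},
    "exercise": {"c", "e"},
    "socializing": {"d", "e", "f"},
}


def toy_score(categories):
    """Stand-in for a cross-validated metric; NOT a real model fit."""
    covered = set().union(*(EXPLAINED[c] for c in categories))
    return 0.5 + 0.05 * len(covered)


base_model = ["biological sex"]
categories = ["sleep", "exercise", "socializing"]
first, last = contribution_first_and_last("sleep", categories, base_model, toy_score)
print(round(first, 3), round(last, 3))  # 0.15 0.05
```

The gap between the two numbers reflects how much of a category's apparent contribution is shared with the other categories.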

Information gain

Information Gain was used to assess the contribution of each feature (i.e., each one-hot encoded answer option), computed as the reduction in binary entropy of the target variable when the data are split on that feature [53]. For feature categories, aggregate Information Gain values were computed by averaging the individual Information Gain values for each feature/answer option within the category.
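A minimal sketch of this calculation for binary features and a binary target follows. The function names and the toy data are hypothetical; the formula is the usual entropy-reduction definition, with entropy measured in bits.

```python
from math import log2


def binary_entropy(labels):
    """Entropy (in bits) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def information_gain(feature, target):
    """Reduction in target entropy after splitting on a 0/1 feature."""
    n = len(target)
    selected = [t for f, t in zip(feature, target) if f == 1]
    not_selected = [t for f, t in zip(feature, target) if f == 0]
    weighted = (len(selected) / n * binary_entropy(selected)
                + len(not_selected) / n * binary_entropy(not_selected))
    return binary_entropy(target) - weighted


def category_information_gain(features, target):
    """Average Information Gain over the answer options in one category."""
    gains = [information_gain(f, target) for f in features]
    return sum(gains) / len(gains)


# Toy data: target is 1 for GAD-7 >= 10.
target = [1, 1, 0, 0, 0, 0, 0, 0]
informative = [1, 1, 0, 0, 0, 0, 0, 0]    # splits the target perfectly
uninformative = [1, 0, 1, 1, 1, 0, 0, 0]  # same target rate in both halves

print(round(information_gain(informative, target), 3))    # 0.811
print(round(information_gain(uninformative, target), 3))  # 0.0
```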

SHAP

We used the SHAP method to compute Shapley values, to assess how specific features affected prediction outcomes. Briefly, the marginal contribution of a feature (a one-hot encoded option of a factor such as exercise) was computed for each grouping (coalition) of the other features as the difference in the predicted outcome with and without the feature. The Shapley value was the (weighted) average of these marginal contributions, providing a view of both the magnitude and direction of each feature’s contribution [54].
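The weighted average of marginal contributions can be written out exactly for a handful of features. The sketch below enumerates every coalition by brute force; it is not the SHAP library's algorithm, which relies on approximations to scale to many features. The value function, its coefficients and the feature labels are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values by enumerating all coalitions of the other features."""
    n = len(features)
    result = {}
    for feature in features:
        others = [f for f in features if f != feature]
        total = 0.0
        for size in range(n):
            # Weight of coalitions of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                members = frozenset(coalition)
                marginal = value(members | {feature}) - value(members)
                total += weight * marginal
        result[feature] = total
    return result


def toy_model_output(present):
    """Hypothetical model output (e.g., log-odds) for a set of selected options."""
    output = 0.0
    if "sleep: Hardly ever" in present:
        output += 1.0
    if "employment: Not able to work" in present:
        output += 0.5
    if "socializing: Rarely/never" in present:
        output += 0.3
    # Hypothetical interaction between poor sleep and not being able to work.
    if {"sleep: Hardly ever", "employment: Not able to work"} <= present:
        output += 0.4
    return output


names = ["sleep: Hardly ever", "employment: Not able to work",
         "socializing: Rarely/never"]
values = shapley_values(names, toy_model_output)
for name in names:
    print(name, round(values[name], 3))
```

In this toy setting the interaction term is split equally between the two interacting features, and the values sum to the difference between the output with all features present and the output with none.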

Results

Prediction of moderate to severe anxiety by lifestyle and life circumstance

Here we used multiple models (Logistic Regression, Random Forest, Gradient Boosting and Naïve Bayes) to determine how well, in aggregate, multiple lifestyle and life experience factors predicted symptoms of moderate to severe anxiety, defined as a GAD-7 score of 10 or higher. Given that only 24% of the sample reported symptoms of moderate to severe anxiety, models that classified high GAD-7 scores tended to overfit during training, while models that predicted the converse tended to overgeneralize. The tree-based models, Gradient Boosting and Random Forest, tended to overfit the majority class (GAD-7 < 10; Table C in S1 Text). This is evidenced by their high precision (0.82 and 0.86, respectively) and F1 scores (0.88 and 0.87, respectively) for this class, but considerably lower performance on the minority class (GAD-7 ≥ 10), with lower recall (0.25 and 0.47, respectively) and F1 scores (0.33 and 0.49, respectively). With imbalanced datasets (i.e., unequal class distributions, as here, where only 24% of the sample had GAD-7 scores ≥10), this overfitting to the majority class is characteristic of decision tree-based algorithms, as they tend to create overly complex models that capture noise in the majority class. Naïve Bayes showed similar, though slightly poorer, performance compared to the tree-based models. Logistic Regression, while not achieving the absolute highest scores in any single metric, demonstrated the most balanced performance across both classes, as evidenced by its superior AUC (0.80) and F1 score (0.53) for the GAD-7 ≥ 10 class. This balanced performance occurs because tree-based models often maximize overall accuracy by correctly classifying the majority class at the expense of minority class detection, whereas Logistic Regression’s linear decision boundary provides more stable probability estimates across classes, making it the most suitable model for this study’s objective of identifying individuals with moderate to severe anxiety symptoms.
Aggregate performance was similar even when models were created separately for western developed and non-western developing countries (Table 2, bottom rows), with AUC scores of 0.81 and 0.77 and F1-scores of 0.76 and 0.74, respectively. Therefore, Logistic Regression models combining all geographic regions (i.e., both western/developed and non-western/developing countries) were used for further analysis.

Table 2. Composite Logistic Regression performance for classification of moderate to severe anxiety symptoms.

https://doi.org/10.1371/journal.pmen.0000552.t002

However, model performance, indicating the ability to classify moderate to severe anxiety symptoms based on lifestyle and life experience factors, was systematically and significantly lower for younger age groups relative to older age groups (Fig 2, Table 3).

Table 3. Logistic regression model performance by age group.

https://doi.org/10.1371/journal.pmen.0000552.t003

Fig 2. Accuracy and F1 scores for the classification of moderate to severe anxiety symptoms using Logistic Regression showed better performance for older age groups.

https://doi.org/10.1371/journal.pmen.0000552.g002

Accuracy and F1 scores were 0.77 and 0.86, respectively, for those age 65 and older but only 0.67 and 0.68 for those under age 34, while performance was in between for the middle age groups. This suggests that while older age groups were more likely to experience anxiety symptoms associated with the lifestyle and adverse life circumstances captured here, younger age groups were increasingly likely to experience anxiety symptoms associated with other factors not included in this model.

Hierarchy of factors contributing to model performance

Many factors contributing to anxiety may be inter-related. For example, one might sleep worse if not exercising or if experiencing interpersonal trauma, or one might be more likely to use substances such as tobacco or alcohol when unemployed. We therefore examined the impact of adding lifestyle and life experience factors on model performance (AUC and F1 scores) when they were added either first or last (Fig 3A, Table 4, Table D in S1 Text). Adding a factor first indicates its contribution inclusive of its interactions and correlations with other factors, while adding it last provides insight into its contribution independent of those interactions and correlations.

Table 4. Ranking of impact of factors on model performance (AUC and F1 scores) based on first or last inclusion in forward addition models.

https://doi.org/10.1371/journal.pmen.0000552.t004

Fig 3. Hierarchy of factors contributing to the classification of moderate to severe anxiety symptoms.

(A) Impact to AUC of adding the factor last after all other factors for all ages and 18 to 34. (B) Information Gain of factors. Legend spans both A and B.

https://doi.org/10.1371/journal.pmen.0000552.g003

We performed this both for all ages together and for the 18–34 age group separately. For all ages and for the 18–34 age group alone, contributions of all factors diminished substantially when added last compared to when added first, due to inter-relationships between factors. For all age groups together, frequency of good sleep contributed the most to both AUC and F1 scores (0.163 and 0.041, respectively), while employment status had the second highest impact (0.108 and 0.017, respectively) followed by the experience of interpersonal trauma, frequency of social interaction and frequency of exercise. Educational attainment, substance use, financial adversities, and other adversities ranked lower, with no impact when added last. Similarly, frequency of good sleep also ranked highest in its contribution to AUC and F1 scores overall (average rank of adding first and last) for the 18–34 age group alone. However, employment status had little impact (ranked 6th) while the experience of interpersonal trauma ranked higher (2nd). Frequency of exercise and frequency of social interaction followed, jointly ranking 4th. The experience of other adversities (i.e., not financial or interpersonal, such as illness, injury, or natural disasters) also ranked 4th when added first, but had no impact when added last, similar to financial adversities and educational attainment.

Hierarchy of factors using Information Gain

We similarly evaluated the hierarchy of factors associated with moderate to severe anxiety symptoms for all age groups together, and for younger ages 18–34 separately, using Information Gain, a model-independent method (Fig 3B). Here again the results were consistent with the impact on model performance described above. In particular, the feature category of sleep dominated as the most important factor associated with moderate to severe anxiety symptoms across all age groups, followed by employment status, frequency of socializing and the experience of interpersonal trauma. Similarly, for the 18–34 age group alone, sleep was the most important factor, followed by the experience of interpersonal trauma, while employment status had a lesser impact.

The Information Gain values of each feature for all ages are shown in Table E in S1 Text. Getting a good night’s sleep ‘Hardly ever’ or ‘Most of the time’ had the two highest Information Gain values, while ‘Rarely/Never’ exercising or socializing and being ‘Retired’ had the next highest. Bullying by peers and parental abuse or neglect in childhood, being ‘Not able to work’ or ‘Unemployed’, other frequencies of getting a good night’s sleep, and abuse or assault in childhood were also in the top 15 features.

Factor contribution using SHAP

Finally, we used SHAP (SHapley Additive exPlanations) as a qualitative tool to highlight the directionality and consistency of associations, rather than as a definitive measure of causal importance (Fig 4). We show the direction of impact of the top 4 factors across all ages (frequency of getting a good night’s sleep, employment status, frequency of socializing and frequency of exercising). It is important to note that when predictors are correlated, SHAP values share importance estimates across correlated predictors because they summarize contributions across all feature combination scenarios; however, the consistency of rankings across SHAP, Information Gain, and forward-selection analyses strengthens confidence in the identified hierarchy. Here, each individual was plotted as a point either in blue or red, where blue indicates that the option was not selected whereas red indicates it was selected. Values to the right of zero indicate how much the option pushed the model towards a positive classification of moderate to severe anxiety symptoms, while values to the left of zero indicate how much it pushed the model towards a negative classification. Selection of ‘Hardly ever’ having a good night’s sleep consistently and substantially contributed towards a positive classification of moderate to severe anxiety symptoms, while having a good night’s sleep only ‘Some of the time’ also contributed towards a positive classification, but to a lesser extent. In contrast, having a good night’s sleep ‘Most of the time’ or ‘All of the time’ contributed to a negative classification. Similarly, ‘Rarely/never’ socializing in person contributed to a positive classification while socializing 1–3 times a month or more contributed to a negative classification. Finally, having an employment status of ‘Not able to work’ contributed most strongly to a positive classification of moderate to severe anxiety symptoms, followed by a status of ‘Homemaker’ and ‘Unemployed’. In contrast, being ‘Employed’, ‘Retired’ or ‘Studying’ contributed strongly to a negative classification.

Fig 4. SHAP values for each category of sleep frequency, socializing frequency and employment status.

A red dot represents individuals who selected the option, while a blue dot indicates individuals who did not select the option. Points to the right of 0 indicate a contribution to positive classification of moderate to severe anxiety symptoms, points to the left of 0 indicate a contribution to a negative classification.

https://doi.org/10.1371/journal.pmen.0000552.g004

Discussion

Here we show that lifestyle and life context factors, particularly sleep quality, frequency of exercise, social interaction, and interpersonal trauma, play a substantial role in predicting moderate to severe anxiety symptoms, defined here as GAD-7 scores of 10 and above. The machine learning models employed here also revealed the hierarchy across these factors, with sleep quality being the most prominent across all age groups. Additionally, the influence of lifestyle factors on anxiety symptoms differs across age groups. These findings build on existing machine learning studies that, to date, have been smaller in scale [30–34] or scope [31,34–40,42–45] and are a first demonstration of the aggregate contribution of a large number of adversities and traumas together with lifestyle factors to the incidence of anxiety symptoms across a large-scale sample from the general population. Altogether, they highlight the complex interplay and hierarchy of factors associated with the experience of anxiety symptoms and have implications for targeted interventions and public health policies aimed at preventing and treating anxiety.

Model performance and differences by age

The lifestyle and life context factors used in this study had substantial predictive power for older age groups. However, their predictive power was systematically diminished for younger age groups. While this study included a broad range of adversities, traumas, and lifestyle factors, some factors that impact mental health outcomes such as diet [e.g., ultra-processed food, [55]] and social media use [56] were not included and could be substantial contributors in younger age groups. In addition, studies are increasingly showing an impact of environmental toxins on mental health [57,58], which could also play a role and remains to be studied. Importantly, the age difference suggests that the factors associated with symptoms of anxiety differ between older and younger generations, which has implications for how we approach both prevention and treatment. While this finding may appear counterintuitive given that adverse events are commonly reported among younger populations, the specific constellation of lifestyle and life context factors captured here (many representing cumulative life experiences such as employment history and interpersonal traumas) may be more salient risk markers for anxiety in older adults, whereas younger adults may be more affected by contemporary factors not included in this model. This pattern is consistent with our previous MHQ study [59], where model performance also improved systematically with age (accuracy 0.68 for ages 18–24 vs. 0.94 for ages 75–84). We note that this generational trend also holds for the prediction of transdiagnostic mental distress [59].

Hierarchy of life context factors driving anxiety

We used multiple methods to identify the hierarchy of life context factors associated with symptoms of moderate to severe anxiety. Across multiple methods we showed consistently that sleep status, and in particular frequent poor sleep, dominated the classification of moderate to severe anxiety symptoms for both age groups. For older age groups this was followed by employment status, frequency of social interaction, number of interpersonal traumas, and frequency of exercise. In particular, for the older age group, not being able to work and being unemployed contributed most substantially to a classification of moderate to severe anxiety symptoms, while being employed or retired had the opposite effect. The experience of interpersonal trauma was a more predictive factor for the 18–34 age group, while employment status was a weaker factor. Among the individual interpersonal traumas, bullying by peers and abuse or neglect by a parent or caregiver in childhood were the strongest predictive factors.

However, although sleep quality was the most significant predictor of moderate to severe anxiety symptoms, even without the inclusion of this factor, other life factors such as social interaction, exercise, employment status and experience of interpersonal trauma, in aggregate, predicted moderate or severe anxiety symptoms with an AUC of 0.75 and F1 score of 0.73. This suggests firstly, that anxiety symptoms are predictably associated with people’s lifestyle habits (i.e., rarely/never socializing or exercising) and certain types of adverse life experiences (i.e., life challenges such as not being able to work, unemployment, and the experience of various types of abuse or assault) and secondly, that these factors likely contribute to poor sleep with reciprocal feedback [60–62]. Although this study was cross-sectional in design and therefore cannot categorically distinguish between causality and consequence of symptoms, these findings point to the substantial sociological basis of anxiety symptoms that could be prevented and mitigated through shifts in culture and economics.

Similarity of factors to transdiagnostic predictions

Anxiety is substantially comorbid with various disorders such as depression, obsessive-compulsive disorders (OCD) and panic disorder [63,64] among others. In line with this, we have previously shown that the same set of lifestyle and life context factors were similarly able to predict overall mental distress [59], as measured by the MHQ, a transdiagnostic measure that aggregates across 47 symptoms spanning 10 disorders [65]. That study used a larger sample (N = 270,000) collected between April 2020 and December 2021, whereas the present study uses a distinct sample (N = 4,186) collected in September 2022 with the GAD-7 as the outcome measure. The comparison between disorder-specific (GAD-7) and transdiagnostic (MHQ) prediction provides unique insight into anxiety-specific risk factors. While the top 5 categories of factors were the same across both studies, there are certain differences worth noting. In particular, social interaction was the top contributor to prediction of the MHQ, followed by sleep quality, exercise frequency, employment status and experience of interpersonal trauma. In contrast, sleep quality played a more dominant role in the prediction of moderate to severe anxiety symptoms. This suggests that the specific symptoms of anxiety may be more tied to sleep than other aspects of mental distress.

Strengths and limitations

Key strengths of this study include the wide range of variables studied, which integrate lifestyle habits with adverse experiences; the use of multiple algorithms including hierarchical analysis; and the ability to stratify the findings by demographics such as age and geography. However, there are several limitations to note. First, the study is cross-sectional and therefore cannot distinguish between causality and consequence, especially given the bi-directional nature of many of the factors investigated here. However, this multivariate data provides a unique opportunity to examine the hierarchy of impact of a broad range of factors on anxiety in a cost-effective and timely fashion and can provide well-evidenced hypotheses for interventional testing. Second, although a wide range of lifestyle and life context factors were used, several key factors were missing. These factors are now included in more recent iterations of the MHQ and will be included in future analyses. Third, the sample population included in this study, while large-scale and obtained through tailored outreach, was a non-probability sample that may not be representative of the general population. As the assessment is performed online, this is particularly the case in countries where internet penetration is lower. Additionally, all measures were self-reported, which may be subject to recall bias and social desirability effects. Furthermore, while machine learning techniques offer advantages in handling complex, multivariate relationships, they are susceptible to overfitting and may not generalize well to populations with different characteristics from the training sample. However, comparisons of the Global Mind Data from the US showed that it is broadly comparable to data from the US Census. Studies are underway to perform similar comparisons for other countries where equivalent national statistics are available.

Implications for approaches to prevention and treatment

Given the known bidirectional association between anxiety and life context, these findings have implications for how we approach both prevention and treatment of anxiety. From a treatment perspective, they suggest that it is important to first determine an individual’s specific life context before making treatment decisions relating to anxiety. At an individual level, lifestyle factors such as exercise and regular social interaction could substantially reduce the risk of severe anxiety symptoms evoked by adversity. In addition, a better understanding of sleep mechanisms and targeting the underlying causes of sleep problems could also be a possible path to treatment of anxiety symptoms, particularly when these occur in the absence of obvious adverse events or lifestyle risk factors. For instance, sleep apnea or poor sleep hygiene may drive sleep problems that in turn contribute to greater anxiety. Finally, at a population level, age-tailored socioeconomic programs aimed at increasing employment opportunities and reducing interpersonal abuse and assault could substantially decrease the incidence of anxiety.

Supporting information

S1 Text. Table A. Percentage of respondents by age and biological sex. Table B. Percentage of respondents by country. Table C. Model results for all model types. Table D. Ranking of impact of factors on model performance (AUC and F1 scores) based on first or last inclusion in forward addition models. Table E. InfoGain values of each answer option for all ages.

https://doi.org/10.1371/journal.pmen.0000552.s001

(DOCX)

Acknowledgments

We thank members of the Sapien Labs team for their assistance with recruitment and data management. We are grateful to all survey respondents for their participation in the Global Mind Project.

References

  1. Diagnostic and statistical manual of mental disorders. 5 ed. APA. 2013.
  2. Spitzer RL, Kroenke K, Williams JBW, Löwe B. A brief measure for assessing generalized anxiety disorder: the GAD-7. Arch Intern Med. 2006;166(10):1092–7. pmid:16717171
  3. COVID-19 Mental Disorders Collaborators. Global prevalence and burden of depressive and anxiety disorders in 204 countries and territories in 2020 due to the COVID-19 pandemic. Lancet. 2021;398(10312):1700–12. pmid:34634250
  4. Szuhany KL, Simon NM. Anxiety Disorders: A Review. JAMA. 2022;328(24):2431–45. pmid:36573969
  5. Wu Y, Li X, Ji X, Ren W, Zhu Y, Chen Z, et al. Trends in the epidemiology of anxiety disorders from 1990 to 2021: A global, regional, and national analysis with a focus on the sociodemographic index. J Affect Disord. 2025;373:166–74. pmid:39732404
  6. Yang X, Fang Y, Chen H, Zhang T, Yin X, Man J, et al. Global, regional and national burden of anxiety disorders from 1990 to 2019: results from the Global Burden of Disease Study 2019. Epidemiol Psychiatr Sci. 2021;30:e36. pmid:33955350
  7. Meier SM, Mattheisen M, Mors O, Mortensen PB, Laursen TM, Penninx BW. Increased mortality among people with anxiety disorders: total population study. Br J Psychiatry. 2016;209(3):216–21. pmid:27388572
  8. Mendlowicz MV, Stein MB. Quality of life in individuals with anxiety disorders. Am J Psychiatry. 2000;157(5):669–82. pmid:10784456
  9. Alonso J, Liu Z, Evans-Lacko S, Sadikova E, Sampson N, Chatterji S, et al. Treatment gap for anxiety disorders is global: Results of the World Mental Health Surveys in 21 countries. Depress Anxiety. 2018;35(3):195–208. pmid:29356216
  10. Meier SM, Deckert J. Genetics of Anxiety Disorders. Curr Psychiatry Rep. 2019;21(3):16. pmid:30826936
  11. Ask H, Cheesman R, Jami ES, Levey DF, Purves KL, Weber H. Genetic contributions to anxiety disorders: where we are and where we are heading. Psychol Med. 2021;51(13):2231–46. pmid:33557968
  12. Chu DA, Williams LM, Harris AWF, Bryant RA, Gatt JM. Early life trauma predicts self-reported levels of depressive and anxiety symptoms in nonclinical community adults: relative contributions of early life stressor types and adult trauma exposure. J Psychiatr Res. 2013;47(1):23–32. pmid:23020924
  13. Juruena MF, Eror F, Cleare AJ, Young AH. The Role of Early Life Stress in HPA Axis and Anxiety. Adv Exp Med Biol. 2020;1191:141–53. pmid:32002927
  14. Liu J, Shi Y, Xie S, Xing L, Wang L, Li W, et al. Meta-analysis of prospective longitudinal cohort studies on the impact of childhood traumas on anxiety disorders. J Affect Disord. 2025;374:443–59. pmid:39824317
  15. Cox RC, Olatunji BO. Sleep in the anxiety-related disorders: A meta-analysis of subjective and objective research. Sleep Med Rev. 2020;51:101282. pmid:32109832
  16. Chellappa SL, Aeschbach D. Sleep and anxiety: From mechanisms to interventions. Sleep Med Rev. 2022;61:101583. pmid:34979437
  17. Aucoin M, LaChance L, Naidoo U, Remy D, Shekdar T, Sayar N, et al. Diet and Anxiety: A Scoping Review. Nutrients. 2021;13(12):4418. pmid:34959972
  18. Chen H, Cao Z, Hou Y, Yang H, Wang X, Xu C. The associations of dietary patterns with depressive and anxiety symptoms: a prospective study. BMC Med. 2023;21(1):307. pmid:37580669
  19. Stanczykiewicz B, Banik A, Knoll N, Keller J, Hohl DH, Rosińczuk J, et al. Sedentary behaviors and anxiety among children, adolescents and adults: a systematic review and meta-analysis. BMC Public Health. 2019;19(1):459. pmid:31039760
  20. Allen MS, Walter EE, Swann C. Sedentary behaviour and risk of anxiety: A systematic review and meta-analysis. J Affect Disord. 2019;242:5–13. pmid:30170238
  21. Arena AF, Mobbs S, Sanatkar S, Williams D, Collins D, Harris M, et al. Mental health and unemployment: A systematic review and meta-analysis of interventions to improve depression and anxiety outcomes. J Affect Disord. 2023;335:450–72. pmid:37201898
  22. Virgolino A, Costa J, Santos O, Pereira ME, Antunes R, Ambrósio S, et al. Lost in transition: a systematic review of the association between unemployment and mental health. J Ment Health. 2022;31(3):432–44. pmid:34983292
  23. Santini ZI, Jose PE, York Cornwell E, Koyanagi A, Nielsen L, Hinrichsen C, et al. Social disconnectedness, perceived isolation, and symptoms of depression and anxiety among older Americans (NSHAP): a longitudinal mediation analysis. Lancet Public Health. 2020;5(1):e62–70. pmid:31910981
  24. Wilkialis L, Rodrigues NB, Cha DS, Siegel A, Majeed A, Lui LMW, et al. Social Isolation, Loneliness and Generalized Anxiety: Implications and Associations during the COVID-19 Quarantine. Brain Sci. 2021;11(12):1620. pmid:34942920
  25. Altintaş E, Uylaş Aksu Z, Gümüş Demi̇r Z. Machine Learning Techniques for Anxiety Disorder. European Journal of Science and Technology. 2021.
  26. Daza A, Saboya N, Necochea-Chamorro JI, Zavaleta Ramos K, Vásquez Valencia Y del R. Systematic review of machine learning techniques to predict anxiety and stress in college students. Informatics in Medicine Unlocked. 2023;43:101391.
  27. Kotsilieris T, Pintelas E, Livieris IE, Pintelas P. Predicting anxiety disorders and suicide tendency using machine learning: a review. IJMEI. 2020;12(6):599.
  28. Muhammad A, Ashjan B, Ghufran M, Taghreed S, Nada A, Nada A, et al. Classification of Anxiety Disorders using Machine Learning Methods: A Literature Review. Insights Biomed Res. 2020;4(1).
  29. Pintelas EG, Kotsilieris T, Livieris IE, Pintelas P. A review of machine learning prediction methods for anxiety disorders. In: Proceedings of the 8th International Conference on Software Development and Technologies for Enhancing Accessibility and Fighting Info-exclusion, 2018. 8–15. https://doi.org/10.1145/3218585.3218587
  30. Anbarasi LJ, Jawahar M, Ravi V, Cherian SM, Shreenidhi S, Sharen H. Machine learning approach for anxiety and sleep disorders analysis during COVID-19 lockdown. Health Technol (Berl). 2022;12(4):825–38. pmid:35669293
  31. Chavanne AV, Paillère Martinot ML, Penttilä J, Grimmer Y, Conrod P, Stringaris A, et al. Anxiety onset in adolescents: a machine-learning prediction. Mol Psychiatry. 2023;28(2):639–46. pmid:36481929
  32. Collins S, Hoare E, Allender S, Olive L, Leech RM, Winpenny EM, et al. A longitudinal study of lifestyle behaviours in emerging adulthood and risk for symptoms of depression, anxiety, and stress. J Affect Disord. 2023;327:244–53. pmid:36754097
  33. Priya A, Garg S, Tigga NP. Predicting Anxiety, Depression and Stress in Modern Life using Machine Learning Algorithms. Procedia Computer Science. 2020;167:1258–67.
  34. Sau A, Bhakta I. Predicting anxiety and depression in elderly patients using machine learning technology. Healthcare Tech Letters. 2017;4(6):238–43.
  35. Hueniken K, Somé NH, Abdelhack M, Taylor G, Elton Marshall T, Wickens CM, et al. Machine Learning-Based Predictive Modeling of Anxiety and Depressive Symptoms During 8 Months of the COVID-19 Global Pandemic: Repeated Cross-sectional Survey Study. JMIR Ment Health. 2021;8(11):e32876. pmid:34705663
  36. Tabares Tabares M, Vélez Álvarez C, Bernal Salcedo J, Murillo Rendón S. Anxiety in young people: Analysis from a machine learning model. Acta Psychol (Amst). 2024;248:104410. pmid:39032273
  37. Byeon H. Exploring Factors for Predicting Anxiety Disorders of the Elderly Living Alone in South Korea Using Interpretable Machine Learning: A Population-Based Study. Int J Environ Res Public Health. 2021;18(14):7625. pmid:34300076
  38. Carpenter KLH, Sprechmann P, Calderbank R, Sapiro G, Egger HL. Quantifying Risk for Anxiety Disorders in Preschool Children: A Machine Learning Approach. PLoS One. 2016;11(11):e0165524. pmid:27880812
  39. Farooq SA, Konda O, Kunwar A, Rajeev N. Anxiety Prediction and Analysis- A Machine Learning Based Approach. In: 2023 4th International Conference for Emerging Technology (INCET), 2023. 1–7. https://doi.org/10.1109/incet57972.2023.10170115
  40. Husain W, Xin LK, Rashid NA, Jothi N. Predicting Generalized Anxiety Disorder among women using random forest approach. In: 2016 3rd International Conference on Computer and Information Sciences (ICCOINS), 2016. 37–42. https://doi.org/10.1109/iccoins.2016.7783185
  41. Li Y, Song Y, Sui J, Greiner R, Li X-M, Greenshaw AJ, et al. Prospective prediction of anxiety onset in the Canadian longitudinal study on aging (CLSA): A machine learning study. J Affect Disord. 2024;357:148–55. pmid:38670463
  42. Nayan MdIH, Uddin MSG, Hossain MdI, Alam MdM, Zinnia MA, Haq I, et al. Comparison of the Performance of Machine Learning-based Algorithms for Predicting Depression and Anxiety among University Students in Bangladesh. Asian Journal of Social Health and Behavior. 2022;5(2):75–84.
  43. Nemesure MD, Heinz MV, Huang R, Jacobson NC. Predictive modeling of depression and anxiety using electronic health records and a novel machine learning approach with artificial intelligence. Sci Rep. 2021;11(1):1980. pmid:33479383
  44. Qasrawi R, Vicuna Polo SP, Abu Al-Halawa D, Hallaq S, Abdeen Z. Assessment and Prediction of Depression and Anxiety Risk Factors in Schoolchildren: Machine Learning Techniques Performance Analysis. JMIR Form Res. 2022;6(8):e32736. pmid:35665695
  45. Talib AA, Binnouh A, Alqahtani H, Bassfar Z, Alhmiedat T, Alatawi A. Predicting anxiety among technical employees: A machine learning approach. Harbin Gongcheng Daxue Xuebao/Journal of Harbin Engineering University. 2023;44:953–65.
  46. Wei Z, Wang X, Ren L, Liu C, Liu C, Cao M, et al. Using machine learning approach to predict depression and anxiety among patients with epilepsy in China: A cross-sectional study. J Affect Disord. 2023;336:1–8. pmid:37209912
  47. Firth J, Solmi M, Wootton RE, Vancampfort D, Schuch FB, Hoare E, et al. A meta-review of “lifestyle psychiatry”: the role of exercise, smoking, diet and sleep in the prevention and treatment of mental disorders. World Psychiatry. 2020;19(3):360–80. pmid:32931092
  48. Penninx BW, Pine DS, Holmes EA, Reif A. Anxiety disorders. The Lancet. 2021;397(10277):914–27.
  49. Bandelow B, Michaelis S, Wedekind D. Treatment of anxiety disorders. Dialogues Clin Neurosci. 2017;19(2):93–107. pmid:28867934
  50. Taylor J, Sukhoi O, Newson J, Thiagarajan T. Global Mind Project data in the United States: A comparison with national statistics. 2023. https://osf.io/p9ur6
  51. Newson JJ, Thiagarajan TC. Assessment of Population Well-Being With the Mental Health Quotient (MHQ): Development and Usability Study. JMIR Ment Health. 2020;7(7):e17935. pmid:32706730
  52. Newson JJ, Pastukh V, Thiagarajan TC. Assessment of Population Well-being With the Mental Health Quotient: Validation Study. JMIR Ment Health. 2022;9(4):e34105. pmid:35442210
  53. Qu K, Xu J, Hou Q, Qu K, Sun Y. Feature selection using Information Gain and decision information in neighborhood decision system. Applied Soft Computing. 2023;136:110100.
  54. Lundberg S, Lee SI. A Unified Approach to Interpreting Model Predictions. 2017.
  55. Lane MM, Gamage E, Travica N, Dissanayaka T, Ashtree DN, Gauci S, et al. Ultra-Processed Food Consumption and Mental Health: A Systematic Review and Meta-Analysis of Observational Studies. Nutrients. 2022;14(13):2568. pmid:35807749
  56. Twenge JM, Martin GN. Gender differences in associations between digital media use and psychological well-being: Evidence from three large datasets. J Adolesc. 2020;79:91–102. pmid:31926450
  57. Grandjean P, Landrigan PJ. Neurobehavioural effects of developmental toxicity. Lancet Neurol. 2014;13(3):330–8. pmid:24556010
  58. James AA, OShaughnessy KL. Environmental chemical exposures and mental health outcomes in children: a narrative review of recent literature. Front Toxicol. 2023;5:1290119. pmid:38098750
  59. Bala J, Newson JJ, Thiagarajan TC. Hierarchy of demographic and social determinants of mental health: analysis of cross-sectional survey data from the Global Mind Project. BMJ Open. 2024;14(3):e075095. pmid:38490653
  60. Blanchflower DG, Bryson A. Unemployment and sleep: evidence from the United States and Europe. Econ Hum Biol. 2021;43:101042. pmid:34271429
  61. Kajeepeta S, Gelaye B, Jackson CL, Williams MA. Adverse childhood experiences are associated with adult sleep disorders: a systematic review. Sleep Med. 2015;16(3):320–30. pmid:25777485
  62. Dolezal BA, Neufeld EV, Boland DM, Martin JL, Cooper CB. Interrelationship between Sleep and Exercise: A Systematic Review. Adv Prev Med. 2017;2017:1364387. pmid:28458924
  63. Goodwin GM. The overlap between anxiety, depression, and obsessive-compulsive disorder. Dialogues Clin Neurosci. 2015;17(3):249–60. pmid:26487806
  64. Noyes R Jr. Comorbidity in generalized anxiety disorder. Psychiatr Clin North Am. 2001;24(1):41–55. pmid:11225508
  65. Newson JJ, Sukhoi O, Thiagarajan TC. MHQ: constructing an aggregate metric of population mental wellbeing. Popul Health Metr. 2024;22(1):16. pmid:39020379