Correlates of wanting to seek help for mental health and substance use concerns by sexual and gender minority young adults during the COVID-19 pandemic: A machine learning analysis

The COVID-19 pandemic has worsened mental health and substance use challenges among many people who are Two Spirit, lesbian, gay, bisexual, transgender, queer, questioning, and intersex (2SLGBTQI+). We aimed to identify the important correlates of wanting to seek help for mental health or substance use concerns among 2SLGBTQI+ young adults during the pandemic, and their effects on the predicted likelihood of wanting to seek help. A cross-sectional survey was conducted in 2020–2021 among 2SLGBTQI+ young adults aged 16–29 living in two Canadian provinces (Ontario and Quebec). Among 1414 participants, 77% (n = 1089) wanted to seek help for their mental health or substance use concerns during the pandemic; of these, 69.8% (n = 760) reported a delay in accessing care. We built a random forest (RF) model to predict the status of wanting to seek help, which achieved moderately high performance with an area under the receiver operating characteristic curve (AUC) of 0.85. The top 10 correlates of wanting to seek help related to worsening mental health, age, stigma and discrimination, and adverse childhood experiences. The interactions of adequate housing with certain sexual orientations, gender identities and mental health challenges were found to increase the likelihood of wanting to seek help. We built another RF model to predict the risk of delay in accessing care among participants who wanted to seek help (n = 1089). The model identified a similar set of top 10 correlates of delay in accessing care but lacked adequate performance (AUC 0.61). These findings can direct future research and targeted prevention measures to reduce health disparities for 2SLGBTQI+ young adults.


Introduction
Two Spirit, lesbian, gay, bisexual, transgender, queer, questioning, and intersex (2SLGBTQI+) young adults experience significant mental health disparities and substance use challenges [1][2][3][4][5]. In 2019, 46.8% of lesbian, gay or bisexual (LGB) youth in the United States (US) reported having serious thoughts of attempting suicide, compared with only 14.5% of their heterosexual peers. Similarly, the rate of illicit drug use among LGB youth was more than twice that among heterosexual youth (27.8% vs 12.7%) [4]. A recent meta-analysis found that 2SLGBTQI+ youth had 2.67 times higher odds of experiencing mental health difficulties than their heterosexual and cisgender counterparts [3]. Moreover, lifetime use of substances was reported to be 1.48 times more likely among transgender individuals than among cisgender people [5].
Growing evidence has demonstrated that the recent outbreak of Coronavirus disease 2019 (COVID-19) exacerbated mental health and substance use problems among sexual and gender minority (SGM) people [6][7][8]. A Canadian prospective study showed that SGM participants experienced higher and increasing rates of deterioration of mental health, suicidal thoughts, and alcohol, cannabis and substance use compared with non-SGM people [6]. Moreover, multiple reports of difficulty in accessing healthcare by 2SLGBTQI+ people are concerning and need urgent attention [7,9]. However, it is not clear which individual-level factors may have contributed to the desire to seek help for mental health and substance use concerns among 2SLGBTQI+ young adults during the pandemic. Identifying these factors has crucial implications for guiding policy interventions that target this specific population. We aimed to identify these factors using machine learning analysis, a novel statistical method that has emerged as a promising means of analyzing a vast range of complex data in public health informatics [10,11]. Our recent scoping review of machine learning applications in mental health and substance use issues among SGM people found only 4 prediction modelling studies up to 2020 [12]. We recently updated the review and detected 3 additional studies [13][14][15] that applied machine learning-based predictive modelling; none of them explored wanting to seek help for mental health and substance use issues among this population. Hence, we addressed this research gap in the current study.
Machine learning has also been increasingly used as an important tool in intersectional research on inequalities related to socio-demographic identities and social positions (e.g., age, sex, gender, ethnicity, socio-economic status) [16,17], as it provides newer techniques and methods [18,19]. Recent data suggest that 84% and 66% of Canadian homeless 2SLGBTQI+ young adults experienced severe anxiety and moderately severe depression respectively, while 57% of these young adults reported problematic substance use during the pandemic [20]. Moreover, unemployment and urbanicity have been found to increase the risk of suicidal thoughts among SGM youth who experienced mental health challenges or social stigma [13]. It is important to understand how these interactions of SGM status with different socio-economic identities influence the need to seek help. In addition, 2SLGBTQI+ people encounter stigma, bullying, violence and institutional discrimination throughout their lifespan [21][22][23], which further contributes to the development of mental health challenges and substance use concerns among them [3,23,24]. Hence, understanding the role of these risk factors in wanting to seek help and accessing care could help policy makers plan an equitable distribution of healthcare.
The primary objective of our study was to identify the important correlates and their effects on the predicted risk of wanting to seek help for mental health and substance use concerns among 2SLGBTQI+ young adults during the pandemic. We also aimed to assess the prevalence and correlates of delay in accessing care among 2SLGBTQI+ young adults who wanted to seek help from health professionals as a secondary objective.

Data source
Data for this cross-sectional analysis were taken from the 2SLGBTQI+ Tobacco Project Survey, conducted among 2SLGBTQI+ young adults aged 16-29 years living in Canada during December 2020 to March 2021. This study was approved by the Centre for Addiction and Mental Health Research Ethics Board and adhered to the principles of the Declaration of Helsinki. Individuals provided written informed consent before participating in this study. Participants were recruited from two Canadian provinces: Ontario and Quebec. We received 1511 responses and participants were compensated $10 CAD. Participants who did not complete the survey were excluded, resulting in a total of 1414 respondents.
All statistical analyses were performed using R, version 4.0.5 (R Foundation for Statistical Computing). The survey questionnaire and the R code required to reproduce the analysis are available on the Open Science Framework (https://osf.io/tnd2u/) and the deidentified data can be accessed from the University of Toronto Dataverse [25]. The entire analytical framework of the machine learning models, from data pre-processing to sensitivity analysis, is presented in S1 Appendix.

Outcome measures
We assessed two binary outcomes in this analysis: a primary outcome defined for the entire study sample and a secondary outcome defined only for a subgroup of participants. Our primary outcome was the status of wanting to seek help from health professionals for mental health or substance use concerns during the pandemic, the onset of which was taken as March 2020 [26,27]. This variable was defined using the survey question, 'Since March 2020, was there a time when you wanted to talk with or seek help from a health professional about stress, depression, problems with emotions or substance use?' Next, among participants who answered 'yes' to this question, i.e., those who had ever wanted to seek help during the pandemic (n = 1089, 77.0% of all participants), we created another binary outcome variable, 'delay in accessing care', based on the question 'Did you delay or not get the care you thought you needed?'

Features
For both outcomes, we considered the same set of 61 features, including 53 categorical and 8 continuous variables (see https://osf.io/tnd2u/ for the full list of variables). From the survey, we extracted socio-demographic characteristics (age, sexual orientation, gender, ethnicity, education, employment, income, duration of living in Canada, and housing status); health status (self-rated general and mental health, comorbidity, disability, stress, diagnosis of different mental illnesses, past-year suicidal thoughts and past-week depression score); frequency of smoking, vaping, other tobacco product use, alcohol drinking, cannabis and other drug use; and perceived COVID-19 impact on smoking and other substance use. Additionally, we created several composite score variables based on validated scales, which included overall scores of enacted and perceived stigma [28], internalized homophobia [29], identity centrality [30], community connectedness [31], average scores of outness [32] and overall scores of adverse childhood experiences (ACEs) [33].
For continuous variables, we used the two-sample t-test, and for categorical variables, we used Fisher's Exact Test to compare characteristics between groups in both descriptive analyses. Because of the presence of small cell counts (n<5), we used Fisher's Exact Test instead of the Pearson chi-square test to robustly identify any significant difference between groups [34].
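To illustrate why an exact test is preferred when cell counts are small, the following is a minimal Python sketch of the two-sided Fisher's Exact Test for a 2 x 2 table. It is not the authors' R code: the function name `fisher_exact_2x2` and the example counts are hypothetical, and the sketch covers only the 2 x 2 case, whereas the survey variables may have more than two categories.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    total = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lowest = max(0, col1 - row2)   # smallest feasible top-left count
    highest = min(row1, col1)      # largest feasible top-left count
    p_observed = prob(a)
    # Sum the probabilities of every table no more likely than the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical table with cells below 5 (not study data).
p_value = fisher_exact_2x2(3, 1, 1, 3)
print(f"two-sided p = {p_value:.4f}")
```

The p-value is computed directly from the hypergeometric distribution, so it does not rely on the large-sample approximation behind the chi-square test.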
We performed standard data preprocessing procedures to create a dataset ready for the machine learning analysis. We removed or recoded low-variance variables, defined as those with a percentage of unique values <5% [35,36]. All variables were inspected to rule out collinearity (i.e., all pairwise Cramér's V<0.6) [37]. We handled missing data (2.8% total missingness, with 15 variables showing ≥5% missing data) using the Multivariate Imputation by Chained Equations (MICE) approach [38] after confirming that the assumption of missing at random was plausible. Specifically, 5 MICE-imputed datasets were independently generated; the dataset yielding the best model performance (highest AUC) was used in the primary analysis, and the remaining four were assessed in the sensitivity analysis. Before running the final model, we created the composite score variables in the imputed data. Notably, out of 1414 participants, 11% (n = 161) declined to respond to the ACEs questionnaire; these missing responses were handled by MICE imputation before the overall ACEs scores were created. No feature selection was performed because of the relatively small number of variables (61 variables). We additionally used 'missForest', a random forest-based iterative algorithm [39], to examine the impact of missing data in the sensitivity analysis.
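The collinearity screen can be sketched as follows. This minimal Python example computes the uncorrected Cramér's V from a contingency table and compares it with the 0.6 threshold stated above. The function name `cramers_v` and the example table are hypothetical, the sketch assumes every row and column total is non-zero, and the bias correction that some implementations apply is not included.

```python
from math import sqrt


def cramers_v(table):
    """Uncorrected Cramér's V for an r x c contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two variables.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k))


# Hypothetical cross-tabulation of two categorical features (not study data).
example = [[30, 10], [12, 28]]
v = cramers_v(example)
print(f"V = {v:.3f}; flagged as collinear: {v >= 0.6}")
```

V ranges from 0 (no association) to 1 (perfect association), so a pair of features would be flagged for review only when V reaches the chosen threshold.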

Random forest analysis
We performed separate random forest (RF) analyses for the primary outcome (on the entire study sample, n = 1,414) and the secondary outcome (on those who had wanted to seek help, n = 1,089), following the same analytical pipeline (S1 Appendix). We specifically chose RF because of its proven effectiveness in public health applications, particularly in predicting smoking and vaping outcomes, and because it has provided a higher level of accuracy than other methods [40][41][42]. In each analysis, we randomly split the imputed dataset at a 7:3 ratio to obtain a larger training set for model development and a smaller testing set for evaluating model performance. Using the R package 'caret' [43] and the 10-fold cross-validation method [44], we built the RF model on the training dataset using the default settings for hyperparameters (S1 Appendix). Performance of the RF model was evaluated on the testing set using the receiver operating characteristic (ROC) curve. The area under the ROC curve (AUC) was computed to give an overall account of the classification ability of the model, and high performance was declared using an AUC threshold of 0.80 [45]. Accuracy, sensitivity, and specificity of the RF models were also calculated. If the final model failed to reach a testing AUC of 0.70 [46], we implemented regularized logistic regression using the Lasso (least absolute shrinkage and selection operator) [47] to see if this would yield better performance (see sensitivity analysis).
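The evaluation logic described above can be sketched in a few lines of Python. This is an illustration only, not the authors' 'caret' pipeline: the function names are hypothetical, the split is a simple random split (no stratification is assumed), cross-validation is omitted, and the label for the middle band between 0.70 and 0.80 is our own placeholder. The AUC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and split them at the given fraction."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))


def performance_band(value):
    """Apply the thresholds stated in the text (0.80 and 0.70)."""
    if value >= 0.80:
        return "high"
    if value >= 0.70:
        return "intermediate"  # hypothetical label; the text names no middle band
    return "below 0.70: fit lasso logistic regression as a check"


train, test = train_test_split(list(range(1414)))
print(len(train), len(test))
print(performance_band(0.85), "|", performance_band(0.61))
```

With the reported AUCs, the first model falls in the "high" band and the second falls below 0.70, which is the condition under which the Lasso model was fitted.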

Interpreting the random forest findings
For each outcome, we used the respective RF model to conduct post-hoc analyses to provide interpretable, policy-relevant findings. First, we identified the top ten most important contributing features (i.e., correlates) based on a relative importance score calculated for each variable. These scores corresponded to the loss of prediction accuracy of the RF model due to the exclusion of a variable [48,49]. We ranked all the variables on a scale of 0% to 100%, with the most important one receiving the highest score of 100%. For our primary outcome (wanting to seek help), we additionally plotted one-way partial dependence plots (PDPs) for each of its top ten correlates to demonstrate their marginal effects. We further assessed partial dependence-based 2-way interaction effects [50] by identifying pairs of variables that were jointly significant in predicting the probability of wanting to seek help. This was done by examining the statistical strength of all 91 pairwise interactions formed by sociodemographic variables (such as age × gender) and/or the previously identified top 10 correlates, following the procedures used in a recent machine learning paper [51]. We were particularly interested in studying interactions between sociodemographic variables to demonstrate the effect of intersectionality [52]. Two-way PDPs were then used to demonstrate the 10 strongest interactions and their marginal effects.
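The idea behind a one-way partial dependence curve can be sketched as follows. This minimal Python example is not the authors' code: `toy_predict` is a hypothetical stand-in for a fitted RF model, and the feature names and values are invented for illustration. For each candidate value of a feature, every row is given that value, the model is queried, and the predictions are averaged.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is set to each grid value for every row."""
    curve = []
    for value in grid:
        # Overwrite the feature for all rows; other features keep observed values.
        modified = [{**row, feature: value} for row in rows]
        curve.append(sum(predict(r) for r in modified) / len(modified))
    return curve


def toy_predict(row):
    """Hypothetical model returning a probability of wanting to seek help."""
    probability = 0.5
    if row["mental_health"] == "poor_or_fair":
        probability += 0.3
    if row["housing"] == "adequate":
        probability += 0.1
    return probability


# Invented rows, not study data.
rows = [
    {"mental_health": "good", "housing": "adequate"},
    {"mental_health": "poor_or_fair", "housing": "inadequate"},
    {"mental_health": "good", "housing": "inadequate"},
    {"mental_health": "poor_or_fair", "housing": "adequate"},
]
pdp = partial_dependence(toy_predict, rows, "mental_health", ["good", "poor_or_fair"])
print([round(p, 2) for p in pdp])
```

A two-way PDP extends the same averaging to a grid over two features at once; the interaction strength statistic used in the study is not reproduced here.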

Sensitivity analysis
Sensitivity analyses were conducted to examine the robustness of the primary findings. First, we repeated the analyses separately on the remaining 4 MICE-imputed datasets. Second, we ran complete-case analyses by including only participants with full data (n = 364, 25.7% of all participants for the primary outcome; n = 293, 26.9% of those who wanted to seek help for the secondary outcome), although these models were expected to have poor performance. Third, to further assess the impact of missing data, we implemented the 'missForest' algorithm and compared the resulting RF models with our primary models. We used 'missForest' because it was found to outperform other data imputation algorithms, including MICE and K-nearest neighbour imputation [39]. Fourth, we derived a logistic regression model using the lasso if the primary RF model failed to reach an AUC of 0.70. The lasso achieves a sparse model by penalizing the absolute size of the coefficients, which shrinks some of them to exactly zero and tends to retain only one of two correlated predictors [47].
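How the lasso penalty produces exact zeros can be illustrated with the soft-thresholding operator. In the simplest setting (squared-error loss with a single standardized predictor), the lasso estimate is the unpenalized estimate moved toward zero by the penalty and set to zero once it falls within the penalty. The Python sketch below shows only this operator; the coefficient values and the penalty `lam` are hypothetical, and fitting a penalized logistic regression additionally requires an iterative algorithm that is not shown.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; return exactly zero when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical unpenalized coefficients (not study results).
unpenalized = {"outness": 0.9, "income": -0.05, "age": 0.3, "ethnicity": 0.02}
lam = 0.1
penalized = {name: soft_threshold(beta, lam) for name, beta in unpenalized.items()}
retained = [name for name, beta in penalized.items() if beta != 0.0]
print(penalized)
print("retained predictors:", retained)
```

Small coefficients are removed entirely rather than merely reduced, which is why a lasso model reports a subset of predictors with non-zero coefficients.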
Wanting to seek help was found to differ significantly by age, sexual orientation and gender identity, education, employment, and household income (p<0.01). Notably, those who wanted to seek help were slightly younger on average (mean age 21.75 ± 3.78 years). More than 80.0% of the participants with bisexual (n = 339), queer (n = 214) and other (n = 172) sexual orientations wanted to seek help. Approximately 85.0% of the transgender men (n = 79) and gender diverse (n = 344) participants, as well as almost 80.0% of the cisgender women (n = 479), wanted to seek help. The rate of wanting to seek help was 80.6% (n = 741) among participants who had less than a university degree, 82.1% (n = 792) among unemployed participants, and 83.1% (n = 458) among those who had a past-year household income of less than $60,000. In addition, wanting to seek help was also significantly (p<0.01) higher among past-year substance users (82%, n = 356), those with anxiety disorder (90.3%, n = 561) and depression (91.6%, n = 489), higher stress levels (86.5%, n = 753), poor or fair mental health (88.2%, n = 851) and general health

Performance of the RF model for wanting to seek help
The RF model built for predicting the status of wanting to seek help for mental health or substance use issues showed moderately high performance, with accuracy 0.86 (95% CI 0.83, 0.89), sensitivity 0.53 (95% CI 0.42, 0.63), specificity 0.95 (95% CI 0.92, 0.97) and AUC 0.85 (Fig 1).
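The relationship between accuracy, sensitivity and specificity can be made concrete with a short Python sketch. The counts below are hypothetical and are not the study's confusion matrix; they are chosen only to show that overall accuracy is a prevalence-weighted average of sensitivity and specificity, so accuracy can be high while sensitivity is modest when one class dominates. Which class is designated as "positive" is an assumption that changes the sensitivity and specificity values.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, specificity and positive-class prevalence."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positives among actual positives
        "specificity": tn / (tn + fp),   # true negatives among actual negatives
        "prevalence": (tp + fn) / total,
    }


# Hypothetical test-set counts (not study data).
metrics = classification_metrics(tp=50, fn=50, fp=15, tn=285)

# Accuracy equals prevalence * sensitivity + (1 - prevalence) * specificity.
weighted = (
    metrics["prevalence"] * metrics["sensitivity"]
    + (1 - metrics["prevalence"]) * metrics["specificity"]
)
print({name: round(value, 4) for name, value in metrics.items()})
print(round(weighted, 4))
```

For this reason, the AUC, which does not depend on a single classification cut-off, is reported alongside the threshold-based metrics.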

Top 10 correlates of wanting to seek help
The top 10 correlates of wanting to seek help identified from the RF model were (Fig 2): current self-rated mental health status (100%), average outness score (59.3%), overall enacted stigma score (51.2%), past-year suicidal thoughts (49.8%), overall internalized homophobia score (45.3%), overall connectedness score (44.2%), overall identity centrality score (43.9%), overall perceived stigma score (34.2%), age (25.8%), and overall ACEs score (25.1%). One-way partial dependence plots (PDPs) depicting the marginal effects of the top 10 correlates are presented in Fig 3. Overall, the likelihood of wanting to seek help was higher among those who rated their mental health as poor or fair, reported to have suicidal thoughts in the past year, higher enacted stigma, higher internalized homophobia, lower connectedness with the community, lower level of identity centrality, and those who rated their outness and perceived stigma much higher or much lower than average. The probability of wanting to seek help was highest among the youngest age groups, and also higher among older age groups. Participants with very high ACEs score showed the highest probability of wanting to seek help, while the probability was also quite high for those with very low ACEs score.

Interaction effects on predicted risk of wanting to seek help
All pairwise interactions formed by socio-demographic variables and the top 10 correlates of wanting to seek help are presented in S1 Table. The strongest interaction was found between current self-rated mental health and housing status (interaction strength 0.2). Other top 10 interactions involved intersections of housing status with sexual orientation, gender identities, past-year suicidal thoughts, average outness score and overall internalized homophobia score; interactions were also found between sexual orientation and gender identities with self-rated mental health and past-year suicidal thoughts. Specifically, after stratifying by mental health status, past-year suicidal thoughts, outness score and internalized homophobia score, individuals with adequate housing during the pandemic always had a higher probability of wanting to seek help than those experiencing inadequate housing (Fig 4). In addition, among young adults who self-identified as bisexual, others, queer and questioning, gender diverse or cisgender, the likelihood of wanting to seek help was further increased by having adequate housing, poor or fair self-rated mental health or past-year suicidal thoughts (Figs 5 and 6).

Descriptive statistics and predicted risk of delay in accessing care
Of the 1089 participants who wanted to seek help during the pandemic, 69.8% (n = 760) reported a delay in accessing care and 25.8% (n = 281) had no delay in accessing care (Table 2). Participants who had a delay in accessing care were slightly younger on average (mean age 21.63 ± 3.73 years, p = 0.007) and differed significantly (p = 0.002) by highest level of education completed. In particular, among those who had less than a university degree, 72.1% (n = 534) reported a delay in accessing care. The RF model built for predicting the risk of delay in accessing care failed to achieve satisfactory performance (AUC 0.61) on the testing data (S2 Appendix). The top 10 correlates were: average outness score (100%), overall internalized homophobia score (85.7%), overall ACEs score (81.8%), age (77.4%), overall connectedness score (77.3%), overall perceived stigma score (76.2%), overall enacted stigma score (73.4%), overall identity centrality score (71.2%), diagnosis of other mental illness (27.1%), and current self-rated general health (26.4%) (S1 Fig). However, because of the unsatisfactory performance of this RF model, we deemed those findings to be less reliable.

Sensitivity analysis
All alternative RF models built on the remaining 4 MICE-imputed datasets and on the 'missForest'-imputed dataset performed similarly to the primary models, with AUC ranging from 0.82 to 0.85 for wanting to seek help and from 0.56 to 0.61 for delay in accessing care (S2 and S3 Appendix). For both outcomes, model performance dropped considerably when the models were built on the complete-case datasets (AUC 0.75 and 0.52, respectively). The Lasso logistic regression model found 15 of the 61 predictors to have a non-zero coefficient when predicting the status of delay in accessing care, including 4 (average outness score, overall internalized homophobia score, self-rated general health, diagnosis of other mental illness) that were previously identified among the top 10 correlates by RF (S2 Appendix). However, this lasso regression model failed to show better performance on the testing set (AUC 0.57). Hence, we conclude that key predictors of delay in accessing care are lacking in this specific dataset.

Discussion
We applied machine learning to analyze factors associated with wanting to seek help for mental health and substance use concerns among 2SLGBTQI+ young adults during the pandemic. Using the resulting RF model, which had moderately high performance, we identified correlates of wanting to seek help that depict vulnerable young adults in need of urgent interventions. Furthermore, we used the same RF technique to identify those who had a delay in accessing care; however, this model demonstrated poor performance, which impeded our ability to draw any conclusive findings on correlates and interactions.
The findings of our study support an emerging body of evidence [7,20,53] that a significant proportion of 2SLGBTQI+ young adults wanted to seek help during the pandemic, while most of them reported a delay in accessing care (Tables 1 and 2). Moreover, we found that the rate of wanting to seek help was higher among those who decreased or quit smoking, or increased drinking or substance use, because of the COVID-19 pandemic. These findings suggest that 2SLGBTQI+ young adults probably wanted services for quitting smoking or managing the consequences of their substance use. The rate of wanting to seek help was significantly higher among unemployed young adults. However, it is important to recognise that most of the participants (80.6%, n = 741) in our sample had not finished a university degree and may still have been studying at the undergraduate level, which might be reflected in the higher rate of unemployment in the overall sample (Table 1).
The top 10 correlates of wanting to seek help among 2SLGBTQI+ young adults identified in our study were mainly related to mental health challenges, age, sexual and gender identity related stigma and discrimination, and ACEs (Figs 2 and 3). Consistent with previous research, we found an association between wanting to seek help and poor or fair self-rated mental health status [54,55]. Previous research reported a decreased rate of help-seeking among those who seriously considered or attempted suicide [54,56]. In contrast, we found an increased likelihood of wanting to seek help among participants who had suicidal thoughts in the past year. This finding of an increased need to seek help among 2SLGBTQI+ young adults experiencing mental health challenges presents an opportunity for timely intervention. In addition to the increased prevalence of mental health issues [6], virtual mental health visits became more common during the pandemic [57]. Hence, 2SLGBTQI+ young adults can be encouraged to leverage these services, seek help and access urgent online supports at the time of a mental health emergency. The associations of exposure to enacted stigma or discrimination, perceived stigma, homophobia, low community connectedness, and low identity centrality with adverse mental health outcomes have already been demonstrated [22,58,59]. Our results show that these experiences also increased the likelihood of wanting to seek help. Previous research among 2SLGBTQI+ college students concluded that a high level of outness was associated with a high likelihood of being affected by enacted stigma and a high probability of help-seeking [60]. The same effect was demonstrated in our study for wanting to seek help. While more research is needed to explore ways to reduce stigma and discrimination against 2SLGBTQI+ people, tailored peer support programs can improve self-advocacy, community connectedness, and mental health as well as quality of life [61].
In addition, policy initiatives like anti-discrimination measures and LGBTQI2S+ inclusion in socio-economic development [62] will help to reduce discrimination. Evidence suggests that high exposure to ACEs predicts poor mental health outcomes among SGM youth and adults [63,64]. Moreover, high ACEs scores were found to be associated with ever-smoking and ever-vaping among 2SLGBTQI+ young adults [61]. We found a similar effect: an increased likelihood of wanting to seek help among participants with high ACEs scores. However, the association between wanting to seek help and low ACEs scores most likely resulted from the imputation of a high percentage (11% of the total sample) of missing data for the ACEs questionnaire. While this relationship between ACEs and wanting to seek help should be explored further in dedicated studies, the ACEs questionnaire can be used for screening mental health and substance use challenges among 2SLGBTQI+ young adults.
Our findings suggest that individuals who had adequate housing but poor mental health or past-year suicidal thoughts were more likely to want to seek help than those with inadequate housing. However, a recent study reported that homeless 2SLGBTQI+ youth were experiencing poor mental health, suicidal thoughts, self-harm, increased substance use and other mental health issues as an impact of COVID-19 [20]. This difference may be attributable to numerous barriers (e.g., lack of affordability, lack of transportation, stigmatization) faced by homeless individuals, which may have discouraged them from seeking healthcare [65]. Moreover, past experiences of stigmatization and discrimination from health care providers because of their sexual and gender identities may further inhibit young people from seeking help [9]. Consistent with this possibility, we found that participants with adequate housing and lower levels of internalized homophobia or average outness had a higher probability of wanting to seek help (Fig 4). Notably, in our sample, bisexual, queer and questioning, cisgender, and gender diverse participants who experienced mental health challenges or had adequate housing were more likely to want to seek help than others (Figs 5 and 6). This finding contradicts previous research, in which trans people were similar to, or more likely than, cisgender participants to experience mental health challenges and seek help [55,66]. While future research should explore the reasons behind these differences, these findings should be taken into account for targeted prevention and intervention to reduce the health disparities faced by 2SLGBTQI+ young adults. Both the RF model and the Lasso regression model built for predicting delay in accessing care had poor performance (S2 Appendix), which we largely attribute to our limited sample size and limited capture of key predictors.
Notably, delay in accessing care results from a complex interplay of factors beyond the individual level (structures and processes of care), as outlined in the Donabedian model [67]. Nevertheless, a few individual-level variables (outness, internalized homophobia, self-rated general health, and other mental illness) were identified as potentially important correlates in both models, which may point to meaningful mechanisms underlying a high risk of care delays. This aligns with research on real fears of violence and social exclusion experienced by SGM people, as well as the lack of 2SLGBTQI+ friendly healthcare providers, during disclosure [68]. Specifically, recent studies reported that SGM people faced significant rates of discrimination and harassment by healthcare providers, including refusal to provide care as well as verbal and physical abuse, during the COVID-19 pandemic [7,9]. These experiences of discrimination within the healthcare system should be incorporated into future research to evaluate their impact on predicting the risk of delayed access to care. Future studies with more extensive capture of variables are needed to identify predictors of delay in accessing care by 2SLGBTQI+ young adults, to enable policy planning that ameliorates health care discrimination and ensures health equity.
We demonstrated that machine learning can be used successfully to identify potential risk factors for mental health and substance use related issues among 2SLGBTQI+ young adults, provided that the sample size and the capture of key predictors are adequate. Compared with traditional regression approaches, machine learning allowed us to investigate potentially complex relationships among 61 variables without imposing strong assumptions, and to identify statistically meaningful individual predictors as well as interactions [69,70]. This flexible data-driven technique is particularly useful in hypothesis-generating settings to support causality-testing investigations. As such, this study provides an example of deriving interpretable findings from a black-box machine learning model, a practice that has been identified as under-used in the current public health literature [70,71]. Future public health researchers should be mindful of making their machine learning models interpretable and could use more sophisticated techniques (such as individual conditional expectation plots [72]) to enhance the real-world relevance of their interpretations. 2SLGBTQI+ people face various social and institutional barriers in accessing timely healthcare services, which are often intertwined with factors that challenge the disclosure of their identities [71]. By leveraging the data-driven strength of machine learning, future researchers could adopt our approach to uncover nuanced factors that hinder the help-seeking behaviours of this underserved population [11]. In addition, beyond the supervised machine learning demonstrated in this study, unsupervised learning techniques (e.g., cluster analysis [73]) can be applied to characterize unique preferences or trajectories of care that might be particularly suitable for this population, to enable a patient-centred care approach.
The findings of this study should be interpreted in light of a few key limitations. Although the RF model predicting the status of wanting to seek help yielded relatively high performance, we were unable to achieve satisfactory performance for the models built for predicting delay in accessing care. We attribute this weakness to the inherent limitations of a cross-sectional survey-based study (e.g., small sample, missing responses, omission of potentially important attributes in the questionnaire). Future researchers should use more comprehensive datasets, preferably ones that link individual- and system-level factors from health services administrative databases with population-based survey data, to depict a fuller picture of barriers to timely care receipt. Next, our primary outcome denoted the status of wanting to seek help, without confirming how many participants subsequently sought (and received) professional help. This should be addressed in longitudinal studies. Furthermore, we were unable to comment on whether the participants started to experience the need to seek help only after the start of the pandemic, or on how the pandemic influenced changes in the risk factors for wanting to seek help among 2SLGBTQI+ young adults. These unaddressed objectives can be explored in a case-control setting with data from the pre-pandemic era. Moreover, owing to the data-driven nature of machine learning techniques and the cross-sectional data, the top 10 correlates identified were not 'true predictors', but rather factors that might contribute to a larger increase in the likelihood of wanting to seek help than other variables. As such, the correlates suggested by this study should be confirmed using causality-testing techniques to formally establish their statistical importance.

Conclusions
We conducted an explanatory machine learning analysis on Canadian 2SLGBTQI+ young adults during the COVID-19 pandemic to identify individual-level correlates of wanting to seek professional help and of subsequent delay in accessing care. We identified several sexual and gender-identity related stressors influencing wanting to seek help, which should be taken into consideration by policy makers for targeted prevention approaches and for ensuring health equity. Interventions such as peer support programs, social inclusion of 2SLGBTQI+ people in socio-economic development, transportation and combined efforts for people experiencing homelessness, and 2SLGBTQI+ inclusive and affirming health care services should be considered for reducing mental health and substance use challenges as well as improving help-seeking among this population. The poor performance of the random forest models aimed at predicting delay in accessing care warrants additional research using more comprehensive datasets that include factors measured at the levels of structures and processes of care.