Explainable artificial intelligence models for predicting risk of suicide using health administrative data in Quebec

Suicide is a complex, multidimensional event, and a significant challenge for prevention globally. Artificial intelligence (AI) and machine learning (ML) have emerged to harness large-scale datasets to enhance risk detection. In order to trust and act upon the predictions made with ML, more intuitive user interfaces must be validated. Thus, interpretable AI is a crucial direction that could allow policy and decision makers to make reasonable and data-driven decisions that can ultimately lead to better mental health services planning and suicide prevention. This research aimed to develop sex-specific ML models for predicting the population risk of suicide and to interpret the models. Data were from the Quebec Integrated Chronic Disease Surveillance System (QICDSS), covering up to 98% of the population in the province of Quebec and containing data for over 20,000 suicides between 2002 and 2019. We employed a case-control study design. Individuals were considered cases if they were aged 15+ and had died from suicide between January 1st, 2002, and December 31st, 2019 (n = 18,339). Controls were a random sample of 1% of the Quebec population aged 15+ of each year, who were alive on December 31st of each year, from 2002 to 2019 (n = 1,307,370). We included 103 features, including individual, programmatic, systemic, and community factors, measured up to five years prior to the suicide events. We trained and then validated the sex-specific predictive risk model using supervised ML algorithms, including Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost) and Multilayer Perceptron (MLP). We computed operating characteristics, including sensitivity, specificity, and Positive Predictive Value (PPV). We then generated receiver operating characteristic (ROC) curves and calibration measures.
For interpretability, SHapley Additive exPlanations (SHAP) was used as a global explanation method to determine how much the input features contributed to the models' output, alongside the largest absolute coefficients. The best sensitivity was 0.38 with logistic regression for males and 0.47 with MLP for females; the XGBoost classifier had the best precision (PPV), with 0.25 for males and 0.19 for females. This study demonstrated the potential of explainable AI models as tools for decision-making and population-level suicide prevention actions. The ML models included individual-, programmatic-, systemic-, and community-level variables routinely available to decision makers and planners in a public managed care system. Caution should be exercised in interpreting the variables associated in a predictive model, since the associations are not causal, and other designs are required to establish the value of individual treatments. The next steps are to produce an intuitive user interface for decision makers, planners and other stakeholders like clinicians or representatives of families and people with lived experience of suicidal behaviors or bereavement by suicide. Such an interface could, for example, explore how variations in the quality of local area primary care programs for depression or substance use disorders, or increases in regional mental health and addiction budgets, would lower suicide rates.


Introduction
Suicide is a complex public health issue and one of the leading causes of death worldwide. It remains difficult to predict suicidal behaviors accurately and consistently despite increased awareness of suicide as a significant cause of avoidable death [1]. Every suicide is a disaster that affects families, neighborhoods, and entire countries and that has long-lasting effects on the people left behind [2]. In Quebec, suicide takes the lives of more than a thousand individuals per year [3] for a population of 8.7 million inhabitants. Every day, approximately 3 Quebecers tragically take their lives. However, suicide remains a rare event, with a 2017 age-standardized suicide mortality rate of 11.4 per 100,000 overall, and 17.2 and 5.8 per 100,000 in males and females, respectively [4]. In Canada, it is estimated that for each death by suicide there are 25-30 additional suicide attempts, many of which result in emergency department visits, immediate hospitalization, and/or mental health facility admissions [5][6][7].
Mental health conditions, such as depression, bipolar disorder, schizophrenia, and substance abuse disorders, have consistently been shown to be prevalent among those who die by suicide [8]. Mental and substance use disorders are found in 95% of suicide cases, and the population attributable fractions for mental disorders ranged from 47% to 74% [9]. Systematic audits, which consist of bottom-up evidence, indicated that deficits in primary care mental health or addiction services can be identified in over 80% of all suicides [10,11]. From a populational perspective, suicide may be related to factors at the individual (e.g., mental disorders), programmatic (e.g., visits to Emergency Departments; adequacy of follow-up for depression in local area primary care), system (e.g., per capita regional budget for substance use disorders), and community level (e.g., neighborhood social deprivation) [12].
The growth of data and machine learning (ML) has the potential to deeply impact all aspects of healthcare, among other domains. ML is a branch of artificial intelligence (AI) that relies on statistical and probabilistic techniques to automatically learn patterns and enhance performance on specific tasks through exposure to data [13]. Nevertheless, caution must be taken in medical and public health applications, as "black-box" models that automatically improve via experience can pose potential risks and, as such, hinder adoption. Close supervision is required to ensure safety and accuracy. However, such techniques are relevant as they offer powerful models that can capture interactions between potentially correlated risk factors and suicide, which can be difficult to achieve at adequate levels with conventional statistical methods [14]. Recent research has revealed a range of advantages of ML that can assist in detecting, diagnosing, and treating mental health problems, and in predicting the level of suicidal behaviour risk in populations [14][15][16][17][18]. For example, Gradus et al. [17] created sex-specific ML models utilizing data from eight Danish national health and social registries, encompassing over 90% of the Danish populace. They found that sex-specific suicide risk in Denmark was significantly associated with factors linked to substance use disorders (SUDs), such as alcohol-related disorders and prior poisoning. About 20% of men and women who died by suicide had SUDs. However, whether risk factors for suicide differ between the high-risk SUD population and the general population remains mostly unexplored.
Kessler et al. [18] developed ML models to predict suicide risk among US army soldiers after hospital stays and in the Veterans Health Administration system. Even though this study has been helpful, the results from US army members and veterans may not be generalizable to the larger community of patients admitted to psychiatric hospitals. Furthermore, it is unclear whether men and women have different suicide risk profiles after being admitted to mental hospitals. Walsh et al. [19] constructed a predictive model for adolescents using electronic health record data, and the model performed well in a limited Southern US population. Nevertheless, more comprehensive risk prediction models that consider a broader range of clinical factors beyond mental health comorbidities and medications, and that generate quantifiable risk scores to determine patients' suicide risk levels, are necessary.
Despite the advancements in AI, a significant impediment to adopting AI systems in healthcare is that many are seen as "black boxes" by stakeholders and decision-makers, referring to their lack of interpretability [20]. When we refer to an algorithm as a "black box," we mean that the estimated function relating inputs to outputs is not understandable at an ordinary human level; for instance, a function that relies on many parameters, complicated parameter combinations, or nonlinear parameter transformations. Explainable Artificial Intelligence (XAI) proposes shifting toward more transparent AI to address this issue. XAI typically refers to post hoc analyses and techniques used to understand previously trained "black-box" models or their predictions [21][22][23]. It is significant because understanding the causality of learned representations is necessary for decision support, and it helps assess whether the model is considering the right features while making a specific prediction [24][25][26]. XAI methods can provide various explanations, such as global, local, contrastive, what-if, counterfactual, and example-based [26,27]. Each approach to interpretability can be used for real-world problems based on the characteristics of the environment where a specific approach is to be applied [28]. As part of XAI, global explanation techniques provide a top-down mental representation of the AI model's behavior, typically in the form of visual charts, mathematical formulae, or model graphs [27]. Among XAI methods, SHapley Additive exPlanations (SHAP) has emerged as a prominent choice for several reasons. First, it provides a unified interpretability framework that can be applied to diverse ML models without requiring model-specific modifications. This versatility makes SHAP applicable across various domains. Second, SHAP offers local and global interpretability, allowing insights into individual feature contributions and model behavior. Its solid theoretical foundation in game theory ensures fairness and consistency in attributing feature importance. Lastly, SHAP facilitates clear and intuitive visualizations, making explanations easily understandable and facilitating stakeholder collaboration [29]. Considering the constraints imposed by the confidentiality of our dataset, we chose to employ SHAP as a global explanation technique for our project because it has proven to be effective in revealing the behavior of ML models.
Studies have consistently shown that suicide affects more men than women [30][31][32]. Research has found a correlation between traditional masculine traits, such as independence, assertiveness, leadership, and dominance, and an increased risk of suicidal thoughts in middle-aged men [33][34][35][36][37][38][39]. However, while men are more likely to die by suicide, women are more likely to engage in suicidal behaviors and deliberate self-harm. This is known as the "gender paradox" of suicide, which has been consistently shown across various studies [40][41][42]. Moreover, retrospective cohorts of the Australian and white American populations have revealed that suicide rates progressively increase throughout life for males, whereas females have their most elevated rates between 35-54 and 45-54 years, respectively [43,44]. Didier et al. [45] have shown that men face a higher risk of suicide than women in most nations, as evidenced by their study on the gender paradox in suicidal behavior and its impact on the suicidal process. Considering the differences in the prevalence, risk factors, and protective factors for suicide between males and females, this study aims to develop sex-specific supervised ML models for suicide risk prediction and to apply a post-hoc global explanation approach to interpret the findings.

Ethical approval
This project was approved by the ethics committees of both Dalhousie University and Université Laval. Access to the QICDSS was approved by government bodies, the Public Health Ethics Committee, and the Commission d'accès à l'information du Québec for chronic disease surveillance purposes (Blais C, Jean S, Sirois C, Rochette L, Plante C, Larocque I, Doucet M, Ruel G, Simard M, Gamache P (2014) Quebec integrated chronic disease surveillance system (QICDSS), an innovative approach. Chronic Dis Inj Can 34(4):226-235). Informed consent was not required in the context of register-based studies that use anonymized data. This study was performed in line with the principles of the 1964 Helsinki Declaration.

Data sources
The data source for this study, the Quebec Integrated Chronic Disease Surveillance System (QICDSS), was accessed on 1-11-2020; it covers up to 98% of the province of Quebec's population and contains data for over 20,000 suicides between 2002 and 2019. The QICDSS consists of five linked databases, namely the health insurance registry (demographic information), the physician billing database (all medical fee-for-service acts billed to the Régie de l'assurance maladie du Québec, including diagnoses), the hospitalization database (including primary and secondary diagnoses based on the International Classification of Diseases, 9th revision (ICD-9) until April 2006 and ICD-10 after that), the prescription claim database for those covered by the public drug plan, and the vital statistics death database. A detailed description of the QICDSS is available elsewhere [46]. Because individual occurrences of suicide within the general population are comparatively infrequent when contrasted with other causes of death [47], we used a case-control study design to unravel the nuanced factors associated with suicide.
The training dataset included individuals aged 15 or older who died from suicide between January 1st, 2002, and December 31st, 2010 (n = 9,440) as cases, and, as controls, a random sample of 1% of the Quebec population aged 15 or older who were alive on December 31st of each year from 2002 to 2010, to account for potential population changes over time (n = 661,780). Similarly, for the test dataset, individuals aged 15 or older who died from suicide between January 1st, 2011, and December 31st, 2019 (n = 8,899) were considered cases. Controls were selected as a random sample of 1% of the Quebec population aged 15 or older who were alive on December 31st of each year from 2011 to 2019 (n = 645,590). In order to maximize the variability of predictors, cases and controls were not matched.

Features
We included 103 individual, programmatic, systemic, and community features measured starting five years (60 months) before the suicide events (see "S1 Table: List of variables"). Some features (e.g., diagnoses, service utilization) were dummy coded to create time intervals ranging from 0-6 to 0-60 months before the first day of the suicide month. Missing data are common in real-world big data sources. To handle missing features in our dataset, we employed several imputation techniques based on the type of variable. For continuous (numeric) variables, mean and median imputation methods were utilized, while mode imputation was used for categorical (nominal) variables. Additionally, for categorical variables with multiple categories, we used one-hot encoding [48] to convert them into binary form.
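The imputation and encoding steps above can be sketched as follows. This is a minimal illustration with pandas; the column names are hypothetical stand-ins, not actual QICDSS variables:

```python
import pandas as pd

# Toy feature table; the column names are illustrative, not actual QICDSS variables.
df = pd.DataFrame({
    "age": [34.0, None, 58.0, 41.0],                   # continuous -> mean imputation
    "ed_visits_0_60m": [2.0, 5.0, None, 1.0],          # count -> median imputation
    "deprivation_quintile": ["Q1", None, "Q5", "Q5"],  # categorical -> mode imputation
})

df["age"] = df["age"].fillna(df["age"].mean())
df["ed_visits_0_60m"] = df["ed_visits_0_60m"].fillna(df["ed_visits_0_60m"].median())
df["deprivation_quintile"] = df["deprivation_quintile"].fillna(
    df["deprivation_quintile"].mode()[0]
)

# One-hot encode the multi-category variable into binary indicator columns.
df = pd.get_dummies(df, columns=["deprivation_quintile"])
```

After these steps the frame contains no missing values, and each deprivation quintile has its own binary indicator column.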

ML models
We used Python for model development and analysis. Model development and evaluation were accomplished in three phases.
Phase 1: Model building. We developed sex-specific supervised ML models. The term "sex-specific" refers to our approach of training the model on the complete dataset but conducting separate evaluations based on sex (male and female) during the testing phase. It is important to note that we did not create separate models for each sex but rather utilized the same model to assess performance within each group. The values produced by the models during training were compared to target values, providing feedback to the learner. The train-test split method was used [49] to estimate the performance of our ML models. For the final model, we used data from 2002 to 2010, with 9,440 suicide cases (7,234 males and 2,206 females) and 661,780 controls, as the train set, and data from 2011 to 2019, with 8,899 cases (6,713 males and 2,186 females) and 645,590 controls, as the test set. The test set was kept aside for the model's final evaluation, while the train set was used to train the models. This allows trained models to be tested quickly and effectively on data they have never seen before.
We trained supervised ML classifiers with Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Multilayer Perceptron (MLP) with an optimized model architecture. LR assumes a linear relationship between the logit of the outcome and predictor variables and has been extensively utilized in ML research on suicide prediction [18,19]. RF is a tree-based algorithm that builds an ensemble of many decision trees on bootstrapped samples and aggregates the votes (predicted class) from each tree. It enhances the classification tree by considering a random subspace of predictors when building each tree and by creating a diverse set of trees that contribute to classification performance [50]. The XGBoost method is a variant of the gradient boosting algorithm, which minimizes errors by applying the gradient descent method in a boosting algorithm combining several weak learners [51]. MLP is a neural network classifier consisting of feedforward networks with dense, all-to-all connections between layers. The network uses nonlinear transformations to learn high-level abstractions in the data to build the predictive model [52,53]. The optimal hyperparameters for the models were found through a five-fold cross-validated grid search (see "S2 Table: Hyperparameters"). Then, a model was built using the learned parameters for the whole training set, and predictions were made on the testing set, which was never utilized for model selection or parameter tuning. A schematic representation of the development of the prediction model is shown in Fig 1.
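The hyperparameter search described above can be sketched as follows. This is an illustrative example on synthetic data with a toy grid for LR only; the actual grids for LR, RF, XGBoost, and MLP are those listed in S2 Table:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the training data (400 rows, 5 features).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(400, 5))
y_train = (X_train[:, 0] + rng.normal(scale=0.5, size=400) > 0).astype(int)

# Five-fold cross-validated grid search; only LR and a tiny grid are shown here.
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
    scoring="roc_auc",
)
grid.fit(X_train, y_train)

# The best estimator is refit on the whole training set and then
# evaluated once on the held-out test years.
best_model = grid.best_estimator_
```

The same pattern applies to the other three classifiers, each with its own parameter grid.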
Handling data imbalance. The dataset was imbalanced, as cases of suicide were rare relative to controls. Classification would be biased towards the majority class if the imbalance were not handled. This can produce a misleadingly high accuracy while the minority class is poorly detected. To address this problem, we used the Synthetic Minority Over-sampling Technique (SMOTE) [54].
SMOTE not only rectifies the class distribution imbalance but also introduces diversity and variability into the minority class, which can be advantageous for model training.By oversampling the minority class, SMOTE avoids discarding potentially valuable information.The application of SMOTE was performed solely on the training set to enhance class balance, while keeping the test set intact to ensure a realistic test distribution is preserved.
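Conceptually, SMOTE creates each synthetic minority sample by interpolating between a minority point and one of its k nearest minority neighbours. The numpy sketch below illustrates that idea only; it is a simplification, not the off-the-shelf SMOTE implementation used in the study:

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=5, seed=0):
    """Generate synthetic minority samples by interpolating between a
    sampled minority point and one of its k nearest minority neighbours
    (a simplified sketch of the SMOTE idea)."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]   # skip the point itself
        j = rng.choice(neighbours)
        lam = rng.random()                    # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)

# Oversample only the training minority class; the test set stays untouched.
X_minority = np.random.default_rng(1).normal(loc=3.0, size=(20, 4))
X_new = smote_like_oversample(X_minority, n_new=80)
```

Because each synthetic point is a convex combination of two real minority points, it always lies within the range of the observed minority data.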
Phase 2: Model evaluation. The discriminatory performances were assessed using the Receiver Operating Characteristic (ROC) curve, the Area Under the Curve (AUC), and various operating characteristics, including sensitivity, specificity, and positive predictive value (PPV) with an adjusted threshold. Threshold adjustment was a meticulous process in which we extensively evaluated the trade-offs between sensitivity, specificity, and PPV. We conducted a comprehensive evaluation to determine the optimal adjusted threshold, including techniques such as ROC analysis and precision-recall curves. We aimed to find a balance that maximized the identification of true suicide cases while minimizing false positives. Model calibration was assessed using Temperature Scaling (TS). TS is a post-processing technique that uses a single scalar to smooth the softmax output and regularize the entropy [55,56], with the following equation: ŷ = softmax(z / T), where ŷ is the prediction, z is the logit vector, and T is a learned scalar temperature parameter.
The probabilities calibrated by TS approximately represent the confidence scores of the model predictions, meaning the alignment between the observed accuracy distribution and the predicted probabilities. These confidence scores should match the true correctness likelihood [56]. The reliability diagram is a simple way to visualize calibration accuracy as a function of confidence [57]. The confidence score and accuracy should match when a model is calibrated. For instance, if the model predicts 0.8 for 100 examples in the test set, the accuracy over those 100 examples should be very close to 80%; if this is the case, we say that the model is well calibrated.
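Temperature scaling itself is a one-line transformation of the logits. A minimal numpy sketch follows; in practice T would be fitted on a held-out validation set rather than fixed as here:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def temperature_scale(logits, T):
    # Dividing the logits by a single scalar T > 1 softens overconfident
    # probabilities without changing the predicted class.
    return softmax(logits / T)

logits = np.array([[4.0, 1.0, 0.5]])
p_raw = softmax(logits)
p_cal = temperature_scale(logits, T=2.0)   # T fixed here for illustration only
```

Note that because T rescales all logits equally, the ranking of classes (and hence the ROC/AUC) is unchanged; only the confidence scores move.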
Phase 3: Explainability. To examine the contribution of each feature to the predictive model output, we used SHapley Additive exPlanations (SHAP) [58]. SHAP is based on coalitional game theory and can be used to explain the output of any ML model. SHAP values are calculated as a measure of the impact of each feature by comparing the prediction results with and without that feature [58,59]. Shapley values are computed by averaging the marginal contribution of each feature value across all possible coalitions in the feature space [58]. The terms Shapley values and SHAP values are frequently used interchangeably; technically speaking, this is incorrect, as Shapley values represent the theory and SHAP is a specific implementation for calculating Shapley values.
SHAP summary plot. The summary plot combines feature importance with feature effects. The position on the x-axis is determined by the Shapley value, and the position on the y-axis by the feature. The colors of the dots indicate whether the value of the corresponding feature is high (usually red) or low (usually blue) [58]. Given a coalitional game (N, v), the Shapley value of player (feature) i is expressed as:

φ_i(N, v) = Σ_{S ⊆ N\{i}} [ |S|! (|N| − |S| − 1)! / |N|! ] [ v(S ∪ {i}) − v(S) ]

where φ_i(N, v) denotes the Shapley value of i given the set of players N and the value function v; for example, how much a feature (e.g., age) contributed to the predicted risk of suicide. S ⊆ N\{i} means that S ranges over the subsets of N that do not contain i. If N = {age, sex, location}, S could be {sex, location} with i = age. Comparing S with and without "age" yields the marginal contribution v(S ∪ {i}) − v(S). The weighting factor |S|!(|N| − |S| − 1)! counts the number of orderings in which the subset S precedes i and the remaining players follow; summing over all possible subsets S and dividing by |N|! yields an average over all orderings of the |N| players, i.e., the total number of features in the model.
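The formula can be verified on a tiny game by exhaustive enumeration. The sketch below computes exact Shapley values for the three hypothetical features from the example above, using an invented value function v whose numbers are purely illustrative:

```python
from itertools import combinations
from math import factorial

players = ("age", "sex", "location")

# Toy characteristic function v(S): the value each coalition of features
# contributes to the prediction (numbers are invented for illustration).
v = {
    (): 0.0,
    ("age",): 0.30, ("sex",): 0.10, ("location",): 0.05,
    ("age", "sex"): 0.45, ("age", "location"): 0.40, ("sex", "location"): 0.20,
    ("age", "sex", "location"): 0.60,
}

def shapley(i, players, v):
    """phi_i = sum over S subset of N\\{i} of
    |S|!(|N|-|S|-1)!/|N|! * (v(S + {i}) - v(S))."""
    n = len(players)
    others = [p for p in players if p != i]
    total = 0.0
    for r in range(n):
        for S in combinations(others, r):
            w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            S_key = tuple(sorted(S, key=players.index))
            S_with_i = tuple(sorted(S + (i,), key=players.index))
            total += w * (v[S_with_i] - v[S_key])
    return total

phis = {p: shapley(p, players, v) for p in players}
```

A useful sanity check is the efficiency property: the Shapley values sum exactly to v(N) − v(∅), i.e., the features jointly account for the full prediction.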
XAI and feature contributions were presented using the LR coefficients and Shapley values; Shapley values were computed using the SHAP package in Python.

Description of the sample
The suicide cases included 7,234 (76.63%) men and 2,206 (23.36%) women. Men who died by suicide were slightly older than the non-suicide group (mean [SD] of 45.3 [16.0] vs 44.3 [17.6] years), while women were similar (46.2 [15.3] vs 46.4 [18.9] years). Regarding material and social deprivation, the suicide cases lived in more deprived areas. Tables 1 and 2 provide an overview of the demographic characteristics of the study population for the training and test sets, respectively.

Model performance
Table 3 and the corresponding figure summarize model performance. In the men's model, the accuracy rate was highest for XGBoost (0.97), followed by RF (0.96), MLP (0.96) and LR (0.95). The four models showed a sensitivity of 0.31-0.38, meaning that the models correctly identified approximately 31-38% of men who died by suicide. Other metrics showed a specificity of 0.97-0.98 and a PPV of 0.20-0.25. The cross-validated AUC was highest with RF and LR (0.79) and lowest with XGBoost and MLP (0.76). Overall, LR appeared to provide more accurate classification results than the other algorithms when predicting suicide in men. The women's model showed an accuracy of 0.98 for XGBoost, 0.98 for RF, 0.97 for MLP and 0.98 for LR, and a sensitivity of 0.40-0.47 across the four classification models, meaning that the models correctly identified approximately 40-47% of women who died by suicide. Other metrics showed a specificity of 0.97-0.99 and a PPV of 0.11-0.19. The cross-validated AUC was highest with RF (0.87) and lowest with MLP (0.83). Overall, the women's model showed better performance than the men's model.
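The operating characteristics reported above follow directly from the confusion matrix at the chosen threshold. A minimal sketch (the labels below are synthetic, not study data):

```python
import numpy as np

def operating_characteristics(y_true, y_pred):
    """Sensitivity, specificity and PPV from binary labels and thresholded
    predictions (1 = case, 0 = control)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of controls correctly ruled out
        "ppv": tp / (tp + fp),          # precision among flagged individuals
    }

m = operating_characteristics([1, 1, 1, 0, 0, 0, 0, 0],
                              [1, 0, 1, 0, 0, 1, 0, 0])
```

In a highly imbalanced setting like this study, specificity can be near 1 while PPV stays low, which is why all three metrics are reported together.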

Feature importance
Absolute coefficients. Fig 3A and 3B illustrate the 20 features with the largest absolute LR coefficients, which were prioritized differently in the sex-specific models. Psychotherapy visits with a psychiatrist in the past 60 months were highly associated with the risk of suicide, followed by mood and anxiety disorders in the past 60 months, substance use disorders in the past 60 months, age, and non-intentional trauma in the past 60 months.
XAI with SHAP. The SHAP feature importance values were used to identify the impact of each feature on the prediction of suicide risk. The top 20 features of the model are listed on the y-axis in Fig 4A and 4B, ranked from most important to least important for men and women, respectively. The variables at the top considerably influenced the prediction outcomes, whereas the variables at the bottom had less effect on the result. From Fig 4A and 4B, it can be inferred that the top five features for predicting the risk of suicide for both men and women were age, specialist outpatient visits for physical disorders in the past 60 months, regional mental health budget, regional dependence budget, and psychotherapy visits with a psychiatrist in the past 60 months.

Discussion
In this study, we demonstrated the feasibility of predicting the risk of suicide from health administrative data using ML techniques. The main objective of the present study was to examine a supervised ML approach for sex-specific suicide risk prediction. Our ML classifiers achieved an AUC of 0.76-0.79 in the men's models and 0.83-0.87 in the women's models. The weights in the LR model were used to identify the parameters and attributes that are globally or locally significant for accurate prediction. The important features found with absolute coefficients were consistent with previous research on their associations with suicide risk at the individual, programmatic and community levels [60][61][62][63][64]. Considering the 10 most significant associations in Fig 3, a first set of individual variables pertains to mental disorders and substance use disorders, which are well-established risk factors for suicide [63][64][65][66][67]. A second set of variables comprises contacts with primary care physicians and specialist mental health care, and Emergency Department (ED) visits, which are indicative of greater need for care, though not always of receiving adequate care. Systematic audits have shown that half of suicide cases were in contact with the ED in the last year, 50% with a general practitioner and 25% with a psychiatrist, and deficits have consistently been found in three areas: coordination of specialist mental health and specialist addiction services at the ED; poor access to referral by general practitioners to specialist mental health or addiction consultation; and poor access to mobile crisis resolution teams in support of outpatient services [10]. The increased risk of suicide among patients presenting for non-intentional trauma at the ED may be due to alcohol abuse, which suggests that more systematic suicide risk assessment would be warranted for this group. An increased suicide risk was also found among outpatients receiving more psychotherapy, likely
reflecting the severity of their mental health condition. A third set of variables (age; rural areas) comprises individual- and community-level variables that are significantly associated with suicide [60][61][62]. Finally, the strong link with regional addiction or mental health budgets echoes the very recent Chief Coroner public enquiry on suicide which, based on a series of individual case audits and their aggregation, suggested an increase in the public managed-care health and social services ministry budget for mental health and addiction services [68].
Another objective of this study was to provide better insights into ML models, given that most ML models are sophisticated black boxes that are difficult to comprehend. For instance, non-linear models such as RF, XGBoost, and neural networks perform well in accurately classifying the data and making predictions, but it is difficult to understand from the raw models how they make their decisions and, as such, to gain trust in their predictions. XAI aims at providing explanations of the elements leading to the decisions of such black-box models [69]. In the current study, SHAP was used to achieve explainability by determining the contribution of each variable in the model. SHAP values offer different interpretations, such as global interpretability, which highlights the significance of every indicator, and local interpretability, which determines SHAP values specifically for each instance. This significantly increases transparency and helps explain case predictions and major decision contributors [58,70]. Various visualizations of SHAP values can be employed to aid model interpretation and provide explanations for the results, such as force plots for single predictions, the summary plot of all SHAP values, and dependence plots [71]. In this study, we used the summary plot to offer a broad view of the importance of each feature and its positive or negative impact on suicide risk.
In the SHAP summary plot, the top variables, including age, specialist outpatient visits for physical disorders in the past 60 months, regional mental health budget, regional dependence budget, psychiatric outpatient psychotherapy in the past 60 months, and quality of local area primary care for depression, provided insight into the local area programs where the individual lives [12,72]. The association with specialist physical health visits may seem counterintuitive, but it relates to deficits in identifying suicide risk and in consultation with mental health or addiction services, as evidenced by case audits [10]. Similarly, while capturing the quality of psychiatric outpatient visits with potentially effective and relevant treatments like psychotherapy, the modeling indicated that a low number of psychotherapy visits with a psychiatrist in the past 60 months was associated with lower risk, as was the quality of local area primary care physician practices for depression where the individual was living.
The former variable does not allow disentangling severity from the quality of the intervention, whereas the latter is more indicative of the local care environment where the individual may have accessed timely care. The adequate balance of local primary care and specialist mental health care has been evidenced in Finland by Pirkola et al. (2009) [73]. In addition to continuous suicide risk evaluation of outpatient psychotherapy patients, the identified variable may indicate a need for more publicly funded private-practice psychiatrists in areas with a relative shortage of mental health professionals in emergency departments, hospitals, and community care [74]. Another important system-level variable is the regional dependence budget [75], which funds addiction programs in Quebec at approximately $150 million annually, with $1.5 billion allocated for specialist mental health care [76].
Substance use disorders explain over 25% of the population attributable risk of suicide [77]. Registry-based studies indicated a 3- to 4-fold increase in mortality among people diagnosed with substance use disorders, including deaths from cancer, cardiovascular diseases, suicide and accidents [78]. Tondo et al. (2006) [79] demonstrated an association between mental health expenditures and suicide rates in the United States. However, there is a larger health economics literature linking lower per capita regional health expenditures with increased mortality rates [80,81].
As demonstrated in our article, insights from SHAP plots may support the intuition of clinicians, managers, and people with lived experience for risk classification, aid in creating more precise staging systems, and help identify risk inflection points for applications such as suicide prediction. Finally, other newly developed explainability techniques may be a better option depending on the application [82][83][84].
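To illustrate what a SHAP global explanation computes, note that for a linear model such as logistic regression the SHAP value has a closed form: feature j's contribution for individual i is β_j(x_ij − E[x_j]), its push away from the average prediction. The sketch below is a minimal numpy illustration, not the authors' pipeline; the feature names and coefficients are hypothetical.

```python
import numpy as np

# Hypothetical standardized features for five individuals; columns are
# illustrative stand-ins: age, specialist visits (60 months), regional budget.
X = np.array([
    [0.5,  1.2, -0.3],
    [-1.0, 0.0,  0.8],
    [2.0, -0.5, -1.5],
    [0.0,  0.7,  0.2],
    [-0.8, 1.9, -0.4],
])
beta = np.array([0.9, 0.6, -0.4])  # logistic regression coefficients

# For a linear model, the SHAP value of feature j for individual i is
# beta_j * (x_ij - mean_j): the feature's push away from the base value.
phi = beta * (X - X.mean(axis=0))

# SHAP's local-accuracy property: per-individual contributions sum to the
# gap between that individual's linear score and the average score.
scores = X @ beta
assert np.allclose(phi.sum(axis=1), scores - scores.mean())

# Global importance, the quantity ranked in a SHAP summary plot:
global_importance = np.abs(phi).mean(axis=0)
print(global_importance)
```

For tree ensembles such as XGBoost, the same quantities are obtained with the `shap` package's `TreeExplainer` and visualized with `shap.summary_plot`, as in the beeswarm plots discussed above.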
Finally, it should be stressed that although the predictive models operate at the individual level, with distinct suicide risk predictions made for each person, the approach is intended to study the effect of various risk factors at the population and sub-population levels. The motivations for such an approach are twofold. First, using such a predictive model for interventions at the individual level raises various ethical and practical considerations, given that the error in individual predictions can be significant. Indeed, making proper use of individual suicide risk prediction in an operational setting would require better adjustments and safeguards to minimize possible harm, which is out of the scope of the current work. Second, we advocate that using individual predictions to evaluate the impact of measures at the population level provides better capabilities for evaluating suicide risk in subgroups of the population (e.g., specific demographic or healthcare service subareas) and for simulating the impact of measures by altering some attributes of the individuals (e.g., investment in mental healthcare, socio-economic deprivation).
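The population-level "what-if" use just described can be sketched as follows, assuming a fitted model exposing a `predict_proba` interface. The cohort, feature names, effect sizes, and the 10% budget increment are all hypothetical; the point is only the mechanics of averaging individual predictions before and after altering one attribute.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 5000

# Synthetic cohort: column 0 = regional mental health budget (per capita),
# column 1 = material deprivation index. Values are illustrative only.
X = np.column_stack([rng.normal(100, 10, n), rng.normal(0, 1, n)])
# Synthetic outcome: lower budget and higher deprivation raise risk.
logit = -3 - 0.02 * (X[:, 0] - 100) + 0.5 * X[:, 1]
y = rng.random(n) < 1 / (1 + np.exp(-logit))

model = LogisticRegression().fit(X, y)

# Baseline population risk vs. counterfactual with a 10% budget increase.
baseline = model.predict_proba(X)[:, 1].mean()
X_cf = X.copy()
X_cf[:, 0] *= 1.10
counterfactual = model.predict_proba(X_cf)[:, 1].mean()
print(f"baseline mean risk {baseline:.4f}, after budget increase {counterfactual:.4f}")
```

Because the variables in the model are associational rather than causal, such a simulation should be read as a planning aid, consistent with the cautions raised in the conclusion.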

Limitations
There are several limitations to this study. First, the whole study was conducted retrospectively using INSPQ health administrative data for the prediction of suicide risk. The reliability of diagnoses, and the fact that they are not mandatory in physician billing, are the most important issues related to Quebec's registry [46]. This limitation is mitigated in two ways: first, diagnostic coding is distributed similarly in the control group; second, many chronic disease definitions have been reasonably well validated [46], with recent work showing a lifetime prevalence of schizophrenia in the Canadian Chronic Disease Surveillance System (CCDSS) close to the expected value (circa 1% for Quebec and other populous provinces), and long-term ADHD outcomes being similar with one or two or more claims versus no claim [85]. Nevertheless, we know relatively little about suicidal behaviors, which are strongly associated with suicide deaths, among the individuals in the control group. Second, this study was limited by the lack of information that may be relevant to suicide risk prediction, such as ethnicity, income, and education; future studies should prioritize the collection and inclusion of these factors. Third, this study only covered suicides between 2002 and 2019, which could limit the applicability and generalizability of the findings to the present time. Fourth, the models' limited predictive value leaves unanswered the question of why predictive value differs between the sexes. The final limitation of this study was the exclusion of medication data from the prediction of suicide risk, since about 44% of patients had private insurance and their medication use could not be accessed through the QICDSS. Future studies should examine whether medication data would enhance model performance, given that over 50% of the population is covered by the Quebec public drug plan.

Conclusion
This study demonstrated the useful potential of explainable AI models as tools for decision-making and population-level suicide prevention actions. The ML models included individual-, programmatic-, systemic-, and community-level variables routinely available to decision makers and planners in a publicly managed care system. Caution should be exercised in the interpretation of variables associated in a predictive model, since they are not causal, and other designs are required to establish the value of individual treatments. The next steps are to produce an intuitive user interface for decision makers, planners, and other stakeholders such as clinicians or representatives of families and people with lived experience of suicidal behaviors or death by suicide, allowing them to explore, for example, how variations in the quality of local area primary care programs for depression or substance use disorders, or increases in regional mental health and addiction budgets, would lower suicide rates.

Fig 1.
Fig 1. Schematic representation of the development of the prediction model. SMOTE: Synthetic Minority Over-Sampling Technique. https://doi.org/10.1371/journal.pone.0301117.g001
The classification performance of the four models for predicting the risk of suicide is presented; calibration, assessed graphically with a reliability diagram before and after calibration, is shown in "S1 Fig: Calibration plots".
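The reliability diagram mentioned here can be reproduced with scikit-learn's `calibration_curve` applied to a recalibrated classifier. The sketch below uses synthetic data and an isotonic recalibration; it is a generic illustration of the technique, not the study's exact calibration procedure.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data whose outcome probability depends on the first feature only.
rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 5))
y = (rng.random(4000) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Isotonic recalibration on cross-validation folds, as used before
# plotting a reliability diagram.
cal = CalibratedClassifierCV(
    RandomForestClassifier(n_estimators=100, random_state=0),
    method="isotonic", cv=3,
).fit(X_tr, y_tr)

# Reliability diagram coordinates: observed frequency vs. mean predicted
# probability in each bin; a well-calibrated model lies near the diagonal.
frac_pos, mean_pred = calibration_curve(
    y_te, cal.predict_proba(X_te)[:, 1], n_bins=10
)
print(np.round(frac_pos, 2), np.round(mean_pred, 2))
```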

Fig 2.
Fig 2. Area Under the Receiver Operating Characteristic Curve (AUROC). Discriminatory performance of the models. A: AUROC for men; B: AUROC for women. https://doi.org/10.1371/journal.pone.0301117.g002
Examples from Fig 4 are discussed below to give readers a glimpse of the interpretation. One example from Fig 4A is that male patients with non-intentional trauma had an increased risk of suicide, denoted by the red color corresponding to positive values on the x-axis. Another, from Fig 4B, is that women with a high positive value (red color) for social and material deprivation had an increased suicide risk (value corresponding to the x-axis). In other words, social and material deprivation was present in many cases of suicide among women.

Fig 3. Fig 4.
Fig 3. Feature importance of the logistic regression model. The 20 most important features in the sex-specific LR models. A: men's model; B: women's model. https://doi.org/10.1371/journal.pone.0301117.g003