Table 1.
Demographic information for each survey year.
Fig 1.
Flow chart of the predictive modeling process.
Starting from the initial dataset, the chart shows how we trained and validated our models.
Fig 2.
Receiver operating characteristic curve for suicidal thoughts and behavior prediction.
The LightGBM model performed best. Other tree-based models are omitted for clarity because they performed worse than LightGBM. Curves closer to the upper-left corner, corresponding to a larger area under the curve, indicate better performance.
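As a minimal illustration of what an ROC curve and its area summarize, the sketch below sweeps a decision threshold over predicted scores and integrates the resulting curve with the trapezoidal rule. The labels and scores are hypothetical toy values, and the functions roc_curve_points and auc are illustrative helpers, not the code used to produce Fig 2.

```python
def roc_curve_points(labels, scores):
    """Return (FPR, TPR) points obtained by sweeping a threshold over the scores."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        # Process tied scores together so each distinct threshold yields one point.
        score = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels (1 = STB) and model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auc(roc_curve_points(labels, scores)), 4))  # 0.8889
```

A perfect ranking of positives above negatives gives an area of 1.0 (the curve hugs the upper-left corner), while random scores give about 0.5.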
Fig 3.
SHAP plots showing feature importance for predicting STB.
Note that scores are relative to this particular dataset; larger scores indicate greater influence on the model's predictions.
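One common way to turn per-respondent SHAP values into a global importance score is to average their absolute values over all rows, which is also why the scores depend on the particular dataset. The sketch below assumes that convention (it is the one used by SHAP's bar-style summary plots); the SHAP matrix is hypothetical, and mean_abs_shap_ranking is an illustrative helper rather than the authors' code.

```python
def mean_abs_shap_ranking(shap_values, feature_names):
    """Rank features by mean absolute SHAP value across all respondents."""
    n_rows = len(shap_values)
    importance = {
        name: sum(abs(row[j]) for row in shap_values) / n_rows
        for j, name in enumerate(feature_names)
    }
    # Largest mean |SHAP| first, i.e. most influential feature first.
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical SHAP values: one row per respondent, one column per question.
feature_names = ["Q1", "Q17B", "Q138"]
shap_values = [
    [0.2, -0.1, 0.5],
    [-0.4, 0.1, -0.3],
    [0.0, 0.05, 0.4],
]
ranking = mean_abs_shap_ranking(shap_values, feature_names)
print([name for name, _ in ranking])  # ['Q138', 'Q1', 'Q17B']
```

Taking absolute values before averaging keeps a question with large positive and large negative effects from appearing unimportant.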
Fig 4.
Most predictive questions for adolescent suicidal thoughts and behavior (non-ranked).
A table of the most predictive questions, with the domain and the risk and predictive factor each question involves, if applicable.
Fig 5.
A SHAP force plot of a single individual.
This method examines the factors that influenced the model's prediction for a single individual: questions that pushed the model toward predicting STB are shown in red, and questions that pushed it away from predicting STB are shown in blue. A model output greater than zero (the decision boundary) indicates that the model predicts STB. In this example, the individual's answers to Q138 (5, frequent internet harassment), Q25C (6, early alcohol use), and Q38H (2, violent activity) led the model to predict STB. Per-individual explanations like this make the model's results easier to interpret, which makes the model more transparent and trustworthy.
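A force plot relies on the additivity of SHAP values: the model output for one individual equals a base value (the average model output) plus the sum of that individual's per-question contributions. The sketch below assumes the plot is on the log-odds scale, which is typical for tree models such as LightGBM, so an output above zero corresponds to a predicted probability above 0.5. The base value and contribution magnitudes are hypothetical, and explain_prediction is an illustrative helper; only the question IDs come from the caption.

```python
import math


def explain_prediction(base_value, contributions):
    """Combine a SHAP base value with per-question contributions (log-odds scale)."""
    # Additivity: output = base value + sum of contributions.
    output = base_value + sum(contributions.values())
    probability = 1 / (1 + math.exp(-output))
    # Positive contributions push toward STB (red); negative ones push away (blue).
    toward = sorted((f for f, v in contributions.items() if v > 0),
                    key=lambda f: contributions[f], reverse=True)
    away = sorted((f for f, v in contributions.items() if v < 0),
                  key=lambda f: contributions[f])
    return {
        "output": output,
        "probability": probability,
        "predicts_stb": output > 0,  # zero on the log-odds scale is the decision boundary
        "toward_stb": toward,
        "away_from_stb": away,
    }


# Hypothetical values for one respondent.
result = explain_prediction(
    base_value=-1.2,
    contributions={"Q138": 0.9, "Q25C": 0.7, "Q38H": 0.5, "Q1": -0.3},
)
print(result["predicts_stb"], result["toward_stb"])  # True ['Q138', 'Q25C', 'Q38H']
```

Because the contributions sum exactly to the gap between the base value and the output, the red and blue bars in the plot can be read as a complete accounting of that single prediction.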
Fig 6.
Accuracy scores using the Top-N questions in a cumulative fashion.
Average accuracy of the LightGBM model when trained on the top N questions, with N on the x-axis. Accuracy increases as more questions are given to the model.
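The cumulative Top-N procedure can be sketched as a loop that, at each step, adds the next-ranked question and re-scores the model. In this sketch, evaluate is a hypothetical callback standing in for "train LightGBM on these questions and return average validation accuracy"; the toy evaluator and its numbers are invented purely to exercise the loop.

```python
from typing import Callable, Dict, List, Sequence


def cumulative_top_n_scores(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
    max_n: int,
) -> Dict[int, float]:
    """Score a model trained on the top-1, top-2, ..., top-max_n features."""
    max_n = min(max_n, len(ranked_features))
    scores = {}
    for n in range(1, max_n + 1):
        # Cumulative: each step keeps the previous questions and adds the next one.
        subset = list(ranked_features[:n])
        scores[n] = evaluate(subset)
    return scores


# Hypothetical evaluator: a fixed baseline plus a made-up gain per question.
toy_gain = {"Q138": 0.10, "Q25C": 0.05, "Q38H": 0.02}


def toy_evaluate(features: List[str]) -> float:
    return 0.70 + sum(toy_gain[f] for f in features)


scores = cumulative_top_n_scores(["Q138", "Q25C", "Q38H"], toy_evaluate, max_n=3)
for n, acc in scores.items():
    print(n, round(acc, 2))
```

Passing the evaluator as a callback keeps the ranking loop independent of the model and validation scheme, so the same loop works whether each step uses a single split or cross-validation.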
Fig 7.
The top 10 most important questions for males vs females.
Compared with Fig 3, most of the questions are the same; however, there are a few slight differences, described in the main text, such as age (Q2), physical aggression (Q25I), and hating school (Q17B).
Fig 8.
The top 10 most important questions for middle and high school respondents.
Compared with Fig 3, most of the questions are the same; however, there are a few slight differences, such as Q17B appearing in both lists and Q126 appearing in the middle school list.
Fig 9.
The top 10 most important questions for the varying levels of suicidal thoughts and behavior.
Compared with Fig 3, most of the questions are the same. The largest difference is for the target question “Attempted Suicide,” where the focus shifts toward gender (Q1) and school performance (Q15 and Q19).
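Figs 7 to 9 each compare a top-10 list against the overall list in Fig 3. The sketch below shows one way to make that comparison explicit, given two importance dictionaries (for example, mean |SHAP| per question). How the subgroup or per-target importances were obtained (a slice of the overall model's SHAP values, or a separately trained model) is not specified here, so the helpers top_k and compare_top_k accept any precomputed scores; the values and the use of k=3 are hypothetical.

```python
def top_k(importance, k=10):
    """Question IDs with the k largest importance scores, most important first."""
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    return [feature for feature, _ in ranked[:k]]


def compare_top_k(reference, other, k=10):
    """Report which top-k questions are shared, added, or dropped versus a reference."""
    ref, oth = top_k(reference, k), top_k(other, k)
    return {
        "shared": [f for f in oth if f in ref],
        "added": [f for f in oth if f not in ref],
        "dropped": [f for f in ref if f not in oth],
    }


# Hypothetical importances for all respondents and for one subgroup.
overall = {"Q138": 0.40, "Q25C": 0.30, "Q38H": 0.20, "Q17B": 0.10, "Q2": 0.05}
subgroup = {"Q138": 0.35, "Q17B": 0.25, "Q25C": 0.22, "Q38H": 0.10, "Q2": 0.08}
print(compare_top_k(overall, subgroup, k=3))
# {'shared': ['Q138', 'Q25C'], 'added': ['Q17B'], 'dropped': ['Q38H']}
```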