Skip to main content
Advertisement
Browse Subject Areas
?

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

An analysis of Chinese nursing electronic medical records to predict violence in psychiatric inpatients using text mining and machine learning techniques

  • Ya-Han Hu,

    Roles Formal analysis, Methodology

    Affiliations Department of Information Management, National Central University, Taoyuan City, Taiwan, Asian Institute for Impact Measurement and Management, National Central University, Taoyuan City, Taiwan

  • Jeng-Hsiu Hung,

    Roles Writing – review & editing

    Affiliations Department of Obstetrics and Gynecology, Taipei Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation, Taipei, Taiwan, School of Medicine, Tzu Chi University, Hualien, Taiwan

  • Li-Yu Hu,

    Roles Conceptualization, Investigation, Supervision, Validation

    Affiliations School of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan, Department of Psychiatry, Taipei Veterans General Hospital, Taipei, Taiwan

  • Sheng-Yun Huang,

    Roles Writing – review & editing

    Affiliation Department of Psychiatry, Chiayi Branch, Taichung Veterans General Hospital, Chiayi, Taiwan

  • Cheng-Che Shen

    Roles Data curation, Funding acquisition, Investigation, Project administration, Writing – original draft

    pures1000@yahoo.com.tw

    Affiliations School of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan, Department of Psychiatry, Chiayi Branch, Taichung Veterans General Hospital, Chiayi, Taiwan, Center for Innovative Research on Aging Society (CIRAS), National Chung Cheng University, Minxiong, Taiwan, Department of Post-Baccalaureate Medicine, College of Medicine, National Chung Hsing University, Taichung, Taiwan

Abstract

Background

The prevalence of violence in acute psychiatric wards is a critical concern. According to a meta-analysis investigating violence in psychiatric inpatient units, researchers estimated that approximately 17% of inpatients commit one or more acts of violence during their stay. Inpatient violence negatively affects health-care providers and patients and may contribute to high staff turnover. Therefore, predicting which psychiatric inpatients will commit violence is of considerable clinical significance.

Objective

The present study aimed to estimate the violence rate for psychiatric inpatients and establish a predictive model for violence in psychiatric inpatients.

Methods

We collected the structured and unstructured data from Chinese nursing electronic medical records (EMRs) for the violence prediction. The data was obtained from the psychiatry department of a regional hospital in southern Taiwan, covering the period between January 2008 and December 2018. Several text mining and machine learning techniques were employed to analyze the data.

Results

The results demonstrated that the rate of violence in psychiatric inpatients is 19.7%. The patients with violence in psychiatric wards were generally younger, had a more violent history, and were more likely to be unmarried. Furthermore, our study supported the feasibility of predicting aggressive incidents in psychiatric wards by using nursing EMRs and the proposed method can be incorporated into routine clinical practice to enable early prediction of inpatient violence.

Conclusions

Our findings may provide clinicians with a new basis for judgment of the risk of violence in psychiatric wards.

Background

People with mental illness are at greater risk of violence, although most do not act violently [13]. According to the National Institute of Mental Health’s Epidemiologic Catchment Area survey, patients with serious mental illness, including major depressive disorder, schizophrenia, or bipolar disorder, are 2 to 3 times more likely than people without such illnesses to be assaultive [3]. In addition, violence is a common problem that is a frequent cause of injuries to clinicians in psychiatric inpatient units [4, 5]. In a meta-analysis investigating violence in psychiatric inpatient units, the researchers determined that 17% of inpatients committed one or more act of violence during their stay [5]. Inpatient aggression can negatively affect health-care providers, patients, and the therapeutic environment due to the influence of aggression and the measures implemented to prevent aggression generally being counter-therapeutic [57].

A multitude of studies have been conducted on aggression/violence in psychiatric inpatient units [4, 5, 830]. Most have focused on individual patient risk factors for aggression/violence. The factors considered to be most associated with patient aggression/violence are the existence of previous episodes, the victim and aggressor being of the same sex, being hostile and impulsive, having experienced involuntary admission, and having longer hospitalization [9, 21, 22]. Furthermore, some studies have revealed situational, relational, and environmental factors to also be related to aggression/violence in inpatient settings [1012, 22]. A summary of recent studies on the prevalence and risk factors of violence among psychiatric inpatients is presented in Table 1. When the results of studies that recruited different populations were compared in a meta-analysis, however, only a few of the aforementioned factors were determined to be effective predictors of aggression [13]. Predicting violent incidents can be highly challenging [31]. For example, a study demonstrated that inaccuracy in violence risk assessment is often experience-related [32], and Eaton et al indicated that violence may be impossible to predict in patients with mental disorders [33].

thumbnail
Table 1. Comparative analysis of prevalence and risk factors of violence in psychiatric inpatients: A review of recent studies (2019–2022).

https://doi.org/10.1371/journal.pone.0286347.t001

Studies have developed a variety of risk assessment measures to improve violence risk assessment [3443]. However, their performances in different locations vary considerably [44]. In addition, the process of applying some risk assessment measures is cumbersome and time-consuming, and frequently assessing the risk of violence is thus impractical in most real-world clinical settings [45, 46]. Because of these challenges, developing a means through which violent incidents can be predicted by analyzing already-registered clinical text would be a valuable contribution to the field of personalized medicine and would yield time savings.

Most medical institutions use electronic medical record (EMR) systems, and many studies have retrieved unstructured text data from EMRs to investigate various research topics [20, 4752]. According to the results of our literature review, only two studies used EMRs to predict the risk of aggression in psychiatric inpatients [20, 26], and no studies have used Chinese nursing EMRs to analyze related issues. Therefore, this study established a predictive model for violence in psychiatric inpatients by using structured and unstructured data obtained from Chinese nursing EMRs and several machine learning techniques.

Methods

Data sources and study sample

We obtained the data set for the violence prediction task from the psychiatry department of a regional hospital in southern Taiwan. This psychiatry department has 2 25-bed acute psychiatric wards. Admissions from the 2 wards from between January 2008, and December 2018, were included in the data set. The data used in this study were personal data, diagnosis data, and nursing records that were included in EMRs.

Ethics statement

The current study received approval from the Institutional Review Board (IRB) of Taichung Veterans General Hospital (IRB number: SE19143B). The data set comprised deidentified secondary data. Therefore, the IRB of Taichung Veterans General Hospital formally waived the requirement for participant consent.

Research variables

The dependent variable was the presence or absence of violence during hospitalization. In this study, we defined violence as any behavior that involves an overt attempt or an actual act of aggression that causes physical harm to another person. This includes behaviors such as threatening to hit or physically attacking someone, damaging property, or throwing objects at people. An example of presence violence in one of nursing EMRs is given below.

After toileting, attempted to climb onto another patient’s bed. Despite being reminded of their correct bed location, the patient persisted and even attempted to kick the other patient lying on the bed. Verbal intervention was used to try to stop the patient, but the patient then physically attacked the security staff.”

Two experienced psychiatrists independently reviewed the nursing EMRs of each inpatient to determine whether violence occurred during the patient’s hospitalization. In the case of a discrepancy, a third experienced psychiatrist reviewed the EMRs.

We obtained each patient’s personal basic information, admission diagnosis, and admission nursing assessment from the EMRs created by nurses on the first day of hospitalization. After the data were preprocessed, the structured features were combined with the text features extracted by text preprocessing techniques to obtain a complete set of independent variables.

Text preprocessing of nursing records

Two text preprocessing methods were used to extract text features: bag-of-words (BOW) and sentence embedding.

The BOW method is a commonly used technique in natural language processing (NLP) for text classification and analysis. It involves breaking down a piece of text into individual words, discarding grammar and word order, and representing the text as a numerical vector. Our bag-of-words text preprocessing steps are illustrated in Fig 1. These steps include word transformation, part-of-speech (POS) tagging, and text vectorization.

Initially, we converted full-width characters into half-width characters and removed non-American Standard Code for Information Interchange characters in nursing records.

Next, we used the CkipTagger and NLTK toolkits for text segmentation and POS tagging. Because nursing notes in Taiwan are written bilingually (i.e., in Chinese and English), the text features of each language were separately determined by using different natural language processing (NLP) tools. CkipTagger is an open-source Chinese NLP tool that was released by the Chinese Thesaurus Group (CKIP Lab) of Academia Sinica in 2019. Its main functions include named entity recognition, word segmentation, and POS tagging. NLTK is one of the most widely used open-source NLP tools for English document preprocessing. Many common text preprocessing tasks can be performed using NLTK, such as tokenization, stemming, POS tagging, and named entity recognition. We first performed word segmentation and POS tagging by using CkipTagger to retrieve a set of Chinese index terms. In this step, we only retained nouns as feature words. NLTK was used for word segments that were marked as foreign words by CkipTagger to identify English index terms.

After the text preprocessing steps were completed, we obtained a large set of feature words from the nursing notes; we employed dimension reduction to filter out meaningless words. The filter was set to a frequency of 0.1 to 0.9 occurrences per document. The term frequency-inverse document frequency (TF-IDF) technique of the Scikit-Learn suite was used for document vectorization.

Sentence embedding is an emerging NLP technique that involves representing a sentence or document as a dense vector of numerical values. The goal of sentence embedding is to capture the meaning and context of a sentence in a low-dimensional space, so that the resulting vector can be used as document features. This study used Sentence-BERT (SBERT) [53], a modification of the Bidirectional Encoder Representations from Transformers (BERT) model, to generate text embedding features. The key innovation of SBERT is the addition of a siamese network architecture, where two copies of the BERT model share the same weights and are trained to encode two different sentences. During training, the model is presented with pairs of sentences and learns to distinguish between sentences that are semantically similar and those that are dissimilar. This enables SBERT to generate high-quality sentence embeddings that capture the semantic meaning of a sentence.

The steps to generate document embeddings using SBERT are as follows: first, split the document into sentences; second, generate sentence embeddings; third, aggregate the sentence embeddings into a single vector by taking the mean average of the sentence embeddings; and finally, normalize the resulting document embedding.

Descriptive statistics

We performed independent t and chi-squared tests to identify differences between the violent and nonviolent groups.

Prediction model assessment

We employed 5 well-known supervised learning techniques, namely decision tree (DT, J48 module in WEKA) [54], random forest (RF; RandomForest module in WEKA) [55], support vector machine (SVM, SMO module in WEKA) [56], artificial neural network (ANN, MultilayerPerceptron module in WEKA), K-Nearest Neighbors (KNN, IBk module in WEKA) [57], and boosted random forest (AdaBoost+RF, AdaBoostM1 with RandomForest modules in WEKA) [57], to assess the prediction model performance.

Feature selection is a well-established technique in supervised learning. Feature selection can be employed to improve training efficiency and develop compact models with a high prediction performance. We used the CfsSubsetEval module with the BestFirst search method in Weka, that is, a correlation-based feature selection method, to identify correlations between independent and dependent variables. A feature subset containing features that were highly correlated with the dependent variable but not correlated with each other was obtained.

To improve the prediction performance of the investigated algorithms, the parameter settings can play a crucial role. Therefore, we utilized the CVParameterSelection metalearner module in Weka to optimize these settings. The module allowed us to define multiple combinations of parameters and automatically execute the base classifier with each combination. It then determined the optimal parameter settings based on the best prediction results obtained through cross-validation.

To mitigate overfitting while building the model, this study employed 10-fold cross-validation process. Firstly, the dataset was randomly and independently divided into 10 subsets. During each iteration, one of these subsets was used as a test dataset while the remaining nine subsets were used as training datasets. This allowed for a more robust evaluation of the model’s performance. Moreover, to prevent class imbalance problems, we employed an undersampling method during the cross-validation process. The SpreadSubsample module in Weka 3.8.3 was used to adjust the instance distribution of 2 classes in our training set. The confusion matrix was used to evaluate the prediction performance of each model (Fig 2). True positive (TP) represents the number of inpatients who are correctly predicted to be violent; True negative (TN) represents the number of inpatients who are correctly predicted to be non-violent; False positive (FP) represents the number of inpatients who are incorrectly predicted to be violent; False negative (FN) represents the number of inpatients who are incorrectly predicted to be non-violent.

Using the information summarized in the confusion matrix, 5 classification performance metrics, including accuracy, precision, recall/sensitivity, f1-measure, and specificity, can be obtained through the following equations [58, 59]: (1) (2) (3) (4) (5)

In addition, we also utilized the area under the curve (AUC) to assess the quality of the models. The AUC value can range from 0 to 1, with a higher score indicating a more favorable performance of the classifier.

Analytical tools

For data extraction, computation, linkage, and processing, we used Microsoft SQL Server 2005 (Microsoft, Redmond, WA, USA). We used SAS (Version 9.2; SAS Institute Cary, NC, USA) and SPSS (Version 19.0 for Windows; IBM, New York, NY, USA) for all statistical analyses, and P < .05 was considered to indicate significance. All supervised learning techniques used to assess the prediction model were implemented using Weka 3.8.2 open-source machine learning software (www.cs.waikato.ac.nz/ml/weka).

Results

Baseline data

We obtained the EMRs for 2357 inpatients; 293 were excluded because they contained too little text. Finally, 2064 admission records were included. The 5 most common admission diagnoses were schizophrenic disorders [750 cases, 36.3%, International Classification of Disease, ninth revision, Clinical Modification (ICD-9-CM): 295)], episodic mood disorders (527 cases, 25.5%, ICD-9-CM: 296), persistent mental disorders resulting from conditions classified elsewhere (125 cases, 6.1%, ICD-9-CM: 294), other nonorganic psychoses (119 cases, 5.8%, ICD-9-CM: 298), and dementia (84 cases, 4.1%, ICD-9-CM: 290). In addition, the records of 406 admitted patients (19.7%) stated that violence occurred during their hospitalization. The clinical and demographic variables of the violent and nonviolent groups are listed in Table 2. The violent group was younger and had a more violent history than the nonviolent group. Furthermore, patients in the violent group were more likely to be unmarried than those in the nonviolent group were.

thumbnail
Table 2. Characteristics of patients in violent and nonviolent groups.

https://doi.org/10.1371/journal.pone.0286347.t002

After data preprocessing, 3 types of variables were obtained, namely structured variables, TF-IDF document vector variables, and SBERT document embedding variables. For the complete data set, 1032 variables (i.e., 403 structured variables, 245 TF-IDF document vector variables, and 384 SBERT document embedding variables) were obtained. To evaluate the impact of text features on the experimental results, two feature sets were utilized. These sets consisted of structured variables combined with TF-IDF variables (a total of 648 variables) and structured variables combined with SBERT variables (a total of 787 variables). The application of the CfsSubsetEval method resulted in the retention of only 26 independent variables in the dataset, comprising of 21 structured variables, 4 TF-IDF document vector variables, and 1 SBERT document embedding variable.

Prediction model performance

The results on the performance of the prediction models on the training and validation data sets are presented in Table 3 and Fig 3. The performance of the AdaBoost+RF model was consistently better than that of the other techniques. The structured variables combined with TF-IDF variables resulted in an accuracy of 0.617, a precision of 0.619, a recall/sensitivity of 0.608, an f1-measure of 0.614, a specificity of 0.626, and an AUC of 0.634. The ANN model had the highest sensitivity (0.692) and f1-measure and the KNN model had the highest specificity (0.650). The structured variables combined with SBERT variables resulted in an accuracy of 0.555, a precision of 0.557, a recall/sensitivity of 0.544, an f1-measure of 0.550, a specificity of 0.567, and an AUC of 0.587. Compared to the use of bag-of-word features, the utilization of sentence embedding variables resulted in poorer performance. In addition, employing feature selection (CfsSubsetEval) resulted in an improved prediction performance for the reduced dataset. When this dataset was fed to AdaBoost+RF, the accuracy improved to 0.639, the precision to 0.648, the f1-measure to 0.629, the specificity to 0.667 and the AUC to 0.684.

thumbnail
Fig 3. Prediction model performance assessment using 10-fold cross-validation.

https://doi.org/10.1371/journal.pone.0286347.g003

thumbnail
Table 3. Prediction model performance assessment using 10-fold cross-validation.

https://doi.org/10.1371/journal.pone.0286347.t003

Discussion

Our study is the first to analyze Chinese-language EMRs to predict the risk of violence in psychiatric inpatients by using structured and unstructured data from Chinese EMRs and machine learning techniques. The following are the main findings of our study: (1) the violence rate for psychiatric inpatients was 19.7%; (2) patients in the psychiatric wards with violence were younger, had a more violent history, and were more likely to be unmarried than patients without violence were; (3) violence in psychiatric inpatients could be predicted by using data from Chinese nursing EMRs with an acceptable accuracy (AUC: 0.684).

Patients directing violence at staff or other patients is a common occurrence in most psychiatric treatment facilities. Iozzino et al [5] investigated the prevalence of violent incidents occurring during admission in 35 facilities and determined it to be 2% to 44%, with an average of 17% across the included sites. In our study, the data of 406 patients (19.7%) were determined to indicate that violence occurred during hospitalization. Therefore, the prevalence in the present study was similar with that of Iozzion et al. However, findings regarding the prevalence of violence during admission vary considerably in the literature. This may have occurred for several reasons. First, the definitions of aggression and violence used in studies on psychiatric inpatients can vary considerably, which may contribute to discrepancies in findings regarding the prevalence of violent incidents. For example, in Menger et al, violence was defined as either physical or verbal aggression toward hospital staff or other patients [20]. However, in Lam et al, only physical aggression was considered [16]. In addition, in Schlup et al, psychiatric inpatient violence was categorized and assessed as the following five types: (1) verbal violence; (2) verbal sexual violence; (3) violence against property; (4) physical sexual violence; and (5) physical violence [24]. As Mierlo et al. said, a uniform overall accepted definition of aggressive or violent behavior is lacking, which can result in different operationalizations [60]. It is important to note that aggression and violence are two distinct concepts. Aggression generally refers to behavior intended to cause harm, whether physical or psychological, while violence specifically involves physical harm, such as hitting, punching, or using a weapon. Violence can be seen as an extreme form of aggression with the primary goal of intentional injury. Therefore, the use of different definitions of violence and aggression in studies can lead to confusion in the interpretation of results. Second, the prevalence of inpatient violence may differ with the type of psychiatric ward (e.g., acute, chronic, forensic, or psychiatric intensive care unit). Third, ethnicity may influence the prevalence of inpatient violence. In their meta-analysis, Dack et al [13] reported no significant ethnicity-related differences between aggressive and nonaggressive patients. However, this result was statistically heterogeneous. Future studies should investigate the relationship between ethnicity and inpatient violence. Our study revealed the rate of violence in Asian acute psychiatric inpatients to be 19.7%.

Consistent with those of other studies [9, 1319], our findings indicated that a younger age [14, 15, 29], history of violence [16, 17, 23], and unmarried status [18, 19] are related to patients demonstrating violence in psychiatric wards. In a review by Cornaggia et al [9], the researchers concluded that being admitted involuntarily, the existence of previous aggressive episodes, having a longer hospitalization stay, being impulsive, misusing drugs or alcohol, having a younger age, and having a diagnosis of psychosis were associated with inpatient aggression/violence. Furthermore, a review by Dack et al [13] discovered inpatient aggression to be associated with being male, having a younger age, not being married, being admitted involuntarily, having more previous admissions, having a history of exhibiting self-destructive behavior, having a history of substance misuse, and having a history of violence.

Research indicates that predicting violence incidents can be challenging and is often experience-related [31, 32]. Therefore, several risk assessment instruments have been developed [61]. The most commonly used risk assessment instruments are the Violence Risk Appraisal Guide [39], Structured Assessment of Aggression Risk in Youth [41], and Historical Clinical Risk Management-20 [40]. Sing et al [62] determined in a meta-study that the aforementioned instruments have median AUCs of between 0.70 and 0.74. However, the effects identified using these instruments have generally been small. In addition, the studies that have employed these instruments have mostly had heterogeneous patient populations and differing reports regarding the performance of the instruments. These factors prevent the instruments’ predictive abilities from being generalized to other facilities [13, 44]. In addition, the application of some risk assessment instruments is time-consuming, which has rendered their frequent use in most real-world clinical settings impractical [45, 46]. Because of the aforementioned challenges, using clinical text that is already registered in EMRs to predict violent incidents may be a practical method for violence risk assessment. In our study, violence in psychiatric inpatients could be predicted using data in Chinese nursing EMRs with an AUC of 0.684.

The results of the 10-fold cross-validation revealed AdaBoost+RF to have the highest average AUC. RF has been reported to have a more favorable performance than many other conventional supervised learning techniques in numerous studies [6366]. RF offers the advantages of not assuming a linear relationship in the model; employing ensemble learning, in which a strong learner is formed by combining weak learner groups; and iteratively sampling data and completing embedded feature selection to create several decision trees. On the other hand, boosting (i.e., AdaBoostM1 in WEKA) is a technique used to improve the performance of weak learners by weighting misclassified observations and re-training the model to focus more on these observations in the subsequent iterations. This allows the model to gradually improve its performance by focusing on the data points that are more difficult to classify. Additionally, boosting can also help RF to capture more diverse and informative features by allowing each tree to focus on different subsets of the data. By combining the benefits of both techniques, AdaBoost+RF can improve prediction performance and is less prone to overfitting.

Through a literature review, we discovered that only 2 studies, that of Menger et al, has applied machine learning techniques to EMR data to predict the risk of aggression in psychiatric inpatients [20]. Menger et al achieved the highest accuracy when they combined Document Embeddings with a Recurrent Neural Network. The AUC of our study (AUC: 0.684) is lower than that obtained in Menger et al (AUC: 0.764–0.797). The differences in the study designs may be responsible for the discrepancy. Menger et al investigated incidents in which patients demonstrated either physical or verbal aggression toward staff or other patients [20]. However, only physical aggression was included in our study. Furthermore, Menger et al included the EMRs of doctors and nurses that were created on the first day of admission, whereas our study included only the EMRs of nurses created on the first day of admission. Including less data in analyses can result in less accurate predictions. In addition, Chinese is a complex language with multiple meanings for the same word, which can make it difficult for text mining algorithms to accurately identify the intended meaning of the text [6769]. Medical terminology in Chinese can be ambiguous, with different terms used to describe similar symptoms or conditions. This ambiguity can cause confusion for text mining algorithms, leading to inaccurate or incomplete results. Furthermore, Chinese EMRs may not be standardized, meaning there is a lack of consistency in how patient information is recorded. This can make it difficult for text mining algorithms to accurately detect violence from the EMR data.

Limitations

In this study, we used text mining of Chinese EMRs to predict violence in psychiatric inpatients. While our results provide some insights into the potential of this approach, there are several limitations to our study that should be discussed. First, the use of text mining to extract data from EMRs may not capture all relevant information on risk factors for violence in psychiatric inpatients. For example, data on whether admission was voluntary, the number of previous admissions, quality of life, family support, intelligence level, and history of sexual abuse, which are known to be associated with the risk of violence in psychiatric wards [4, 5, 819], were not included in our data set. As a result, the accuracy of our predictions may be limited by the absence of these important factors. Second, the accuracy of our predictions may also be limited by the quality and completeness of the EMRs used in our study. It is possible that errors or inconsistencies in documentation could have affected the reliability of our data and our ability to accurately identify risk factors for violence. Third, the use of Chinese EMRs may introduce cultural and linguistic biases that could impact the validity of our predictions. It is important to acknowledge that the cultural and linguistic factors that may influence violence in psychiatric inpatients are complex and may not be fully captured by our text mining approach. Fourth, the EMRs did not clearly specify the methods through which diagnoses were made. Therefore, we could not evaluate the diagnostic accuracy of the psychiatric disorders reported in the EMRs included in our study. Fifth, our study did not include verbal aggression. Therefore, the prevalence of verbal aggression in psychiatric wards and the accuracy of verbal aggression prediction in psychiatric wards using Chinese EMRs warrant further study. Finally, the timeframe of our study may have impacted the accuracy of our predictions, as violence in psychiatric inpatients could be influenced by factors that change over time, such as changes in medication or therapy.

Conclusion and future directions

Violence is a key concern in acute psychiatric wards because it can lead to patient or staff injury and because it is counter-therapeutic. Studies have reported that 75% to 100% of nursing staff who work in acute psychiatric units have experienced patient assault [70, 71]. Aggression toward staff was indicated to contribute to high staff turnover [72]. Given the importance of this problem, predicting which psychiatric inpatients will commit violence is crucial. Therefore, we established a predictive model for violence in psychiatric inpatients by using structured and unstructured data obtained from Chinese EMRs and several machine learning techniques with an acceptable accuracy. The results supported the feasibility of predicting violent incidents in psychiatric wards by using EMR data collected at the time of admission and indicated that such a method might be incorporated into routine clinical practice to enable early prediction of inpatient violence. Our findings may provide clinicians with a new basis for judging violence risk in psychiatric wards and may enable first-line caregivers to implement appropriate treatment and preventive measures for hospitalized patients at high risk of violence, ultimately improving patient outcomes and staff safety.

Future research directions in this field could include incorporating additional variables, such as admission type, previous admissions, intelligence level, and history of sexual abuse, to improve the accuracy of predictive models for violence. Structured interviews could be used to determine psychiatric diagnoses and investigate the association between psychiatric disorders and the risk of violence in inpatients. Future studies could also explore the prevalence of verbal aggression in psychiatric wards and the accuracy of predicting verbal aggression using EMRs. Furthermore, validation of our model on other populations to determine its generalizability and applicability to different contexts is needed. The effectiveness of different machine learning techniques and prediction models could also be compared to identify the most accurate and efficient method for predicting violence in psychiatric inpatients. Moreover, targeted interventions could be developed and implemented to reduce the risk of violence in psychiatric inpatients identified as high-risk by the model. Finally, long-term outcomes of violence in psychiatric inpatients, such as patient outcomes and staff safety, should be examined to determine the impact of early prediction and intervention on patient care and outcomes.

Supporting information

S1 Data. CfsSubsetEval (26)-balanced-final.

https://doi.org/10.1371/journal.pone.0286347.s001

(CSV)

S2 Data. Structured with SBERT (787)-balanced-final.

https://doi.org/10.1371/journal.pone.0286347.s002

(CSV)

S3 Data. Structured with TF-IDF (648)-final.

https://doi.org/10.1371/journal.pone.0286347.s003

(CSV)

References

  1. 1. Friedman RA. Violence and mental illness—how strong is the link? The New England journal of medicine. 2006;355(20):2064–6. Epub 2006/11/17. pmid:17108340.
  2. 2. Fazel S, Långström N, Hjern A, Grann M, Lichtenstein P. Schizophrenia, substance abuse, and violent crime. Jama. 2009;301(19):2016–23. Epub 2009/05/21. pmid:19454640.
  3. 3. Swanson JW. Mental disorder, substance abuse, and community violence: an epidemiological approach. 1994.
  4. 4. Virtanen M, Vahtera J, Batty GD, Tuisku K, Pentti J, Oksanen T, et al. Overcrowding in psychiatric wards and physical assaults on staff: data-linked longitudinal study. The British journal of psychiatry: the journal of mental science. 2011;198(2):149–55. Epub 2011/02/02. pmid:21282786.
  5. 5. Iozzino L, Ferrari C, Large M, Nielssen O, de Girolamo G. Prevalence and Risk Factors of Violence by Psychiatric Acute Inpatients: A Systematic Review and Meta-Analysis. PloS one. 2015;10(6):e0128536. Epub 2015/06/11. pmid:26061796.
  6. 6. Ward L. Mental health nursing and stress: Maintaining balance. International journal of mental health nursing. 2011;20(2):77–85. pmid:21371222
  7. 7. Needham I, Abderhalden C, Halfens RJ, Fischer JE, Dassen T. Non-somatic effects of patient aggression on nurses: a systematic review. Journal of advanced nursing. 2005;49(3):283–96. pmid:15660553
  8. 8. Ramesh T, Igoumenou A, Montes MV, Fazel S. Use of risk assessment instruments to predict violence in forensic psychiatric hospitals: a systematic review and meta-analysis. European psychiatry. 2018;52:47–53. pmid:29626758
  9. 9. Cornaggia CM, Beghi M, Pavone F, Barale F. Aggression in psychiatry wards: a systematic review. Psychiatry research. 2011;189(1):10–20. Epub 2011/01/18. pmid:21236497.
  10. 10. Fletcher A, Crowe M, Manuel J, Foulds J. Comparison of patients’ and staff’s perspectives on the causes of violence and aggression in psychiatric inpatient settings: An integrative review. Journal of Psychiatric and Mental Health Nursing. 2021;28(5):924–39. pmid:33837640
  11. 11. Van Wijk E, Traut A, Julie H. Environmental and nursing-staff factors contributing to aggressive and violent behaviour of patients in mental health facilities. curationis. 2014;37(1):1–9. pmid:25686162
  12. 12. Olsson H, Audulv Å, Strand S, Kristiansen L. Reducing or increasing violence in forensic care: a qualitative study of inpatient experiences. Archives of psychiatric nursing. 2015;29(6):393–400. pmid:26577553
  13. 13. Dack C, Ross J, Papadopoulos C, Stewart D, Bowers L. A review and meta-analysis of the patient factors associated with psychiatric in-patient aggression. Acta Psychiatr Scand. 2013;127(4):255–68. Epub 2013/01/08. pmid:23289890.
  14. 14. Davis S. Violence by psychiatric inpatients: A review. Psychiatric services. 1991;42(6):585–90. pmid:1864567
  15. 15. Carr VJ, Lewin TJ, Sly KA, Conrad AM, Tirupati S, Cohen M, et al. Adverse incidents in acute psychiatric inpatient units: rates, correlates and pressures. Australian & New Zealand Journal of Psychiatry. 2008;42(4):267–82. pmid:18330769
  16. 16. Lam JN, McNiel DE, Binder RL. The relationship between patients’ gender and violence leading to staff injuries. Psychiatric Services. 2000;51(9):1167–70. pmid:10970922
  17. 17. Soliman AE-D, Reza H. Risk factors and correlates of violence among acutely ill adult psychiatric inpatients. Psychiatric services. 2001;52(1):75–80. pmid:11141532
  18. 18. Raja M, Azzoni A. Hostility and violence of acute psychiatric inpatients. Clinical Practice and Epidemiology in Mental Health. 2005;1(1):1–9.
  19. 19. Grassi L, Peron L, Marangoni C, Zanchi P, Vanni A. Characteristics of violent behaviour in acute psychiatric in-patients: a 5-year Italian study. Acta Psychiatrica Scandinavica. 2001;104(4):273–9. pmid:11722302
  20. 20. Menger V, Scheepers F, Spruit M. Comparing deep learning and classical machine learning approaches for predicting inpatient violence incidents from clinical text. Applied Sciences. 2018;8(6):981.
  21. 21. Camus D, Dan Glauser ES, Gholamrezaee M, Gasser J, Moulin V. Factors associated with repetitive violent behavior of psychiatric inpatients. Psychiatry Res. 2021;296:113643. Epub 2020/12/23. pmid:33352415.
  22. 22. Weltens I, Bak M, Verhagen S, Vandenberk E, Domen P, van Amelsvoort T, et al. Aggression on the psychiatric ward: Prevalence and risk factors. A systematic review of the literature. PloS one. 2021;16(10):e0258346. Epub 2021/10/09. pmid:34624057.
  23. 23. McIvor L, Payne-Gill J, Beck A. Associations between violence, self-harm and acute psychiatric service use: Implications for inpatient care. J Psychiatr Ment Health Nurs. 2022. Epub 2022/09/08. pmid:36071316.
  24. 24. Schlup N, Gehri B, Simon M. Prevalence and severity of verbal, physical, and sexual inpatient violence against nurses in Swiss psychiatric hospitals and associated nurse-related characteristics: Cross-sectional multicentre study. Int J Ment Health Nurs. 2021;30(6):1550–63. Epub 2021/07/02. pmid:34196092.
  25. 25. Brown S, O’Rourke S, Schwannauer M. Risk factors for inpatient violence and self-harm in forensic psychiatry: the role of head injury, schizophrenia and substance misuse. Brain injury. 2019;33(3):313–21. Epub 2018/12/07. pmid:30507315.
  26. 26. Menger V, Spruit M, van Est R, Nap E, Scheepers F. Machine Learning Approach to Inpatient Violence Risk Assessment Using Routinely Collected Clinical Notes in Electronic Health Records. JAMA network open. 2019;2(7):e196709. Epub 2019/07/04. pmid:31268542.
  27. 27. Girardi A, Hancock-Johnson E, Thomas C, Wallang PM. Assessing the Risk of Inpatient Violence in Autism Spectrum Disorder. The journal of the American Academy of Psychiatry and the Law. 2019;47(4):427–36. Epub 2019/09/27. pmid:31554646.
  28. 28. Huitema A, Verstegen N, de Vogel V. A Study Into the Severity of Forensic and Civil Inpatient Aggression. Journal of interpersonal violence. 2021;36(11–12):Np6661-np79. Epub 2018/12/12. pmid:30526234.
  29. 29. Fazel S, Toynbee M, Ryland H, Vazquez-Montes M, Al-Taiar H, Wolf A, et al. Modifiable risk factors for inpatient violence in psychiatric hospital: prospective study and prediction model. Psychological medicine. 2021;53(2):1–7. Epub 2021/05/25. pmid:34024292.
  30. 30. Lockertsen Ø, Varvin S, Færden A, Vatnar SKB. Short-term risk assessments in an acute psychiatric inpatient setting: A re-examination of the Brøset Violence Checklist using repeated measurements—Differentiating violence characteristics and gender. Arch Psychiatr Nurs. 2021;35(1):17–26. Epub 2021/02/18. pmid:33593511.
  31. 31. Ægisdóttir S, White MJ, Spengler PM, Maugherman AS, Anderson LA, Cook RS, et al. The meta-analysis of clinical judgment project: Fifty-six years of accumulated research on clinical versus statistical prediction. The Counseling Psychologist. 2006;34(3):341–82.
  32. 32. Teo AR, Holley SR, Leary M, McNiel DE. The relationship between level of training and accuracy of violence risk assessment. Psychiatric services. 2012;63(11):1089–94. pmid:22948947
  33. 33. Eaton S, Ghannam M, Hunt N. Prediction of violence on a psychiatric intensive care unit. Medicine, Science and the Law. 2000;40(2):143–6. pmid:10821025
  34. 34. Skeem JL, Monahan J. Current directions in violence risk assessment. Current directions in psychological science. 2011;20(1):38–42.
  35. 35. Daffern M. The predictive validity and practical utility of structured schemes used to assess risk for aggression in psychiatric inpatient settings. Aggression and Violent Behavior. 2007;12(1):116–30.
  36. 36. Douglas KS, Guy LS, Reeves KA, Weir J. HCR-20 violence risk assessment scheme: Overview and annotated bibliography. 2005.
  37. 37. Monahan J, Steadman HJ, Robbins PC, Silver E, Appelbaum PS, Grisso T, et al. Developing a clinically useful actuarial tool for assessing violence risk. The British Journal of Psychiatry. 2000;176(4):312–9. pmid:10827877
  38. 38. Anderson KK, Jenson CE. Violence risk–assessment screening tools for acute care mental health settings: Literature review. Archives of psychiatric nursing. 2019;33(1):112–9. pmid:30663614
  39. 39. Quinsey VL, Harris GT, Rice ME, Cormier CA. Violent offenders: Appraising and managing risk: American Psychological Association; 2006.
  40. 40. Webster C, Douglas K, Eaves D, Hart S. HCR-20: Assessing risk for violence (Version 2). Burnaby, British Columbia, Canada: Mental Health. Law, and Policy Institute, Simon Fraser University. 1997.
  41. 41. Borum R, Lodewijks HP, Bartel PA, Forth AE. The Structured Assessment of Violence Risk in Youth (SAVRY). 2021.
  42. 42. Fazel S, Singh JP, Doll H, Grann M. Use of risk assessment instruments to predict violence and antisocial behaviour in 73 samples involving 24 827 people: systematic review and meta-analysis. Bmj. 2012;345. pmid:22833604
  43. 43. Mistler LA, Friedman MJ. Instruments for Measuring Violence on Acute Inpatient Psychiatric Units: Review and Recommendations. Psychiatric services (Washington, DC). 2022;73(6):650–7. Epub 2021/09/16. pmid:34521209.
  44. 44. Yang M, Wong SC, Coid J. The efficacy of violence prediction: a meta-analytic comparison of nine risk assessment tools. Psychological bulletin. 2010;136(5):740. pmid:20804235
  45. 45. Gardner W, Lidz CW, Mulvey EP, Shaw EC. A comparison of actuarial methods for identifying repetitively violent patients with mental illnesses. Law and Human Behavior. 1996;20(1):35–48.
  46. 46. Viljoen JL, McLachlan K, Vincent GM. Assessing violence risk and psychopathy in juvenile and adult offenders: A survey of clinical practices. Assessment. 2010;17(3):377–95. pmid:20124429
  47. 47. Lee CH, Yoon H-J. Medical big data: promise and challenges. Kidney research and clinical practice. 2017;36(1):3. pmid:28392994
  48. 48. Murdoch TB, Detsky AS. The inevitable application of big data to health care. Jama. 2013;309(13):1351–2. pmid:23549579
  49. 49. Miotto R, Li L, Kidd BA, Dudley JT. Deep patient: an unsupervised representation to predict the future of patients from the electronic health records. Scientific reports. 2016;6(1):1–10.
  50. 50. Bjarnadottir RI, Bockting W, Yoon S, Dowding DW. Nurse documentation of sexual orientation and gender identity in home healthcare: a text mining study. CIN: Computers, Informatics, Nursing. 2019;37(4):213–21. pmid:30601189
  51. 51. Hyun S, Cooper C. Application of Text Mining to Nursing Texts: Exploratory Topic Analysis. CIN: Computers, Informatics, Nursing. 2020;38(10):475–82. pmid:33044316
  52. 52. Liao P-H, Chu W, Chu W-C. Evaluation of the mining techniques in constructing a traditional Chinese-language nursing recording system. CIN: Computers, Informatics, Nursing. 2014;32(5):223–31. pmid:24695325
  53. 53. Reimers N, Gurevych I. Sentence-bert: Sentence embeddings using siamese bert-networks. arXiv preprint arXiv:190810084. 2019.
  54. 54. Mathuria M. Decision tree analysis on j48 algorithm for data mining. Intrenational Journal ofAdvanced Research in Computer Science and Soft-ware Engineering. 2013;3(6).
  55. 55. Breiman L. Random forests. Machine learning. 2001;45(1):5–32.
  56. 56. Bhargava N, Dayma S, Kumar A, Singh P, editors. An approach for classification using simple CART algorithm in WEKA. 2017 11th International Conference on Intelligent Systems and Control (ISCO); 2017: IEEE.
  57. 57. Peterson LE. K-nearest neighbor. Scholarpedia. 2009;4(2):1883.
  58. 58. Akobeng AK. Understanding diagnostic tests 1: sensitivity, specificity and predictive values. Acta paediatrica. 2007;96(3):338–41. pmid:17407452
  59. 59. Fowler JR, Gaughan JP, Ilyas AM. The sensitivity and specificity of ultrasound for the diagnosis of carpal tunnel syndrome: a meta-analysis. Clinical Orthopaedics and Related Research®. 2011;469(4):1089–94. pmid:20963527
  60. 60. Klerx-Van Mierlo F, Bogaerts S. Vulnerability factors in the explanation of workplace aggression: The construction of a theoretical framework. Journal of forensic psychology practice. 2011;11(4):265–92.
  61. 61. Higgins N, Watts D, Bindman J, Slade M, Thornicroft G. Assessing violence risk in general adult psychiatry. Psychiatric Bulletin. 2005;29(4):131–3.
  62. 62. Singh JP, Grann M, Fazel S. A comparative study of violence risk assessment tools: A systematic review and metaregression analysis of 68 studies involving 25,980 participants. Clinical psychology review. 2011;31(3):499–513. pmid:21255891
  63. 63. Lee P-J, Hu Y-H, Lu K-T. Assessing the helpfulness of online hotel reviews: A classification-based approach. Telematics and Informatics. 2018;35(2):436–45.
  64. 64. Lin K, Hu Y, Kong G. Predicting in-hospital mortality of patients with acute kidney injury in the ICU using random forest model. International journal of medical informatics. 2019;125:55–61. pmid:30914181
  65. 65. Cacheda F, Fernandez D, Novoa FJ, Carneiro V. Early detection of depression: social network analysis and random forest techniques. Journal of medical Internet research. 2019;21(6):e12554. pmid:31199323
  66. 66. Hu Y-H, Chen K, Chang I-C, Shen C-C. Critical predictors for the early detection of conversion from unipolar major depressive disorder to bipolar disorder: nationwide population-based retrospective cohort study. JMIR medical informatics. 2020;8(4):e14278. pmid:32242821
  67. 67. Ma J, Xu W, Sun Y-h, Turban E, Wang S, Liu O. An ontology-based text-mining method to cluster proposals for research project selection. IEEE transactions on systems, man, and cybernetics-part a: systems and humans. 2012;42(3):784–90.
  68. 68. Sun W, Cai Z, Li Y, Liu F, Fang S, Wang G. Data processing and text mining technologies on electronic medical records: a review. Journal of healthcare engineering. 2018;2018. pmid:29849998
  69. 69. Zhang M-y, Lu Z-d, Zou C-y. A Chinese word segmentation based on language situation in processing ambiguous words. Information Sciences. 2004;162(3–4):275–85.
  70. 70. Hatch-Maillette MA, Scalora MJ, Bader SM, Bornstein BH. A gender-based incidence study of workplace violence in psychiatric and forensic settings. Violence and victims. 2007;22(4):449–62. pmid:17691552
  71. 71. Caldwell MF. Incidence of PTSD among staff victims of patient violence. Psychiatric Services. 1992;43(8):838–9. pmid:1427689
  72. 72. Needham I, Abderhalden C, Halfens RJ, Dassen T, Haug HJ, Fischer JE. The effect of a training course in aggression management on mental health nurses’ perceptions of aggression: a cluster randomised controlled trial. International journal of nursing studies. 2005;42(6):649–55. Epub 2005/06/29. pmid:15982464.