
Questionnaire-free machine-learning method to predict depressive symptoms among community-dwelling older adults

  • Sri Susanty,

    Roles Conceptualization, Data curation, Investigation, Methodology, Project administration, Resources, Validation, Writing – original draft

    Affiliations School of Nursing, College of Nursing, Taipei Medical University, Taipei, Taiwan, Nursing Study Program, Faculty of Medicine, Universitas Halu Oleo, Kendari, Southeast Sulawesi, Indonesia

  • Herdiantri Sufriyana,

    Roles Conceptualization, Formal analysis, Funding acquisition, Methodology, Software, Visualization, Writing – original draft

    Affiliations Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan, Department of Medical Physiology, Faculty of Medicine, Universitas Nahdlatul Ulama Surabaya, Surabaya, Indonesia

  • Emily Chia-Yu Su ,

    Roles Conceptualization, Funding acquisition, Methodology, Resources, Supervision, Writing – review & editing

    yeuhui@tmu.edu.tw (YHC); emilysu@tmu.edu.tw (ECYS)

    Affiliations Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan, Clinical Big Data Research Center, Taipei Medical University Hospital, Taipei, Taiwan, Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei, Taiwan

  • Yeu-Hui Chuang

    Roles Conceptualization, Methodology, Project administration, Resources, Supervision, Writing – review & editing

    yeuhui@tmu.edu.tw (YHC); emilysu@tmu.edu.tw (ECYS)

    Affiliations School of Nursing, College of Nursing, Taipei Medical University, Taipei, Taiwan, Center for Nursing and Healthcare Research in Clinical Practice Application, Wan Fang Hospital, Taipei Medical University, Taipei, Taiwan

Abstract

The 15-item Geriatric Depression Scale (GDS-15) is widely used to screen for depressive symptoms among older populations. This study aimed to develop and validate a questionnaire-free, machine-learning model as an alternative triage test for the GDS-15 among community-dwelling older adults. By internal validation, the best models were the random forest (RF) and the deep-insight visible neural network, but their performances were indistinguishable under external validation. The AUROC of the RF model was 0.619 (95% CI 0.610 to 0.627) for the external validation set with a non-local ethnic group. Our triage test allows healthcare professionals to preliminarily screen for depressive symptoms in older adults without using a questionnaire. If the model returns a positive result, the GDS-15 can then be administered as a follow-up measure. This preliminary screening can save considerable time and effort for both healthcare providers and older adults, especially those who are illiterate.

Introduction

Depressive symptoms in older adults commonly go unidentified and are complicated by concurrent cognitive impairment [1]. The Geriatric Depression Scale (GDS) is one of the most commonly used questionnaires to screen for depressive symptoms in older adults. A recent systematic review and meta-analysis found that the 15-item version (GDS-15) is the most accurate compared to shorter or longer versions [2]. As questionnaire-free variables, demographic and physical health data from routine visits can be used as electronic health record (EHR) indicators to triage patients for a mental health follow-up with the GDS-15. This is possible because older adults with depressive symptoms may present with more physical complaints, implying a psychological change that caregivers might overlook [3]. However, the accuracy of such data for a triage test remains unclear.

Motivation

Depression affects 264 million people globally [4]. Because different tools are used to screen for depression, the reported prevalence range is considerably wide [5]. Previous studies reported prevalence rates of depression among community-dwelling older adults of 7% in Sweden [6], 9.8% in the United States [7], 52.0% in Nigeria [8], 34.4% in India [9], 13% in Singapore [10], 25.2% in Turkey [11], 24% in Japan [12], 72.2% in South Korea [13], and 59.1% in Malaysia [14]. Although a definitive diagnosis relies on the Diagnostic and Statistical Manual of Mental Disorders, fifth edition (DSM-V), the GDS-15 is reasonably reliable for screening: prevalences in Sweden were quite similar between estimates based on the GDS-15 (7%) and the DSM-IV-TR/DSM-V (6.6%) [6].

In addition to feelings of sadness, helplessness, and pessimism, an older adult with this disorder may also experience a decrease in mood, loss of motivation, physical weakness, sleep disturbances, feelings of hopelessness, a perceived lack of support, and difficulty concentrating [15]. Depression in later life, if not promptly treated, can result in worse outcomes, e.g., a decreased quality of life [16], sleep disturbances [17], attempted suicide [18, 19], and even death [20]. Early identification of depressive symptoms is therefore essential for early intervention.

Previous works

Almost all existing predictive models of depressive symptoms include questionnaire-based predictors, e.g., the Patient Health Questionnaire (PHQ), the Edinburgh Postnatal Depression Scale (EPDS), and the GDS [21]. One approach that identified more patients with depression (area under the receiver operating characteristic [ROC] curve [AUROC] 0.700, 95% CI 0.629 to 0.771) still required several questionnaire-based screening tools [22], namely parts of the Self-Reported Quick Inventory of Depressive Symptomatology (QIDS-SR) and the Hamilton Depression Rating Scale (HAM-D). A previous study developed an extended predictD algorithm to predict major depression 12–24 months later based on the DSM-IV (AUROC 0.728, 95% CI 0.675 to 0.781; n = 2670), but it also required subjects to fill in the 12-Item Short Form (SF-12) for two of the predictors [23].

One study utilized a wearable device to predict GDS-15 and HAM-D results in older adults (AUROC 0.96, 95% CI 0.91 to 0.99; n = 47); unfortunately, the sample size was small, and a wearable device might not be affordable for some older adults [24]. However, no previous study developed a questionnaire-free method to predict depressive symptoms based on standard screening questionnaires in community-dwelling older adults.

Intuition

Later-life (aged 60+) depression is associated with several factors whose assessment can draw on routine databases at a subject's first visit to a healthcare facility. Some of these factors never change, i.e., age [25], gender [26–28], and past employment status (i.e., before 60 years of age) [29–31]. A few rarely change, i.e., current employment status [31, 32], education [33, 34], religion [35, 36], marital status [37], living status [38–40], and lifestyle [41]. However, many can change on a monthly to yearly basis, i.e., health status [41–43], morbidities [28], hearing loss [44], and oral health and missing teeth [45]. A prediction model may utilize these factors as a triage test for the GDS-15 at any time. At the same time, such a test can reduce the screening frequency of the GDS-15 by restricting respondents to only those who test positive according to the prediction model. The test should be part of an EHR system that runs automatically on pre-existing, required information in the EHR.

However, developing this model under a traditional approach, i.e., using a logistic regression (LR) algorithm, may be insufficient. In addition to LR, we also need other algorithms from machine learning, a field of science concerned with how machines learn from data [46], not limited to those based on statistical probability theory. Machine learning is a part of artificial intelligence that emulates human intellectual actions [47]. Its use is already pervasive in recognizing objects in images, transcribing speech to text, aligning internet content to user preferences, and selecting relevant search results [48]. Many fields in medicine have used this approach to predict medical outcomes, e.g., oncology [49], cardiology and critical care [50, 51], and obstetrics [52]. Machine learning provides a more extensive search space in which to find the most accurate model using simple predictors, e.g., routine data in electronic medical records [49, 53]. This study aimed to develop and validate a questionnaire-free, machine-learning model to predict the GDS-15 among community-dwelling older adults.

Methods

Study design

This study followed the guidelines for developing and reporting machine learning predictive models in biomedical research [54] (see S1 Table in S1 File) and the prediction model risk of bias assessment tool (PROBAST) [55] (see S2 Table in S1 File). The PROBAST was developed according to the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) guidelines [56], but it also incorporates recent findings on developing and validating multivariable prediction models, including those using any machine learning algorithm [57, 58]. For clinicians, we also provide a checklist to assess the suitability of our model for clinical settings [59] (see S3 Table in S1 File). A web application (https://predme.app/pre_gds15) is available as a prototype, but a future implementation should incorporate the application into an EHR system for automatic prediction based on pre-existing information. We utilized a dataset from our previous project investigating loneliness and depression in older adults. From June to September 2019, that project collected this dataset using a cross-sectional design from 15 community health centers (CHCs) in Kendari, Indonesia (n = 1381). All patients aged 60 years or older with clear consciousness who visited the CHCs were enrolled. We applied a random sampling technique stratified by the CHCs. Trained assessors who collected the data were blind to the study outcome. Taipei Medical University (TMU) waived ethical clearance for this study. Both the TMU Joint Institutional Review Board (approval no.: N201905105) and the Ethical Research Committee of Universitas Halu Oleo (approval no.: 954/UN.29.20/PPM/2018) granted the original study ethical clearance. Informed verbal consent was obtained from each participant.

Data source

The dataset consisted of 19 attributes: 17 candidate-predictor variables, one grouping variable, and one outcome variable. The candidate-predictor variables were: 1) age (years); 2) gender (male/female); 3) religious beliefs (Christian/Hindu/Moslem); 4) educational attainment (illiterate/primary/secondary/high school/university/other); 5) marital status (single/married/separated or divorced/widowed); 6) children (number of persons); 7) living status (alone/with a family member but no spouse/with a spouse only/with family member and spouse/other); 8) currently employed (no/yes); 9) previously employed (no/yes); 10) income (in Indonesian rupiah (IDR)); 11) duration of visiting the CHC (in the number of years of routine visits); 12) comorbidities (number of conditions); 13) health condition (very good/good/fair/poor/very poor); 14) hearing problems (no/yes); 15) visual problems (no/yes); 16) oral status (very good/good/fair/poor/very poor); and 17) medication (number of prescribed drugs). We used ethnicity (Bugis-Makassar/Buton/Muna/Tolaki/non-local ethnicity) as a grouping variable for data partitioning in order to develop and validate our predictive models (see "Model validation"). The outcome variable was depressive symptoms (no/yes), as defined in the next section.

Outcome definition

As the predicted outcome, depressive symptoms were assessed based on the GDS-15, whose 15 questions yield a score ranging from 0 to 15. Some items give a point if answered positively, while others give a point if answered negatively. If the score exceeds 5, the scale suggests that a person has depressive symptoms [60]. The GDS questionnaire is described in S4 Table in S1 File. A trained assessor assisted each participant in filling out the questionnaire; the assessor was blind to the predictor information. Predictor data were demographic data and routine physical health check results collected at the same time as the GDS. Other healthcare givers collected these data without knowing the assessment results of depressive symptoms. This blinding avoided outcome leakage, which was also carefully handled in all analytical procedures, as described after each description of the relevant procedure (see S5 Table in S1 File). The event definition for this prediction task was depressive symptoms based on the GDS-15; however, to comply with the sample size requirement of the model development (see "Predictors"), we treated whichever outcome (positive or negative) had the smaller sample size as the event. Under-diagnosis causes missed cases of depressive symptoms when screening with the GDS-15, which leads to failure to prevent major depressive disorders. Meanwhile, over-diagnosis increases the frequency of GDS-15 use for each older adult, which may lead to further misclassification, because repetitive screening can cause response fatigue and rushing, leading to higher measurement error [61]. Nonetheless, the risk of under-diagnosis outweighs that of over-diagnosis.
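The scoring rule above can be sketched as follows. This is a minimal illustration, assuming the standard GDS-15 keying in which items 1, 5, 7, 11, and 13 score a point when answered "no" and the remaining ten items score a point when answered "yes"; the paper itself only states that some items score positively and others negatively.

```python
# Hypothetical sketch of GDS-15 scoring; the reverse-keyed item set below is
# an assumption based on the standard GDS-15, not stated in this paper.
REVERSE_KEYED = {1, 5, 7, 11, 13}  # 1-based item numbers scoring on "no"

def gds15_score(answers):
    """answers: dict mapping item number (1..15) to True ('yes') / False ('no')."""
    if set(answers) != set(range(1, 16)):
        raise ValueError("expected answers for items 1..15")
    score = 0
    for item, yes in answers.items():
        if item in REVERSE_KEYED:
            score += 0 if yes else 1  # a point for answering "no"
        else:
            score += 1 if yes else 0  # a point for answering "yes"
    return score

def has_depressive_symptoms(answers, cutoff=5):
    """Positive screen if the total score exceeds the cutoff (score > 5)."""
    return gds15_score(answers) > cutoff
```

For example, a respondent answering "yes" to every item would score 10 (the ten forward-keyed items) and screen positive under the score > 5 rule.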

Data pre-processing

We binarized all categorical predictors into 0 or 1 according to whether a category applied to a participant (Fig 1). All numerical predictors were standardized using the mean and standard deviation (SD) but capped at the 2.5% and 97.5% quantiles as the respective minimum and maximum values. This standardization resulted in a value range of approximately -1.96 to 1.96. We then normalized the values by shifting the central value (i.e., zero) to 0.5 and scaling the range by half; thus, the numerical predictors fell within a range of 0 to 1. We used only the mean and SD calculated from the data partitioned for model development; standardization reused these values for numerical predictors in any data partition. Therefore, this pre-processing procedure is applicable to future data. We checked for missing values in the dataset. The only missing value was in visual problems for one participant (n = 1/1381, 0.072%). This value was missing completely at random, since we obtained this information from routine physical health check data. We imputed the missing value using multiple imputation by chained equations after data transformation, using only data in the same data partition. By chance, the missing value fell within the data partitioned for model development.
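The numerical pre-processing steps above can be sketched as below. This is a minimal interpretation of the description, assuming the capping is applied at the training-set quantiles before standardization and that the z-range of roughly ±1.96 is mapped onto [0, 1] by halving the range and shifting the center to 0.5; the quantile estimator here is a simple nearest-rank sketch, not necessarily the one the authors used.

```python
# Minimal sketch of the described pre-processing; training-set statistics
# are fitted once and reused verbatim for any later (validation/future) data.
def fit_scaler(values):
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    ordered = sorted(values)
    lo = ordered[int(0.025 * (n - 1))]   # 2.5% quantile (nearest-rank sketch)
    hi = ordered[int(0.975 * (n - 1))]   # 97.5% quantile
    return {"mean": mean, "sd": sd, "lo": lo, "hi": hi}

def transform(value, scaler):
    # Cap at the training quantiles, standardize, then map the resulting
    # ~[-1.96, 1.96] range onto [0, 1] (center 0 becomes 0.5).
    capped = min(max(value, scaler["lo"]), scaler["hi"])
    z = (capped - scaler["mean"]) / scaler["sd"]
    return min(max(z / (2 * 1.96) + 0.5, 0.0), 1.0)
```

Fitting the scaler only on the development partition, as the text specifies, is what keeps the procedure applicable to future records without re-estimating any statistics.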

Predictors

We used only the data partitioned for model development to conduct predictor extraction, representation, and selection (Fig 1). As candidate predictors, binarized predictors were extracted only if they did not exhibit a perfect separation problem, in which a predictor exists in only one of the outcome classes. Perfect separation may occur because of a sampling error [54]. Although it may also occur in populations, including this kind of predictor may mislead predictive modeling into choosing it as a strong predictor of the outcome. Of 40 predictors after binarization, only 37 were extracted. The excluded predictors were "other" living status, "very poor" oral status, and "Hindu" religion.

We assessed redundant predictors using Pearson's correlation coefficients. Two binarized predictors were highly correlated (r = 0.72): "living with family members without a spouse" and a "widowed" marital status. We retained both variables because the correlation was near the borderline and apparently due to sampling bias: a "widowed" older adult does not necessarily live with family members and might live alone. The two predictors were therefore not interchangeable.
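The redundancy check can be sketched as a pairwise Pearson scan over the binarized columns. The threshold of 0.7 here is an illustrative assumption (the study reported r = 0.72 for the single flagged pair and made a judgment call to retain both predictors).

```python
# Illustrative redundancy scan over binarized predictor columns.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

def redundant_pairs(columns, threshold=0.7):
    """columns: dict name -> list of 0/1 values; returns highly correlated pairs."""
    names = sorted(columns)
    return [(a, b, pearson_r(columns[a], columns[b]))
            for i, a in enumerate(names) for b in names[i + 1:]
            if abs(pearson_r(columns[a], columns[b])) > threshold]
```

As in the study, flagged pairs need not be dropped automatically; domain reasoning (here, that widowhood does not imply a particular living arrangement) can override the statistical flag.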

To optimize predictive performance, we applied a dimension-reduction technique using principal component analysis (PCA) (Fig 1). We used only the top 19 principal components (PCs), by percent variance explained, because we needed to comply with the sample size for predictive modeling under the PROBAST guidelines [58], i.e., 20 events per variable or candidate predictor (see "Model validation"). A ten-fold cross-validation procedure was applied to only the data partitioned for model development. We used average values computed from the ten rotated PC matrices to represent the 37 binarized and numerical predictors as 19 PCs. We also used these averages, derived from the model-development partition, to compute the PCs for model validation. This study's resampled dimension-reduction method was described in detail elsewhere [62].
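The events-per-variable (EPV) arithmetic that caps the number of retained PCs can be made explicit. The event count in the example below is hypothetical (the paper does not state it at this point); what matters is the rule that the number of candidate predictors may not exceed the event count divided by the EPV floor.

```python
# Sketch of the EPV constraint: with a floor of 20 events per variable,
# at most n_events // 20 candidate predictors (here, PCs) may be retained.
def max_candidate_predictors(n_events, epv_floor=20):
    if n_events < epv_floor:
        return 0
    return n_events // epv_floor

def retained_pcs(n_pcs_available, n_events, epv_floor=20):
    """Keep the top PCs by variance explained, subject to the EPV cap."""
    return min(n_pcs_available, max_candidate_predictors(n_events, epv_floor))
```

Under this rule, retaining 19 PCs at 20 EPV implies roughly 380 or more events in the training partition, and the stricter >50 EPV floor used later for the tree-based models cuts the allowance further.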

We also used other machine learning algorithms besides LR to develop prediction models (see "Model development"). However, those models required larger sample sizes of >50 events per variable [58]. We therefore used a wrapper method, selecting PCs with a logistic regression before they became candidate predictors for the machine learning models (Fig 1). We applied the same hyperparameter tuning strategy as for the LR in this predictor selection (see "Model development").
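A wrapper method in general evaluates candidate feature subsets through the predictive model itself rather than through a univariate filter. The greedy forward-selection sketch below illustrates the idea; the `cv_score` callable is a hypothetical stand-in for the cross-validated logistic regression described in the text, and the exact search strategy the authors used may differ.

```python
# Generic wrapper-style forward selection: greedily add the feature that
# most improves a model-based (e.g., cross-validated LR) score.
def forward_select(features, cv_score, max_features=None):
    selected, best = [], cv_score([])
    limit = max_features or len(features)
    improved = True
    while improved and len(selected) < limit:
        improved = False
        for f in features:
            if f in selected:
                continue
            score = cv_score(selected + [f])
            if score > best:
                best, choice, improved = score, f, True
        if improved:
            selected.append(choice)
    return selected, best
```

Because the scoring function embeds the model, this selection inherits the LR's hyperparameter tuning, matching the text's note that the same tuning strategy was reused.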

Model development

Although abundant machine learning algorithms exist for model development, we compared only a subset of the available algorithms (Fig 1). This is because comparing more models would be more vulnerable to a multiple-testing effect relative to the number of datasets, i.e., the best model might be found simply by chance [63]. To avoid such a comparison, we considered three criteria for choosing algorithms: (1) those commonly used in clinical prediction studies, i.e., logistic regression [58], which expects a linear predictor-outcome relationship; (2) those which commonly outperformed others (177 algorithms) across 121 datasets [64], which allow a non-linear predictor-outcome relationship; and (3) our proposed neural-network algorithm [65], which pursues moderate predictive performance and deeper interpretability. A sufficient sample size was also considered according to the PROBAST guidelines, since a small sample size is vulnerable to overfitting [58]. The three types of algorithms also covered those with the lowest and highest sample size requirements, which were 20 (i.e., logistic regression) and >200 (i.e., random forest [RF] and neural network) events per variable (EPVs), according to a previous study [66]. That study also identified 50 and >200 EPVs for the decision tree and support vector machine, respectively. We did not use either of these, as they neither commonly outperformed other algorithms nor required a sample size small enough for this study. Although we used algorithms that require >200 EPVs, we evaluated the models using rigorous data splitting, which would identify overfitting by comparing evaluation results between internal and external validation sets. These sets had, respectively, the same and different characteristics for a particular circumstance (see "Model validation"), as recommended by the PROBAST guidelines [58].

In addition, we used a random-search method to tune values of the pre-defined hyperparameters, which were defined in a pre-registered protocol before this study was conducted [65]. The randomness and pre-registration were deliberate, to avoid the research bias called "hypothesizing after the results are known" (HARKing) [67]. In this study, HARKing would be a situation in which a set of hyperparameters for an algorithm, as a hypothesis, is preferentially defined to achieve the only acceptable predictive performance in an external validation set.

We developed four models with different approaches. First, we applied the simplest model using logistic regression (LR) with a shrinkage method, as recommended by the PROBAST guidelines (Fig 1). Instead of the PCs, this model used the 37 candidate predictors with an elastic net regression algorithm combining L1- and L2-norm regularization. We chose this regularization method over others to minimize the number of excluded predictors and prevent overfitting [58]. Hyperparameter tuning of this model used a random search with up to 10 configurations of alpha and lambda values as the L1- and L2-norm regularization factors, respectively. We set the factors to trade off between removing and maintaining the number of predictors used to predict the outcome; thus, we could infer which variables have predictive value under a simple predictive-modeling framework.
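The random search over the elastic-net hyperparameters can be sketched as drawing up to 10 (alpha, lambda) configurations. The sampling ranges below are illustrative assumptions, not the study's, and the alpha/lambda roles follow the common glmnet convention (alpha mixing the L1/L2 penalties, lambda setting overall strength), which may differ slightly from the paper's phrasing.

```python
import random

# Hypothetical random-search sketch for elastic-net tuning; ranges and the
# glmnet-style interpretation of alpha/lambda are assumptions.
def sample_elastic_net_configs(n_configs=10, seed=42):
    rng = random.Random(seed)  # fixed seed keeps the search reproducible
    configs = []
    for _ in range(n_configs):
        alpha = rng.uniform(0.0, 1.0)   # 0 = pure ridge, 1 = pure lasso
        lam = 10 ** rng.uniform(-4, 0)  # log-uniform in [1e-4, 1]
        configs.append({"alpha": alpha, "lambda": lam})
    return configs
```

Seeding the generator is one way to keep a random search consistent with a pre-registered protocol: the configurations are random with respect to the data yet fixed before any results are seen.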

The second and third prediction models used RF and gradient boosting machine (GBM) algorithms (Fig 1). Both are state-of-the-art algorithms that consistently outperformed other algorithms across different outcomes [68]. The RF algorithm randomly selects some predictors to build multiple classification trees in parallel using subsets of samples, whereas the GBM applies a similar algorithm sequentially: a later tree in a GBM is used to predict the misclassifications of earlier ones. Both algorithms are the most frequently used competition-winning algorithms for predictions on tabular data, compared to 177 other algorithms across 121 datasets [64]. While this is not outcome-specific, predictive modeling in a competition is independently validated; thus, the predictive performances of the RF and GBM are considered reliable and reasonably evaluated. Hyperparameter tuning of these models also used a random search over six configurations of the number of predictors sampled at a time for the RF, and the number of trees, maximum tree depth, and shrinkage factor for the GBM. Both models were also configured for the minimum number of samples per node. We defined these hyperparameter variables in aggregate between the tree-based ensemble learners to pursue a wide range of configurations. For example, we applied different numbers of predictors sampled at a time for the RF while maintaining the same tree structure; conversely, we applied different tree structures for the GBM while maintaining the same number of predictors sampled at a time. The best hyperparameters were selected for each algorithm under a variety of samples per node to take into account the effect of sampling error. Therefore, we expected the hypothesis search over hyperparameters to be well covered while avoiding the pitfall of HARKing.
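The sequential idea behind boosting, where each later tree fits the errors left by the ensemble so far, can be illustrated with a toy one-dimensional example. This is not the study's implementation: it uses single-split "stumps" and squared-error residuals purely to show the mechanism, with the shrinkage factor playing the same role as the GBM hyperparameter mentioned above.

```python
# Toy gradient-boosting sketch: each stump is fitted to the residuals of the
# current ensemble, then added with a shrinkage factor.
def fit_stump(xs, residuals):
    """Best single-split predictor minimizing squared error on residuals."""
    best = None
    for split in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, split, lmean, rmean)
    _, split, lmean, rmean = best
    return lambda x: lmean if x <= split else rmean

def boost(xs, ys, n_trees=20, shrinkage=0.5):
    preds = [0.0] * len(xs)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]  # errors so far
        tree = fit_stump(xs, residuals)                 # fit the errors
        trees.append(tree)
        preds = [p + shrinkage * tree(x) for x, p in zip(xs, preds)]
    return lambda x: sum(shrinkage * t(x) for t in trees)
```

Each round shrinks the remaining residual, which is why boosting is sequential while the RF's trees can be grown independently in parallel.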

The last prediction model used the deep-insight visible neural network (DI-VNN) algorithm (Fig 1). It is a deep-learning model, specifically a convolutional neural network (CNN). This model class has emerged in recent years because it improves predictive performance for imaging data. The DeepInsight algorithm converts non-image data into image-like data, i.e., a multidimensional array, in a meaningful way by applying a dimension-reduction algorithm over the predictors. The VNN means that the network architecture is data-driven, because it is determined by a hierarchical clustering algorithm over the predictors. This approach addresses criticisms of the CNN as a black-box model, i.e., one that can predict an outcome very well yet leaves unexplained which features lead to a particular prediction and how. Details of the DI-VNN pipeline were previously described elsewhere [65]. Our modifications of this pipeline applied the procedure over the 37 predictors and 19 PCs, resulting in 18 candidate features for the DI-VNN. These were centered using each feature's average value after quantile-to-quantile normalization over all features among samples. To avoid HARKing, we followed the same hyperparameter tuning approach that was pre-registered and thoroughly described elsewhere [65].

Model validation

Data partitioning was conducted to obtain both internal and external validation sets; accordingly, we had a training set and two test sets. We used participants whose ethnicity was not from Sulawesi Island for the external validation set. The model is expected to be used in settings not limited to the local ethnicities; hence, we should test whether a model developed using data with local ethnicities would retain acceptable predictive performance when applied to non-local ethnicities. This validation procedure may demonstrate the model's robustness in predicting outcomes in the general population [58].

We also randomly split the remaining set after excluding the external validation set. This procedure provided the internal validation set, comprising ~20% of the remaining set. The first to third models applied 10-fold cross-validation for hyperparameter tuning and 30-time bootstrapping for model training with the best hyperparameters. We also applied 10-fold cross-validation to compute the rotated matrix of PCs. Meanwhile, the fourth model applied hold-out validation with an 80:20 ratio for the training and validation sets. To compare this model against the others, we applied 30-time bootstrapping to compute the predictive performance. We also applied 30-time bootstrapping to re-calibrate all models using logistic regression.
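The partitioning scheme can be sketched as follows: records with a non-local ethnicity form the external validation set, and the remainder is split at random, with about 20% held out for further validation. The ethnicity labels mirror the grouping variable described earlier; the 80:20 split and the fixed seed are illustrative.

```python
import random

# Sketch of the ethnicity-based partitioning described in the text.
LOCAL_ETHNICITIES = {"Bugis-Makassar", "Buton", "Muna", "Tolaki"}

def partition(records, seed=0):
    external = [r for r in records if r["ethnicity"] not in LOCAL_ETHNICITIES]
    remaining = [r for r in records if r["ethnicity"] in LOCAL_ETHNICITIES]
    rng = random.Random(seed)       # fixed seed for a reproducible split
    rng.shuffle(remaining)
    n_holdout = round(0.2 * len(remaining))
    return {"training": remaining[n_holdout:],
            "holdout": remaining[:n_holdout],
            "external": external}
```

Keeping the non-local group entirely out of the development data is what makes the external evaluation a genuine test of robustness to a shifted population.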

Evaluation metrics

We used the area under the receiver operating characteristic (ROC) curve (AUROC) as the primary evaluation metric because the AUROC is threshold-agnostic. Before evaluating it, however, we reported a model's calibration metrics using an LR in which the predicted probability (the model output) was the only covariate. A model was well calibrated if the 95% CIs of the intercept and slope respectively covered 0 and 1, with the probability plot visually aligned with the reference line. Models were evaluated with and without re-calibration (see "Model validation"). We retained all models that met the calibration criteria. The best models were well-calibrated models that significantly outperformed the others according to the AUROC. All metrics are reported with 95% CIs. A model outperformed the others if its interval estimate was greater than the central values of the other models; otherwise, more than one model might be selected. The best model was determined using the internal validation set. It also had to be robust across all external validation sets, with a central AUROC value above 0.5, the baseline for a predictive performance better than random or coin-flip guessing. Against the same baseline, we also computed the specificity, accuracy, positive predictive value (PPV) or precision, and negative predictive value (NPV) using a threshold at approximately a sensitivity or recall of ~90%, or a false-negative rate of ~10%, because the risk of under-diagnosis outweighs that of over-diagnosis. In addition, we explored the best model to identify important features post hoc (see Results).
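Two of the evaluation choices above can be made concrete: the AUROC computed as the rank-based probability that a positive outscores a negative, and a decision threshold chosen so that sensitivity stays near 90% (a false-negative rate near 10%). The sketch below is self-contained and illustrative, not the study's exact code.

```python
import math

# Rank-based AUROC: the probability that a random positive scores higher
# than a random negative (ties count half).
def auroc(labels, scores):
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def threshold_for_sensitivity(labels, scores, target=0.90):
    """Highest threshold at which at least `target` of positives still score
    at or above it (i.e., sensitivity >= target)."""
    pos_scores = sorted((s for l, s in zip(labels, scores) if l == 1),
                        reverse=True)
    k = max(1, min(math.ceil(target * len(pos_scores)), len(pos_scores)))
    return pos_scores[k - 1]
```

Fixing the operating point by sensitivity rather than by overall accuracy matches the stated preference for avoiding under-diagnosis at the cost of some over-diagnosis.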

Ontology analysis

Our DI-VNN model explores ontological relationships among the predictors in predicting the outcome. A detailed technical explanation of the DI-VNN algorithm was previously given elsewhere [65]. Briefly, there were three steps: (1) differential analysis for feature pre-selection; (2) structural representation of features; and (3) CNN model training.

We applied a differential analysis to choose 18 candidate features for the DI-VNN from among the 37 predictors and 19 PCs (Fig 1). The differential analysis applied quantile-to-quantile normalization, which removed technical inter-variability (i.e., in measuring the predictors) across subjects. Using moderated t-statistics, the differential analysis selected candidate features (a filter method for feature selection). The null hypothesis was that a feature's value does not differ significantly between positives and negatives. Since a predictor could be selected by chance, exposing the analysis to multiple-testing bias, we adjusted the p-values using the Benjamini-Hochberg method and selected a feature if its adjusted p-value, or false discovery rate (FDR), was less than 0.05.
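The Benjamini-Hochberg adjustment used for this pre-selection can be sketched directly: sorted p-values are scaled by n/rank, with a running minimum enforcing monotonicity, and features whose adjusted p-value (FDR) falls below 0.05 survive as DI-VNN candidates.

```python
# Minimal Benjamini-Hochberg FDR adjustment, as used for feature pre-selection.
def benjamini_hochberg(p_values):
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def select_features(names, p_values, fdr=0.05):
    adj = benjamini_hochberg(p_values)
    return [nm for nm, a in zip(names, adj) if a < fdr]
```

Note that the adjustment only depends on the vector of p-values, so it composes cleanly with the one-variable-at-a-time moderated t-tests described in the Results.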

After pre-selection, the candidate features, without the outcome, were used to construct a structural representation of feature variabilities and inter-relationships. There were two types of structural representation (Fig 1): (1) spatial and (2) hierarchical. We applied the t-distributed stochastic neighbor embedding (t-SNE) algorithm to cluster the selected features spatially in a three-dimensional layout, where a closer position means a higher correlation between a pair of features. Meanwhile, we applied a clique-extracted ontology algorithm to cluster the selected features into a hierarchy, in which features are more similar to those within the same ontology than to those in a different ontology. Since these ontologies were hierarchical, after model training we could evaluate which ontology was more predictive: one with fewer features (i.e., a child ontology) or one with more features (i.e., a parent ontology).

Eventually, we used this representation as a CNN architecture and trained it using a backpropagation algorithm to predict the outcome. In CNN modeling, a maximum value represents closer values in a multidimensional array; in this way, inter-relationships among features were taken into account when predicting the outcome, in addition to their values. The backpropagation algorithm also allowed us to identify which features and inter-relationships were weighted more heavily in predicting the outcome. A more extreme weight, either positive or negative, was represented with a higher color intensity when visualizing the internal properties of our DI-VNN model. Therefore, using this ontology analysis, we could evaluate: (1) which sets of features (i.e., ontologies) were more predictive; (2) how these ontologies were connected; (3) which features were important within an ontology; and (4) how these features were related within an ontology.

Results

Most subjects had not obtained a university education, were not separated/divorced, and were religious believers

We developed four diagnostic prediction models using a cross-sectional dataset (n = 1252). These models were externally validated (n = 129) with non-local ethnic groups unobserved in the development sets (Table 1). Model validation may be challenging, since estimates of the prevalence of depressive symptoms in the validation set differed from those of the development sets; ethnicity may affect the distributions of the predictors and the outcome to some extent. A prediction model should be robust against such shifts in data distribution (i.e., well generalized). Therefore, our validation sets allowed a generalization test including data with non-local ethnicities, which would extend our model's application to new data with ethnicities different from ours.

Table 1. Most subjects had not obtained a university education, were not separated/divorced, and were religious believers.

https://doi.org/10.1371/journal.pone.0280330.t001

The prevalence of depressive symptoms in older adults differed among ethnic groups (Table 1). The Tolaki ethnic group had the highest prevalence, while prevalences were similar between the Bugis-Makassar and Buton ethnic groups. Only one local ethnic group, the Muna, was similar to those not from Sulawesi Island in its prevalence estimate of depressive symptoms. The Bugis-Makassar and Tolaki ethnic groups constituted the majority of community-dwelling older adults in our dataset.

We used only the training set to develop the models. This procedure resembles a prediction model being developed and validated in separate studies. Nevertheless, we needed to identify the characteristics of the dataset used for training the prediction models (Table 1). Future use of our models will likely benefit those with similar characteristics, particularly in the predictors used in the final model. As intended, we developed our models for older adults aged ≥60 years. This population is reasonably characterized by comorbidities and poorer hearing, oral status, and visual function, which are considerable compared to younger adults. However, for all of those categorical variables (excluding comorbidities), most subjects were in fair health, probably because they had routinely visited a CHC for 9 or 10 years on average. Most subjects had not obtained a university education. They had two to six children, were mostly not separated/divorced, mostly lived with either a spouse or other family members, and were unemployed. Their incomes were considerably low for this country. Most of the subjects, if not all, were religious believers. We observed similar characteristics between GDS-15 positives and negatives, except for the Tolaki (p = 0.012) and Bugis-Makassar ethnic groups (p = 0.48), the number of comorbidities (p = 0.005), employment status before 60 years of age (p = 0.002), male gender (p = 0.035), a poor health condition (p = 0.016), living alone (p = 0.011), and a separated/divorced status (p = 0.042). In addition, to deploy our models, we provide a web application (https://predme.app/pre_gds15) using the best models as a prototype before incorporating the application into an EHR system. Religion options cover many religions to keep the application inclusive and avoid inequality. We also used the Big Mac index, commonly applied to convert incomes to comparable values across countries [69].

The well-calibrated models were SPC-GBM with re-calibration and DI-VNN without re-calibration

We binarized the categorical variables of the 17 predictors, resulting in 37 binary predictors without a perfect-separation problem in the training set. We used these predictors to develop an LR model with regularization. Because we needed to maintain 20 events per variable, only the top 19 PCs were retained for feature selection by the multivariable LR. These PCs accounted for 81.7% of the variance explained (95% CI 81.68% to 81.72%). Furthermore, we used only seven selected PCs for model development by the RF and GBM algorithms because this allowed us to maintain >50 events per variable. These were the selected-PC (SPC)-RF and SPC-GBM models. Meanwhile, of the 17 predictors and 37 PCs considered for the DI-VNN, only 18 had an FDR of <0.05 by differential analysis with the Benjamini-Hochberg correction. This analysis pre-selected the candidate predictors of the DI-VNN. In the differential analysis, only one variable was tested at a time, which ensured more than 20 events per variable for each analysis; the Benjamini-Hochberg method then corrected for the multiple-testing effects. We compared calibration metrics and plots of these models with and without re-calibration (Fig 2). Only two models were well-calibrated: SPC-GBM with re-calibration (Fig 2B) and the DI-VNN without re-calibration (Fig 2A). The LR model was visually aligned after re-calibration (Fig 2B), but the 95% CI of its calibration intercept did not cover 0. For the SPC-RF without re-calibration, the 95% CI of the calibration slope did not cover 1; meanwhile, re-calibrating this model resulted in dichotomous probabilities that reduced its clinical utility (Fig 2B). For the SPC-GBM without re-calibration (Fig 2A), neither the 95% CI of the calibration intercept covered 0 nor that of the slope covered 1. Unlike this model, the DI-VNN with re-calibration (Fig 2B) fulfilled the intercept and slope criteria but not the calibration plot.
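The intercept and slope criteria used above can be checked with a standard logistic re-fit: regress the observed outcomes on the logit of the predicted probabilities, then test whether the intercept covers 0 and the slope covers 1. Below is a minimal numpy sketch of this idea (a joint Newton-Raphson fit; the study's exact estimation procedure may differ):

```python
import numpy as np

def calibration_intercept_slope(y, p, n_iter=25):
    """Fit logistic regression of outcomes y (0/1) on the logit of
    predicted probabilities p via Newton-Raphson. An intercept near 0
    and a slope near 1 indicate a well-calibrated model."""
    p = np.clip(np.asarray(p, float), 1e-12, 1 - 1e-12)
    x = np.log(p / (1 - p))                    # logit of predictions
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        mu = 1 / (1 + np.exp(-(X @ beta)))     # current fitted probabilities
        w = mu * (1 - mu)                      # IRLS weights
        grad = X.T @ (np.asarray(y) - mu)
        hess = X.T @ (X * w[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta[0], beta[1]                    # (intercept, slope)
```

On perfectly calibrated predictions the fitted intercept approaches 0 and the slope approaches 1; a slope below 1 typically signals overfitting of the original model.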

Fig 2. The well-calibrated models were SPC-GBM with re-calibration and DI-VNN without re-calibration.

A. Without re-calibration. B. With re-calibration. DI-VNN, deep-insight visible neural network; LR, logistic regression; SPC-GBM, selected principal components with the gradient boosting machine; SPC-RF, selected principal components with the random forest.

https://doi.org/10.1371/journal.pone.0280330.g002

The best model was the SPC-GBM but undifferentiated from the DI-VNN in external validation

We used only the training set to determine the best of the two well-calibrated models, which was the SPC-GBM with re-calibration (Table 2). As observed in this study, the RF and GBM algorithms achieved their apparent predictive performances by overfitting the training set. For example, the point estimates of the AUROCs of the SPC-GBM were reduced by 42.08% and 37.98% in the validation sets with local (0.578, 95% CI 0.572 to 0.583) and non-local (0.619, 95% CI 0.610 to 0.627) ethnicities, respectively, compared to the training set (0.998, 95% CI 0.998 to 0.998). Models developed using these algorithms often outperform models using other algorithms in external validation; that was not always the case in this study. The predictive performance of the SPC-GBM (AUROC of 0.578, 95% CI 0.572 to 0.583; n = 250) was similar to that of the DI-VNN without re-calibration (AUROC of 0.577, 95% CI 0.576 to 0.579; n = 250) in the external validation set with a local ethnic group. In addition, the DI-VNN showed similar predictive performances across the training set (AUROC of 0.577, 95% CI 0.576 to 0.579; n = 1002) and the two test sets with a local (AUROC of 0.577, 95% CI 0.576 to 0.579; n = 250) or non-local (AUROC of 0.577, 95% CI 0.576 to 0.579; n = 129) ethnic group. These predictive performances were achieved although we applied the events-per-variable principle differently from the other models. A previous study also applied a questionnaire-free method to predict the GDS-15 in older adults living alone using a wearable device, but that model was considerably overfitted because of a small sample size (AUROC 0.96, 95% CI 0.91 to 0.99; n = 47) [24]. Finally, for every metric evaluated in this study, the predictive performance of the SPC-GBM was better than random or coin-flip guessing (e.g., the AUROC point estimate of the SPC-GBM was >0.5).
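As a reference for the AUROC comparisons above, the metric equals the probability that a randomly chosen GDS-15 positive is scored above a randomly chosen negative (the Mann-Whitney statistic), and its confidence interval can be approximated by bootstrapping. The sketch below is a minimal implementation of those definitions, not the study's code:

```python
import numpy as np

def auroc(y, s):
    """AUROC via the rank (Mann-Whitney) formulation, with tied
    scores receiving their average rank."""
    y, s = np.asarray(y), np.asarray(s, float)
    order = np.argsort(s, kind="mergesort")
    ranks = np.empty(len(s))
    i = 0
    while i < len(s):                       # assign average ranks to ties
        j = i
        while j + 1 < len(s) and s[order[j + 1]] == s[order[i]]:
            j += 1
        ranks[order[i:j + 1]] = (i + j + 2) / 2.0
        i = j + 1
    n_pos, n_neg = y.sum(), len(y) - y.sum()
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def auroc_bootstrap_ci(y, s, n_boot=1000, seed=0):
    """Percentile 95% CI for the AUROC by resampling subjects."""
    rng = np.random.default_rng(seed)
    y, s = np.asarray(y), np.asarray(s, float)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))
        if 0 < y[idx].sum() < len(y):       # need both classes present
            stats.append(auroc(y[idx], s[idx]))
    return np.percentile(stats, [2.5, 97.5])
```

An AUROC of 0.5 corresponds to coin-flip guessing, which is the baseline the SPC-GBM estimates are compared against.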

Table 2. The best model was the SPC-GBM but undifferentiated from the DI-VNN in external validation.

https://doi.org/10.1371/journal.pone.0280330.t002

Low education with literacy and living alone were predictive in the SPC-GBM, while living alone with significant life events, religion, and family support were predictive in the DI-VNN

Using both models, we could identify how important each predictor was in predicting the GDS-15. There were seven PCs in the SPC-GBM; these were latent variables representing the 37 predictors, each with different weights. Details of how the weights were inferred are described elsewhere [62]. We visualized the absolute values of these weights for each selected PC (Fig 3). Absolute values were used because the sign of a weight cannot be straightforwardly interpreted as favoring events or non-events. By inspecting the visualization, we could infer the meaning of each latent variable; each was named after the predictors with the highest absolute weights.
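The naming of latent variables from absolute loadings can be sketched as follows: run a PCA, take the absolute values of each component's loadings, and label the component by its top-weighted predictors. The data and predictor names below are illustrative only, not the study's variables:

```python
import numpy as np

# Illustrative binary predictor matrix: rows = subjects,
# columns = predictors (names are hypothetical).
rng = np.random.default_rng(7)
X = rng.integers(0, 2, size=(50, 4)).astype(float)
names = ["low_education", "literate", "living_alone", "low_income"]

Xc = X - X.mean(axis=0)                     # center before PCA
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / (S**2).sum()             # variance explained per PC

# Name each PC after its two largest-|loading| predictors,
# mirroring how the latent variables were labeled.
for k in range(2):
    top = [names[i] for i in np.argsort(np.abs(Vt[k]))[::-1][:2]]
    print(f"PC{k + 1} ({explained[k]:.0%} variance): {top}")
```

Each row of `Vt` holds one component's loadings over the original predictors; the sign of a loading is discarded because only the magnitude indicates how strongly a predictor defines the component.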

Fig 3. Low education with literacy and living alone were predictive in the SPC-GBM, while living alone with significant life events, religion, and family support were predictive in the DI-VNN.

A. Not selected by the DI-VNN. B. Selected by the DI-VNN. CHC, community health center; DI-VNN, deep-insight visible neural network; IDR, Indonesian Rupiah; LR, logistic regression; PC, principal component; SPC-GBM, selected PCs with the gradient boosting machine.

https://doi.org/10.1371/journal.pone.0280330.g003

The most important PC in the SPC-GBM was PC11 (education and living status). In this PC, a low education combined with literacy and living alone tended to be predictive. The next most important PCs were PC4, PC8, and PC10, which respectively reflected religious perceptions, educational perceptions, and current employment status in relation to health. Religion had to be described explicitly to maintain our prediction models' inclusiveness. Education also contributed to PC10, and both PC8 and PC10 had larger weights on the oral status. Less-important predictors were PC16 (very poor hearing), PC14 (very poor health and others), and PC18 (unknown); the last PC comprised sporadic predictors with small weights.

The DI-VNN also selected PC4 and PC18 as features (Fig 3). The original predictors selected by this model were religion A (F1) or Z (F3), poor (F2) or good (F10) health conditions, living alone (F6) or with family members but without a spouse (F8), a separated/divorced marital status (F7), a previously employed status (F9), medications (F4), and comorbidities (F5). Beyond PC4 and PC18, there was PC5 (health problems), which was related to comorbidities (F5) and medications (F4). PC4 was also reinforced by PC37 (religion), with less involvement of the health aspect. Poor-health medication (PC28) was also selected, with larger weights on the selected predictors of a poor health condition (F2) and medications (F4), and on the deselected predictors of education and income. PC27 and PC26 represented the previous employment status (F9), but PC27 also had larger weights on age, hearing problems, and the number of children. The last component, PC21, had larger weights on several predictors related to family support of health.

Living alone with significant life events was positively predictive in the DI-VNN, but negatively predictive among believers in a religion that encourages family activities

While the PCs in the SPC-GBM were interpreted independently, those in the DI-VNN could be interconnected (Fig 4), and the DI-VNN also included the original predictors. Each ontology predicted the outcome in the DI-VNN, contributing to optimization of the predictive performance. If the model architecture up to a given ontology was used for predicting the outcome, different AUROCs were obtained (Fig 4A). The top three highest AUROCs were those predicted up to the root, ONT:20, and ONT:22. Each ontology was visualized as the array difference between GDS-15 positives and negatives (Fig 4B): the weighted features of GDS-15 negatives were subtracted from those of GDS-15 positives, and positive and negative differences indicated GDS-15-positive and -negative predictions, respectively. Details of how each ontology prediction was incorporated into the final prediction and which layers were used for feature visualization are described elsewhere [65].

Fig 4. Living alone with significant life events was positively predictive in the DI-VNN, but negatively predictive among believers in a religion that encourages family activities.

A. Ontology network. B. Ontology array. The number under the ontology name in Fig 4B is the area under the receiver operating characteristics curve (AUROC). Please see Fig 3 for the annotation of predictors in Fig 4B. DI-VNN, deep-insight visible neural network; GDS-15, 15-item geriatric depression scale; ONT, ontology; z, channel.

https://doi.org/10.1371/journal.pone.0280330.g004

In the root ontology array, subjects living alone (F6) with comorbidities (F5) and multiple factors (PC18, unknown) tended to be predicted as GDS-15 positives. Tracing through ONT:25 and ONT:22, the PC18 factors were closer to the separated/divorced marital status (F7). Health problems (PC5) in ONT:20 were also closer to the religious perception of health (PC4) in the parent ontology, ONT:24. Subjects with PC5 and PC4 tended to be predicted as GDS-15 negatives. Similar predictions were assigned to subjects with family support (PC21) despite poor health conditions (F2), as shown by ONT:23. This was related to religion A (F1) in ONT:19, which was connected to ONT:24 through PC21, F2, and F5 (comorbidities). The last feature in ONT:24 showed a tendency toward the GDS-15 outcome opposite to that of the same feature in the root ontology.

Discussion

In this study, we developed four machine-learning models to predict GDS-15 results among community-dwelling older adults. The experimental results demonstrated the feasibility of our questionnaire-free approach to developing a triage test for the GDS-15 based on routine data from CHCs. The predictive performances were validated using random and non-random data partitioning, while only the training set was used to develop the models. The validation demonstrated generalization of the SPC-GBM and DI-VNN models to a non-local ethnic group.

From the 37 PCs, we found seven PCs with the top absolute weights, which contributed to prediction by the SPC-GBM with re-calibration. In comparison, 10 original predictors and eight PCs contributed to prediction by the DI-VNN without re-calibration. A web application is provided using both the SPC-GBM and the DI-VNN, but the latter model is used for individual exploration of protective or risk factors because the DI-VNN has a deeper exploration capability. In this study, the AUROC of the DI-VNN was very similar to that of the SPC-GBM. This finding may offer insight for precise behavioral interventions: (1) to prevent depressive symptoms from turning into major depressive disorder; or (2) to mitigate further progression of this disorder. Note that our system should also automatically recommend a GDS-15 evaluation; requiring manual input by clinicians would largely defeat the objective of this predictive system.

The SPC-GBM demonstrated moderate sensitivity and specificity in external validation with either local or non-local ethnicity. Among individuals experiencing depressive symptoms (i.e., positives), an incorrect prediction (i.e., a false negative) may leave an individual undiagnosed. Hence, a predicted negative should be confirmed by the DI-VNN, which demonstrated high sensitivity in external validation. Conversely, among individuals without depressive symptoms, a false positive may cause overdiagnosis. However, false positives would only be screened by the GDS-15 rather than receiving a definitive diagnosis. Nonetheless, relative to the baseline value, the predictive performance of the SPC-GBM was better than random or coin-flip guessing for every metric evaluated in this study.
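The two-stage workflow described here can be expressed as a simple decision rule: recommend the GDS-15 when the SPC-GBM predicts positive, and re-check SPC-GBM negatives with the higher-sensitivity DI-VNN. The function and thresholds below are illustrative placeholders, not the calibrated cut-offs of the deployed models:

```python
def triage(p_spc_gbm: float, p_di_vnn: float,
           thr_gbm: float = 0.5, thr_vnn: float = 0.5) -> str:
    """Hypothetical two-stage triage rule. A positive from either
    stage only triggers the GDS-15 questionnaire, not a diagnosis."""
    if p_spc_gbm >= thr_gbm:
        return "recommend GDS-15"
    # Confirm SPC-GBM negatives with the high-sensitivity DI-VNN
    if p_di_vnn >= thr_vnn:
        return "recommend GDS-15"
    return "no GDS-15 recommended"
```

Under this rule, a false positive costs only a questionnaire administration, while the DI-VNN back-stop reduces missed cases among SPC-GBM negatives.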

Several findings in our study were in line with previous studies, including those on education [33, 34], living status [38–40], religion [35, 36], previous employment [29–31], health status [41–43], and current employment [31, 32]. In PC11, a low education combined with literacy and living alone was a strong predictor of GDS-15 positives. However, this contradicts previous findings that education was negatively correlated with depressive symptoms [70].

The second predictor was religious perceptions of health in PC4, in which the important predictors were a fair oral health status, a fair health condition, and religious beliefs. Based on this PC, depressive symptoms were associated with religiosity. This finding is in line with those of previous studies, which showed that the severity of depression increased with the number of missing teeth, the number of decayed teeth, and oral dryness [45]. In addition, religious beliefs were among the important variables in our prediction models; faithful religious believers have lower levels of depression than non-believers [36].

The educational perception of health (PC8) was also an important predictor of depressive symptoms; it consisted of a very good oral health status, having a primary education and being illiterate, having poor or very good health, and the duration of routine visits to the CHC. The frequency of regular visits to the CHC in this study might have promoted good health in these older adults. CHCs are the first-line promoters of community health, and this seems to be protective against depression. However, depression was also associated with the length of stay, outpatient and inpatient costs, and increased use of any healthcare facility, including outpatient visits [43].

Another important variable was PC10, which consisted of a poor oral health status, very good health conditions, current employment, and higher education. Unemployed individuals and individuals who moved from permanent to precarious employment had an increased risk of clinically relevant depression [32, 71]. Nevertheless, among older people who work, depression can also cause job loss [30]. Therefore, the predictive value of this latent variable may reflect either a cause or an effect of depressive symptoms.

Very poor hearing in PC16 was also important for predicting depressive symptoms. It is reasonable that hearing problems and very poor health conditions would increase the risk of depressive symptoms: hearing loss is the third most frequent chronic health problem among older adults and can affect health conditions [72, 73]. The poor health conditions in PC14 overlapped with the health problems of PC5 (comorbidities and medications) and PC28. Lastly, we found other important variables in the DI-VNN model: 1) family support of health (PC21), with predictors of the oral status and visual problems, health conditions and income, education, employment, and gender; 2) living (F6, F8) and marital (F7) statuses; and 3) comorbidities (F5) and medications (F4). Income was also a determinant of depression in outpatient hospital care in Indonesia [34].

In conclusion, the best prediction models were the SPC-GBM and DI-VNN. One can use these models in our web application to screen for depressive symptoms alongside the GDS-15 at any time. Only if deemed positive according to our models is an older adult then asked to answer the GDS-15 questions. This workflow allows more-frequent screening and may help detect depressive symptoms earlier. Since later-life depression often causes multiple physical symptoms, we would expect reduced unnecessary costs for related diagnostic procedures and interventions. However, future studies are needed to confirm the impacts of our models on improving both the detection of and early intervention for older adults with depression.

Limitations of the study

This study has several limitations. An older adult who is an atheist or believes in a religion beyond those in our dataset might not be well-predicted. The Big Mac index frames income in terms of a primary need, namely food, whereas income-related problems contributing to depression may manifest in other domains. Our prediction models are warranted for populations with characteristics similar to those of our training set. The predictive performance may differ for older adults who have a high education, are single, have previous employment, currently have a job, or have no religious beliefs. Characteristics more similar to those of our target population would lead to more-optimal predictive performance.

Although the SPC-GBM with re-calibration had the best performance among the well-calibrated models in the internal validation set, its performance was undifferentiated from that of the DI-VNN without re-calibration in the external validation set with local ethnicity. Nonetheless, we used only the internal validation set to choose the best model, because choosing the best model by the external validation set might lead to optimistic bias or overfitting; instead, the external validation sets were used to test the robustness of the prediction models' performances [58]. Ultimately, despite the models' reliability demonstrated in this paper by external validation, one should still not assume generalizability to other populations with different characteristics; external validation is still required for such populations. Yet, this is a general issue in prediction studies, not limited to ours.

Inclusion and diversity

We worked to ensure gender balance in the recruitment of human subjects. We worked to ensure ethnic or other types of diversity in the recruitment of human subjects. We worked to ensure that the study questionnaires were prepared in an inclusive way. One or more of the authors of this paper self-identifies as an underrepresented ethnic minority in science. The author list of this paper includes contributors from the location where the research was conducted who participated in the data collection, design, analysis, and/or interpretation of the work.

Supporting information

S1 File. Checklists and questionnaire.

This file consists of: (1) S1 Table. Guidelines for developing and reporting machine learning predictive models in biomedical research; (2) S2 Table. Prediction model risk of bias assessment tools (PROBAST); (3) S3 Table. Clinical checklists for assessing suitability of machine learning applications in healthcare; and (4) S4 Table. The 15-item Geriatric Depression Scale (GDS-15) questionnaire.

https://doi.org/10.1371/journal.pone.0280330.s001

(DOCX)

References

1. Vieira ER, Brown E, Raue P. Depression in older adults: screening and referral. J Geriatr Phys Ther 2014 Jan-Mar;37(1):24–30. pmid:23619921
2. Krishnamoorthy Y, Rajaa S, Rehman T. Diagnostic accuracy of various forms of geriatric depression scale for screening of depression among older adults: Systematic review and meta-analysis. Arch Gerontol Geriatr 2020 Mar-Apr;87:104002. pmid:31881393
3. Kok RM, Reynolds CF 3rd. Management of Depression in Older Adults: A Review. JAMA 2017 May 23;317(20):2114–2122. pmid:28535241
4. World Health Organization. Mental health of older adults. URL: https://www.who.int/news-room/fact-sheets/detail/mental-health-of-older-adults [accessed June 10th].
5. Zhang Y, Chen Y, Ma L. Depression and cardiovascular disease in elderly: Current understanding. J Clin Neurosci 2018 Jan;47:1–5. pmid:29066229
6. Sjöberg L, Karlsson B, Atti AR, Skoog I, Fratiglioni L, Wang HX. Prevalence of depression: Comparisons of different depression definitions in population-based samples of older adults. J Affect Disord 2017 Oct 15;221:123–131. pmid:28645024
7. Brooks JM, Titus AJ, Bruce ML, Orzechowski NM, Mackenzie TA, Bartels SJ, et al. Depression and Handgrip Strength Among U.S. Adults Aged 60 Years and Older from NHANES 2011–2014. J Nutr Health Aging 2018;22(8):938–943. PMC6168750 pmid:30272097
8. Igbokwe CC, Ejeh VJ, Agbaje OS, Umoke PIC, Iweama CN, Ozoemena EL. Prevalence of loneliness and association with depressive and anxiety symptoms among retirees in Northcentral Nigeria: a cross-sectional study. BMC Geriatr 2020 Apr 23;20(1):153. PMC7178938 pmid:32326891
9. Pilania M, Yadav V, Bairwa M, Behera P, Gupta SD, Khurana H, et al. Prevalence of depression among the elderly (60 years and above) population in India, 1997–2016: a systematic review and meta-analysis. BMC Public Health 2019 Jun 27;19(1):832. pmid:31248394
10. Feng L, Yap KB, Ng TP. Depressive symptoms in older adults with chronic kidney disease: mortality, quality of life outcomes, and correlates. Am J Geriatr Psychiatry 2013 Jun;21(6):570–579. pmid:23567405
11. Kilavuz A, Meseri R, Savas S, Simsek H, Sahin S, Bicakli DH, et al. Association of sarcopenia with depressive symptoms and functional status among ambulatory community-dwelling elderly. Arch Gerontol Geriatr 2018 May-Jun;76:196–201. pmid:29550658
12. Kitagaki K, Murata S, Tsuboi Y, Isa T, Ono R. Relationship between exercise capacity and depressive symptoms in community-dwelling older adults. Arch Gerontol Geriatr 2020 Jul-Aug;89:104084. pmid:32388071
13. Kim K, Lee M. Depressive Symptoms of Older Adults Living Alone: The Role of Community Characteristics. Int J Aging Hum Dev 2015 Mar;80(3):248–263. pmid:26195500
14. Leong OS, Ghazali S, Hussin EOD, Lam SK, Japar S, Geok SK, et al. Depression among older adults in Malaysian daycare centres. Br J Community Nurs 2020 Feb 2;25(2):84–90. pmid:32040358
15. Townsend MC, Morgan KI. Psychiatric Mental Health Nursing: Concepts of Care in Evidence-based Practice. Philadelphia, PA: F.A. Davis Company; 2018. ISBN: 9780803660540
16. Shumye S, Belayneh Z, Mengistu N. Health related quality of life and its correlates among people with depression attending outpatient department in Ethiopia: a cross sectional study. Health Qual Life Outcomes 2019 Nov 8;17(1):169. PMC6839081 pmid:31703701
17. Peters van Neijenhof RJG, van Duijn E, Comijs HC, van den Berg JF, de Waal MWM, Oude Voshaar RC, et al. Correlates of sleep disturbances in depressed older persons: the Netherlands study of depression in older persons (NESDO). Aging Ment Health 2018 Feb;22(2):233–238. pmid:27827534
18. Hawton K, Casañas ICC, Haw C, Saunders K. Risk factors for suicide in individuals with depression: a systematic review. J Affect Disord 2013 May;147(1–3):17–28. pmid:23411024
19. Oliffe JL, Rossnagel E, Seidler ZE, Kealy D, Ogrodniczuk JS, Rice SM. Men’s Depression and Suicide. Curr Psychiatry Rep 2019 Sep 14;21(10):103. pmid:31522267
20. Abrams RC, Alexopoulos GS. Vascular depression and the death of Queen Victoria. Int J Geriatr Psychiatry 2018 Dec;33(12):1556–1561. pmid:30276875
21. Smithson S, Pignone MP. Screening Adults for Depression in Primary Care. Med Clin North Am 2017 Jul;101(4):807–821. pmid:28577628
22. Chekroud AM, Zotti RJ, Shehzad Z, Gueorguieva R, Johnson MK, Trivedi MH, et al. Cross-trial prediction of treatment outcome in depression: a machine learning approach. Lancet Psychiatry 2016 Mar;3(3):243–250. pmid:26803397
23. King M, Bottomley C, Bellón-Saameño J, Torres-Gonzalez F, Svab I, Rotar D, et al. Predicting onset of major depression in general practice attendees in Europe: extending the application of the predictD risk algorithm from 12 to 24 months. Psychol Med 2013 Sep;43(9):1929–1939. pmid:23286278
24. Kim H, Lee S, Lee S, Hong S, Kang H, Kim N. Depression Prediction by Using Ecological Momentary Assessment, Actiwatch Data, and Machine Learning: Observational Study on Older Adults Living Alone. JMIR Mhealth Uhealth 2019 Oct 16;7(10):e14149. PMC6913579 pmid:31621642
25. Vanoh D, Shahar S, Yahya HM, Hamid TA. Prevalence and Determinants of Depressive Disorders among Community-dwelling Older Adults: Findings from the Towards Useful Aging Study. International Journal of Gerontology 2016 Jun;10(2):81–85. https://doi.org/10.1016/j.ijge.2016.02.001
26. Albert PR. Why is depression more prevalent in women? J Psychiatry Neurosci 2015 Jul;40(4):219–221. PMC4478054 pmid:26107348
27. Girgus JS, Yang K, Ferri CV. The Gender Difference in Depression: Are Elderly Women at Greater Risk for Depression Than Elderly Men? Geriatrics (Basel) 2017 Nov 15;2(4). PMC6371140 pmid:31011045
28. Pilania M, Bairwa M, Khurana H, Kumar N. Prevalence and Predictors of Depression in Community-Dwelling Elderly in Rural Haryana, India. Indian J Community Med 2017 Jan-Mar;42(1):13–18. PMC5348997 pmid:28331248
29. Dempsey S, Devine MT, Gillespie T, Lyons S, Nolan A. Coastal blue space and depression in older adults. Health Place 2018 Nov;54:110–117. pmid:30261351
30. Mandal B, Ayyagari P, Gallo WT. Job loss and depression: the role of subjective expectations. Soc Sci Med 2011 Feb;72(4):576–583. PMC3684950 pmid:21183267
31. Park H, Hwangbo Y, Lee YJ, Jang EC, Han W. Employment and occupation effects on late-life depressive symptoms among older Koreans: a cross-sectional population survey. Ann Occup Environ Med 2016;28:22. PMC4867082 pmid:27182442
32. Yoo KB, Park EC, Jang SY, Kwon JA, Kim SJ, Cho KH, et al. Association between employment status change and depression in Korean adults. BMJ Open 2016 Mar 1;6(3):e008570. PMC4785295 pmid:26932136
33. Lee J, Park H, Chey J. Education as a Protective Factor Moderating the Effect of Depression on Memory Impairment in Elderly Women. Psychiatry Investig 2018 Jan;15(1):70–77. PMC5795034 pmid:29422928
34. Mumang AA, Liaury K, Syamsuddin S, Maria IL, Tanra AJ, Ishida T, et al. Socio-economic-demographic determinants of depression in Indonesia: A hospital-based study. PLoS One 2020;15(12):e0244108. PMC7737985 pmid:33320917
35. Braam AW, Koenig HG. Religion, spirituality and depression in prospective studies: A systematic review. J Affect Disord 2019 Oct 1;257:428–438. pmid:31326688
36. Hayward RD, Owen AD, Koenig HG, Steffens DC, Payne ME. Religion and the presence and severity of depression in older adults. Am J Geriatr Psychiatry 2012 Feb;20(2):188–192. PMC3266521 pmid:22273738
37. Yan XY, Huang SM, Huang CQ, Wu WH, Qin Y. Marital status and risk for late life depression: a meta-analysis of the published literature. J Int Med Res 2011;39(4):1142–1154. pmid:21986116
38. López-Lopez A, González JL, Alonso-Fernández M, Cuidad N, Matías B. Pain and symptoms of depression in older adults living in community and in nursing homes: the role of activity restriction as a potential mediator and moderator. Int Psychogeriatr 2014 Oct;26(10):1679–1691. pmid:24967598
39. Martin W, Yani A, Dayati R. Differences of Correlation Factors of Depression Among The Senior Citizens who Live with Their Family and Those who Live in Nursing Home. 2018.
40. Stahl ST, Beach SR, Musa D, Schulz R. Living alone and depression: the modifying role of the perceived neighborhood environment. Aging Ment Health 2017 Oct;21(10):1065–1071. PMC5161727 pmid:27267633
41. Tanaka H, Sasazawa Y, Suzuki S, Nakazawa M, Koyama H. Health status and lifestyle factors as predictors of depression in middle-aged and elderly Japanese adults: a seven-year follow-up of the Komo-Ise cohort study. BMC Psychiatry 2011 Feb 7;11:20. PMC3041738 pmid:21294921
42. Chang-Quan H, Xue-Mei Z, Bi-Rong D, Zhen-Chan L, Ji-Rong Y, Qing-Xiu L. Health status and risk for depression among the elderly: a meta-analysis of published literature. Age Ageing 2010 Jan;39(1):23–30. pmid:19903775
43. Liu J, Wei W, Peng Q, Guo Y. How Does Perceived Health Status Affect Depression in Older Adults? Roles of Attitude toward Aging and Social Support. Clin Gerontol 2021 Mar-Apr;44(2):169–180.
44. Lawrence BJ, Jayakody DMP, Bennett RJ, Eikelboom RH, Gasson N, Friedland PL. Hearing Loss and Depression in Older Adults: A Systematic Review and Meta-analysis. Gerontologist 2020 Apr 2;60(3):e137–e154. pmid:30835787
45. Skośkiewicz-Malinowska K, Malicka B, Ziętek M, Kaczmarek U. Oral health condition and occurrence of depression in the elderly. Medicine (Baltimore) 2018 Oct;97(41):e12490. PMC6203496 pmid:30313038
46. Deo RC. Machine Learning in Medicine. Circulation 2015 Nov 17;132(20):1920–1930. PMC5831252 pmid:26572668
47. Mintz Y, Brodie R. Introduction to artificial intelligence in medicine. Minim Invasive Ther Allied Technol 2019 Apr;28(2):73–81. pmid:30810430
48. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015 May 28;521(7553):436–444. pmid:26017442
49. Yu KH, Beam AL, Kohane IS. Artificial intelligence in healthcare. Nat Biomed Eng 2018 Oct;2(10):719–731. pmid:31015651
50. Knaus WA, Zimmerman JE, Wagner DP, Draper EA, Lawrence DE. APACHE-acute physiology and chronic health evaluation: a physiologically based classification system. Crit Care Med 1981 Aug;9(8):591–597. pmid:7261642
51. Wijeysundera DN, Karkouti K, Dupuis JY, Rao V, Chan CT, Granton JT, et al. Derivation and validation of a simplified predictive index for renal replacement therapy after cardiac surgery. JAMA 2007 Apr 25;297(16):1801–1809. pmid:17456822
52. Sufriyana H, Wu YW, Su EC. Prediction of Preeclampsia and Intrauterine Growth Restriction: Development of Machine Learning Models on a Prospective Cohort. JMIR Med Inform 2020 May 18;8(5):e15411. PMC7265111 pmid:32348266
53. Sufriyana H, Wu YW, Su EC. Artificial intelligence-assisted prediction of preeclampsia: Development and external validation of a nationwide health insurance dataset of the BPJS Kesehatan in Indonesia. EBioMedicine 2020 Apr;54:102710. PMC7152721 pmid:32283530
54. Luo W, Phung D, Tran T, Gupta S, Rana S, Karmakar C, et al. Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research: A Multidisciplinary View. J Med Internet Res 2016 Dec 16;18(12):e323. PMC5238707 pmid:27986644
55. Wolff RF, Moons KGM, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: A Tool to Assess the Risk of Bias and Applicability of Prediction Model Studies. Ann Intern Med 2019 Jan 1;170(1):51–58. pmid:30596875
56. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ 2015 Jan 7;350:g7594. pmid:25569120
57. Moons KG, Altman DG, Reitsma JB, Ioannidis JP, Macaskill P, Steyerberg EW, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med 2015 Jan 6;162(1):W1–73. pmid:25560730
58. Moons KGM, Wolff RF, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: A Tool to Assess Risk of Bias and Applicability of Prediction Model Studies: Explanation and Elaboration. Ann Intern Med 2019 Jan 1;170(1):W1–w33. pmid:30596876
59. Scott I, Carter S, Coiera E. Clinician checklist for assessing suitability of machine learning applications in healthcare. BMJ Health Care Inform 2021 Feb;28(1). PMC7871244 pmid:33547086
60. Friedman B, Heisel MJ, Delavan RL. Psychometric properties of the 15-item geriatric depression scale in functionally impaired, cognitively intact, community-dwelling elderly primary care patients. J Am Geriatr Soc 2005 Sep;53(9):1570–1576. pmid:16137289
61. Egleston BL, Miller SM, Meropol NJ. The impact of misclassification due to survey response fatigue on estimation and identifiability of treatment effects. Stat Med 2011 Dec 30;30(30):3560–3572. PMC3552436 pmid:21953305
62. Sufriyana H, Wu YW, Su ECY. Resampled dimensional reduction for feature representation in machine learning. Protocol Exchange 2021;rs.3.pex-1636/v1.
63. Westphal M, Zapf A, Brannath W. A multiple testing framework for diagnostic accuracy studies with co-primary endpoints. Stat Med 2022 Feb 28;41(5):891–909. pmid:35075684
64. Fernandez-Delgado M, Cernadas E, Barro S, Amorim D. Do we Need Hundreds of Classifiers to Solve Real World Classification Problems? J Mach Learn Res 2014;15(90):3133–3181. URL: https://jmlr.org/papers/volume15/delgado14a/delgado14a.pdf
65. Sufriyana H, Wu YW, Su EC. Deep-insight visible neural network (DI-VNN) for improving interpretability of a non-image deep learning model by data-driven ontology. Protocol Exchange 2021;rs.3.pex-1637/v1.
66. van der Ploeg T, Austin PC, Steyerberg EW. Modern modelling techniques are data hungry: a simulation study for predicting dichotomous endpoints. BMC Med Res Methodol 2014 Dec 22;14:137. PMC4289553 pmid:25532820
67. Rubin M. When does HARKing hurt? Identifying when different types of undisclosed post hoc hypothesizing harm scientific progress. Review of General Psychology 2017;21(4):308–320.
68. Sufriyana H, Husnayain A, Chen YL, Kuo CY, Singh O, Yeh TY, et al. Comparison of Multivariable Logistic Regression and Other Machine Learning Algorithms for Prognostic Prediction Studies in Pregnancy Care: Systematic Review and Meta-Analysis. JMIR Med Inform 2020 Nov 17;8(11):e16503. PMC7708089 pmid:33200995
69. The Economist. The Big Mac index. URL: https://www.economist.com/big-mac-index [accessed Jan 12th].
70. Xin Y, Ren X. Social Capital as a Mediator through the Effect of Education on Depression and Obesity among the Elderly in China. Int J Environ Res Public Health 2020 Jun 4;17(11). PMC7312359 pmid:32512694
  71. 71. Hong JW, Noh JH, Kim DJ. The prevalence of and factors associated with depressive symptoms in the Korean adults: the 2014 and 2016 Korea National Health and Nutrition Examination Survey. Soc Psychiatry Psychiatr Epidemiol 2021 Apr;56(4):659–670. pmid:32780175
  72. 72. Amieva H, Ouvrard C, Meillon C, Rullier L, Dartigues JF. Death, Depression, Disability, and Dementia Associated With Self-reported Hearing Problems: A 25-Year Study. J Gerontol A Biol Sci Med Sci 2018 Sep 11;73(10):1383–1389. pmid:29304204
  73. 73. Cosh S, Helmer C, Delcourt C, Robins TG, Tully PJ. Depression in elderly patients with hearing loss: current perspectives. Clin Interv Aging 2019;14:1471–1480. PMC6698612 pmid:31616138