Machine learning for the detection and diagnosis of cognitive impairment in Parkinson’s Disease: A systematic review

Callum Altham; Huaizhong Zhang; Ella Pereira

doi:10.1371/journal.pone.0303644

Abstract

Background

Parkinson’s Disease is the second most common neurological disease in over 60s. Cognitive impairment is a major clinical symptom, with risk of severe dysfunction up to 20 years post-diagnosis. Processes for detection and diagnosis of cognitive impairments are not sufficient to predict decline at an early stage for significant impact. Ageing populations, neurologist shortages and subjective interpretations reduce the effectiveness of decisions and diagnoses. Researchers are now utilising machine learning for detection and diagnosis of cognitive impairment based on symptom presentation and clinical investigation. This work aims to provide an overview of published studies applying machine learning to detecting and diagnosing cognitive impairment, evaluate the feasibility of implemented methods, their impacts, and provide suitable recommendations for methods, modalities and outcomes.

Methods

To provide an overview of the machine learning techniques, data sources and modalities used for detection and diagnosis of cognitive impairment in Parkinson’s Disease, we conducted a review of studies published on the PubMed, IEEE Xplore, Scopus and ScienceDirect databases. 70 studies were included in this review. From each study, the strategy, modalities, sources, methods and outcomes were extracted.

Results

The literature demonstrates that machine learning techniques have the potential to provide considerable insight into the investigation of cognitive impairment in Parkinson’s Disease. Our review demonstrates the versatility of machine learning in analysing a wide range of modalities for the detection and diagnosis of cognitive impairment in Parkinson’s Disease, including imaging, EEG, speech and more, yielding notable diagnostic accuracy.

Conclusions

Machine learning based interventions have the potential to glean meaningful insight from data, and may offer non-invasive means of enhancing cognitive impairment assessment, providing clear and formidable potential for implementation of machine learning into clinical practice.

Citation: Altham C, Zhang H, Pereira E (2024) Machine learning for the detection and diagnosis of cognitive impairment in Parkinson’s Disease: A systematic review. PLoS ONE 19(5): e0303644. https://doi.org/10.1371/journal.pone.0303644

Editor: Farzin Hajebrahimi, New Jersey Institute of Technology, UNITED STATES

Received: March 13, 2024; Accepted: April 29, 2024; Published: May 16, 2024

Copyright: © 2024 Altham et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the manuscript and its Supporting information files.

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Parkinson’s Disease (PD) is a progressive neurodegenerative disorder [1], characterised by motor and non-motor symptoms including dyskinesia, tremors and balance issues [2]. Over 145K people in the UK are estimated to be living with PD [3], and PD is the second most common neurological disease in individuals over the age of 60. PD has an estimated global prevalence rate of 1%, and global PD populations doubled between 1990 and 2016, making PD the fastest-growing neurodegenerative condition in the world [4–6].

PD sufferers have a higher risk of developing severe cognitive complications, resulting in consistent and damaging cognitive impairments (CI). These give rise to a noticeable loss in cognitive functioning and behavioural abilities, and can lead to the overall cognitive decline characteristic of dementia, known as Parkinson’s Disease Dementia (PDD) [7]. Between 70% and 95% of PD patients are likely to experience some degree of CI as PD advances, with PDD frequently developing 10–20 years post diagnosis [8, 9] and with potential for severe impacts on overall quality of life, familial relationships and societal functioning [10]. Treatment and disease management create severe burdens at medical [11], economic [12, 13] and personal levels, and identification of a direct, specific cause for CI development remains an active, disputed research area [14, 15]. Presentation of CI is complex and diverse, occurring across a number of cognitive domains [16], including visuospatial ability [17], working memory [18] and psycho-motor speed [19]. Patients also experience widespread variations in onset, severity and progression [16], with diagnosis commonly carried out using clinician-led assessments of cognitive and processing ability to identify at least one dementia syndrome within established PD [8].

Several assessments are available to evaluate the spectrum of cognitive abilities, including the Benton Judgement of Line Orientation (JoLO) [17], Letter-Number Sequencing Task (LNST) [18], Symbol Digit Modalities Test (SDMT) [19] and Montreal Cognitive Assessment (MoCA) [20]. However, some studies consider such assessments limited since they only identify cognitive decline once symptom presentation has begun [20]. Therefore, research is beginning to focus on analysis of additional data modalities including gait analysis [21–23], functional connectomics [24], electroencephalogram [25, 26], amyloid PET [27], FDG-PET [28, 29], and quantitative susceptibility mapping [30–34]. However, diagnosis is still reliant on clinical features and standardised clinical criteria, including those of the UK Parkinson’s Disease Society Brain Bank (UKPDSBB) [35]. Such criteria rely largely on the expertise and knowledge of a neurologist and can still be unreliable, with diagnostic accuracy assumed to be just over 80% in specialised neurology centres [36].

Machine learning (ML) techniques are increasingly being used within the healthcare industry for a wide range of tasks. Publications using ML for detection and diagnosis of CI have increased, investigating the potential of these techniques to mitigate these limitations and provide additional measures that may identify CI more quickly and at an earlier stage. Such techniques enable systems to learn by example by studying large datasets and extracting meaningful representations [37], which are then used to make decisions based on learned information without explicit programming [38]. Despite this growing usage and expansion in the literature, the use of ML to analyse CI in PD is yet to be fully evaluated and validated. This work aims to provide an overview of published studies applying ML to detecting and diagnosing CI, evaluate the feasibility of implemented methods, their impacts, and provide suitable recommendations for methods, modalities and outcomes. We aim to provide an overview that functions as a starting point for further, more detailed analyses of CI in PD, influencing research into identification of early stage, non-invasive markers of disease progression. This has the potential to allow for detection and diagnosis at the earliest stages, allowing much needed intervention and preventative care that could slow overall disease progression [39, 40].

The paper is structured as follows. Background provides context on PD, CI, and ML’s role in detection and diagnosis. Review protocol outlines the systematic literature review process. Observations and findings analyses literature characteristics and explores ML’s application in CI detection, emphasising performance across data modalities. Discussion considers the implications of findings within the context of PD and CI, fostering critical analysis and integration of ML insights into clinical practice. Finally, Conclusions closes the paper.

Background

Parkinson’s Disease

PD is a progressive neurodegenerative disorder primarily affecting the motor system [1], characterised by the extensive loss of dopaminergic neurons in the substantia nigra pars compacta [41] alongside pathological processes, including aggregation of α-synuclein protein [42], mitochondrial dysfunction [43], oxidative stress [44] and neuroinflammation [45]. This region of the brain is integral in control of motor functions, and deterioration of this region results in the hallmark symptoms of PD. As PD progresses and develops, up to 50% of these crucial neurons are lost at the point of symptom presentation, significantly reducing dopamine levels. This depletion of dopamine results in a primary manifestation of pronounced motor symptoms including resting tremors, bradykinesia, muscle rigidity and postural instability [6], underscoring the fundamental nature of PD as a motor disorder.

Despite PD being a fundamentally motor disease, it also encompasses a wide variety of non-motor symptoms that can cause impacts to a patient’s quality of life including anosmia (loss of smell) [46], sialorrhea (excessive salivation), difficulties with speech and swallowing [47] alongside changes in vision and hearing [48, 49]. Additionally, dopamine is also involved in the regulation of cognitive processes [50, 51], resulting in PD affecting executive functions, attention, visuospatial and language skills. These cognitive changes typically become more pronounced as the disease advances and are crucial for a comprehensive understanding of the disease alongside traditional motor symptoms.

Cognitive impairment.

Most notably, CI has begun to show prominence as a non-motor symptom of PD [10]. Alongside dopamine depletion, contributions to cognitive dysfunction are made by accumulation of Lewy bodies and Lewy neurites in the cerebral cortex, limbic system and other brain areas [52, 53]. Similarly, cholinergic [54], serotonergic [55], and noradrenergic [56] systems have been implicated for involvement in development of CI in PD.

CI is a complex condition causing profound impacts on daily life and well-being [10]. CI encompasses a wide range of cognitive deficits extending beyond the expectation of memory impairment, including executive dysfunction [57], attention [58], visuospatial impairment [59], language difficulties [60], memory problems [57, 61], and mood disturbances such as anxiety and depression. Occurrence of CI is heterogeneous, with variations in patterns and severity. Differentiating between levels of severity [62], including Normal Cognition (PD-NC), Mild Cognitive Impairment (PD-MCI), and PD-Dementia (PDD) is crucial, with the latter constituting a severe and pervasive deficit characterised by significant impairments to daily functioning [7].

Diagnosis involves a comprehensive assessment battery, integrating elements including clinical evaluations, neuroimaging, and neuropsychological testing. A key framework for this is the Movement Disorder Society Criteria (MDS), which provides thorough evaluation of cognitive domains including attention, memory, and executive functioning [62]. A number of assessment and screening tools can be used, including the Mini-Mental State Examination (MMSE), which assesses general cognitive ability [63], and the MoCA, offering a more detailed evaluation of cognitive function [20]. The MoCA is generally considered to have higher diagnostic power than the MMSE owing to the broader range of cognitive domains it assesses, its heightened sensitivity to MCI and early dementia, and its better adjustment for education levels, making it better suited to detecting subtle cognitive changes in PD. Neuroimaging techniques play a considerable role through identification of structural and functional changes in the brain associated with continuing cognitive decline [64]. Such tools allow clinicians and researchers to use standardised, comprehensive approaches to the diagnosis of CI in PD, potentially allowing for earlier detection and management of this aspect of the overall PD condition.

Machine learning

ML has emerged as a valuable tool in a variety of healthcare applications, including the detection of PD [65, 66] and a number of related memory disorders [67, 68], proving itself worthy of consideration for analysing CI. ML encompasses various approaches including supervised, unsupervised, deep and ensemble learning. Therefore, the following section provides an overview of the required theory to understand the wide array of ML methods that have been implemented in discovered studies, with the applications of these techniques discussed further on.

Supervised learning.

Supervised learning involves algorithms trained to make decisions based on a set of labelled examples. A dataset of input data and their expected output labels is used to train the model [69], with internal parameters adjusted to minimise differences between predictions and expected labels. The trained models then generalise this knowledge to make predictions on new data, solving either regression or classification tasks. Supervised learning covers a wide range of model types across both kinds of task.

Classification methods are fundamental ML methods that are used to assign categorical labels to data points based on their provided input features [70]. These methods are vital in their ability to differentiate and categorise provided data into distinct classes, enabling models to recognise patterns and perform decision making [71]. Classification models span a variety of approaches, from simple binary classifiers to more complex multi-class systems, each of which is designed to address a specific type of classification problem.

Regression methods are essential tools in ML that allow for the prediction of continuous numerical variables based on a pre-determined set of input features [72]. These methods are widely utilised due to their ability to effectively model relationships between variables, understand patterns in data and make accurate predictions [73]. Regression models vary widely in terms of technique, and therefore are typically tailored for use in a specific scenario.

Tree-based ML methods cover a class of algorithms used widely for classification and regression tasks [74], including Decision Trees (DT), Random Forests (RF) and Gradient Boosting Trees (GBT). They are favoured for their simplicity, interpretability and proficiency in handling structured data [75]. At their core, these methods work by recursively splitting datasets into subsets based on input features, creating a hierarchical, tree-like structure in which nodes represent a particular rule and branches denote outcomes.
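
As a minimal sketch of the splitting step that tree-based methods repeat recursively, the example below searches a single feature for the threshold that best separates two classes. Gini impurity is one common splitting criterion and is an assumption here, as are the function names and the made-up scores; none of this is taken from the reviewed studies.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Find the threshold on one feature that minimises weighted Gini impurity."""
    best_threshold, best_impurity = None, float("inf")
    # The largest value is skipped so the right-hand subset is never empty.
    for threshold in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity


# Made-up screening scores and labels, for illustration only.
scores = [18, 21, 23, 26, 28, 29]
labels = ["CI", "CI", "CI", "NC", "NC", "NC"]
threshold, impurity = best_split(scores, labels)
```

A full decision tree would apply `best_split` again to each resulting subset until a stopping rule is met; RF and GBT build many such trees and combine them.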

Support Vector Machines (SVMs) are supervised ML algorithms typically used for classification or regression. SVMs excel at binary classification tasks but can also be adapted for multi-class classification and regression [71]. The principal concept of SVMs relies on finding a ‘hyperplane’, or decision boundary, that is able to effectively separate the different classes of data points. This decision boundary is positioned to maximise the distance between itself and the closest data points of each class, known as ‘support vectors’, to ensure effective categorisation [76].
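
The margin-maximisation idea can be written as the standard textbook hard-margin problem, shown here as an illustration rather than as the formulation used in any reviewed study. The symbols are assumptions introduced for this sketch: feature vectors x_i, labels y_i in {-1, +1}, and a hyperplane defined by the weight vector w and offset b.

```latex
\min_{\mathbf{w},\, b} \; \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1,
\qquad i = 1, \dots, n
```

The distance between the two class boundaries is 2/‖w‖, so minimising ‖w‖ maximises the margin. The training points that satisfy the constraint with equality are the support vectors.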

Linear Regression (LiR) methods assume the presence of a linear relationship between input features and an expected target variable, and therefore are more suited for standard linear relationships [72]. Polynomial Regression (PR) extends LiR by allowing for the analysis of features with a polynomial relationship to the target variable [78]. Logistic Regression (LoR) is used to model the probability of a binary outcome, using an ‘S’-shaped logistic function that maps any real-valued number into the range 0 to 1, making it particularly suitable for binary classification tasks [77].
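
The ‘S’-shaped logistic function used by LoR can be sketched in a few lines. The weights, bias and feature values below are hypothetical and chosen only to show the mapping from a weighted sum to a probability.

```python
import math


def logistic(z: float) -> float:
    """Logistic (sigmoid) function: maps a real number towards the range 0 to 1."""
    # Two branches keep exp() from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_probability(features, weights, bias):
    """Probability of the positive class for one sample (hypothetical model)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return logistic(z)


# Weighted sum is 0.5*1.0 + (-0.25)*2.0 + 0.0 = 0.0, the midpoint of the curve.
p = predict_probability([1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
```

A binary classifier is obtained by thresholding the probability, commonly at 0.5.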

Neighbour-based methods are a group of ML techniques that rely on the concept of similar data points sharing common characteristics, allowing predictions or recommendations to be made based on the proximity of data points to one another. The most commonly utilised neighbour method is that of the K-Nearest Neighbour (K-NN) method. The K-NN algorithm is focused on the concept of finding the K-nearest data points to a given target point. It then makes predictions based on the majority class or the average value of these K-nearest neighbours [79] respectively for classification and regression tasks.
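
A minimal sketch of K-NN classification follows, assuming Euclidean distance and a simple majority vote. The feature vectors and labels are made up for illustration and are not data from any reviewed study.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, made-up two-feature samples.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
labels = ["PD-NC", "PD-NC", "PD-NC", "PD-MCI", "PD-MCI", "PD-MCI"]

prediction = knn_predict(points, labels, query=(2.9, 3.1), k=3)
```

For regression, the vote would be replaced by the average of the neighbours' target values.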

Naïve Bayes (NB) is a ML algorithm primarily used for classification tasks with predefined classes or categories. The algorithm operates on the basis of Bayes’ theorem, in which the probability of an item belonging to a specific class is calculated based on the observed features [70]. ‘Naïve’ refers to an assumption made during the modelling process, in which all features used for classification are assumed to be conditionally independent of each other given the class label [80].
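
Under the independence assumption, the NB decision rule takes the standard form below. The notation is introduced for this sketch: c is a candidate class and x_1, ..., x_n are the observed features.

```latex
P(c \mid x_1, \dots, x_n) \;\propto\; P(c) \prod_{i=1}^{n} P(x_i \mid c),
\qquad
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

The product over features is what the ‘naïve’ assumption buys: each per-feature likelihood can be estimated separately rather than modelling all features jointly.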

Discriminant Analysis (DA) is a fundamental technique in ML aimed at simplifying complex datasets by reducing the number of features or variables involved whilst retaining crucial information [81]. This approach is vital for addressing challenges associated with high-dimensional data. Notable supervised methods for discriminant analysis are Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA).

Genetic Programming (GP) is a ML approach inspired by the mechanisms of natural selection and evolution [82]. At its core, GP emulates the process of biological evolution to automatically create and refine computer programs to tackle complex problems. GP processes begin with a population of randomly generated computer programs, often represented as trees or graphs [83]. Each program is scored on how well it solves the problem, and the better-performing programs are selected and varied, typically through crossover and mutation, to form the next generation. Over multiple generations, GP continues to evolve programs, gradually improving their ability to solve the problem and moving closer to optimal or near-optimal solutions [82].

Hybrid ML methods combine the strengths of multiple ML techniques to address complex and diverse problem domains more effectively. These methods often integrate both traditional statistical approaches and modern deep learning algorithms [84]. Hybrid models are particularly valuable when dealing with multifaceted data types or when a single ML technique may not capture all the nuances of a problem.

Unsupervised learning.

Unsupervised learning is an alternative class of ML techniques in which algorithms find patterns, rules or structures in unlabelled data. No explicit labels are provided alongside training data. Instead, algorithms are expected to uncover relationships, groupings and representations within the data themselves, for example by grouping data into a set of categories or ‘clusters’ based on common features and patterns [85]. Common unsupervised techniques include clustering, in which algorithms group data points together based on their similarity, and dimensionality reduction, which aims to simplify complex data by representing it in a lower dimensional space.

Dimensionality Reduction (DR) is fundamental in ML, and involves reducing the number of input features while maintaining important information to improve the efficiency and effectiveness of algorithms in handling high-dimensional data [86]. Notable unsupervised methods for DR are Principal Component Analysis (PCA) and Non-Negative Matrix Factorisation (NMF). PCA is used for understanding the structure of high-dimensional data by reducing its dimensions without the loss of significant information. It captures the maximum variance present in the data by identifying combinations of features called principal components [87], which often reveal underlying patterns in the data. NMF is a technique that decomposes a given data matrix into two or more matrices in which all entries are non-negative. These matrices capture underlying patterns and relationships within the data, allowing the original data to be represented as a combination of these patterns, which can be easier to interpret and analyse [88].

Clustering is an essential unsupervised ML technique that groups data points based on their inherent similarities, revealing hidden structures within data [89]. Two prominent clustering methods are K-Means Clustering (KMC) and Gaussian Mixture Models (GMM). KMC aims to partition data into K clusters, where each data point is assigned to the cluster with the nearest mean (centroid). K-Means is computationally efficient and suitable for scenarios with roughly spherical and equally sized clusters [90, 91]. GMM, on the other hand, models data as a mixture of multiple Gaussian distributions, offering greater flexibility in handling clusters with varying shapes, sizes, and densities. It employs the Expectation-Maximisation (EM) algorithm to iteratively optimise its parameters [92].
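
The assignment and update steps of KMC can be sketched as follows. This is a minimal illustration of Lloyd's algorithm with fixed starting centroids, which is an assumption made to keep the example deterministic; practical implementations usually initialise centroids randomly or with a seeding heuristic. The points are made up.

```python
import math


def kmeans(points, initial_centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = [tuple(c) for c in initial_centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return centroids, assignments


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, assignments = kmeans(points, initial_centroids=[(0.0, 0.0), (10.0, 10.0)])
```

GMM follows a similar alternating pattern through EM, but replaces the hard nearest-centroid assignment with soft membership probabilities under each Gaussian component.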

Deep learning.

Deep Learning (DL) is a specialised subset of ML focused on the training of Artificial Neural Networks (ANNs), which are models inspired by the structure of the brain [93]. DL differs significantly from traditional ML methodologies by utilising ‘Deep’ Neural Networks (DNNs), which are characterised by an interconnected, layered network architecture. The term ‘deep’ refers to the many stacked layers of the network, which give it the capability to automatically extract and learn features from raw data, bypassing the requirement for traditional, pre-defined, non-trainable feature extractor blocks [94]. This direct extraction of hierarchically organised, trainable features enables these models to perform complex pattern recognition and decision-making processes in a more effective manner. Each layer within a DNN comprises a number of interconnected nodes, or neurones, that sequentially process and transform data, creating a hierarchical, structured representation of the input and significantly enhancing the model’s ability to learn from vast amounts of data and make informed predictions and decisions.

Convolutional Neural Networks (CNNs) represent a specialised form of DNNs designed for processing grid-like data, such as images and videos [95]. CNNs have significantly advanced computer vision tasks by employing convolutional layers to apply filters (kernels) to input data, capturing local patterns. Pooling layers reduce spatial dimensions, and fully connected layers facilitate classification or regression [96]. CNNs dominate fields like image classification, object detection, facial recognition, and image generation [29]. However, DL comes with challenges, including the need for datasets containing large numbers of labelled samples (common deep learning datasets such as ImageNet include over 3.2 million samples [97]), the risk of overfitting in deep networks, and interpretability concerns in complex models.
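
To make the roles of convolutional and pooling layers concrete, the sketch below applies one fixed kernel to a tiny image and then pools the result. It is an illustration only: the kernel values are hypothetical, and in a real CNN they would be learned during training rather than set by hand.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 2D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling: keeps the largest value in each size x size window."""
    return [
        [
            max(
                feature_map[i + di][j + dj]
                for di in range(size)
                for dj in range(size)
            )
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


# Toy 5x5 image with a vertical edge between dark (0) and bright (1) columns.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical 2x2 kernel that responds to left-to-right intensity increases.
kernel = [[-1, 1], [-1, 1]]

features = conv2d(image, kernel)  # 4x4 feature map, non-zero only at the edge
pooled = max_pool2d(features)     # 2x2 map after pooling
```

The feature map is non-zero only where the kernel overlaps the edge, which is the sense in which a filter captures a local pattern; pooling then halves each spatial dimension while keeping the strongest response.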

Ensemble learning.

Ensemble learning is a ML technique focusing on combining multiple individual models to produce a more robust and accurate ‘ensemble’ model [98]. Such techniques are based on the underlying idea that by aggregating predictions or decisions from a wide variety of models, the overall decision-making performance can be considerably improved compared to using a single model, and any minor issues in model architectures can be mitigated [99]. Ensemble learning is applied in a wide range of ML tasks, including classification, regression and anomaly detection. These techniques are particularly valuable when faced with datasets that are complex or noisy, as the diversity of the models allows overall performance to improve despite this noise. Similarly, ML models are frequently affected by overfitting, in which a model learns the training data too well and struggles to perform on fresh data; ensemble methods can mitigate this by improving the overall resilience and generalisation of the model [100].
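
One simple way to aggregate predictions is hard majority voting, sketched below. The three base-model outputs are hypothetical and were constructed so that each model makes a single, different mistake; voting is only one of several ensemble strategies and is used here as an assumption for illustration.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one ensemble prediction per sample."""
    ensemble = []
    # zip(*...) regroups the predictions so each tuple holds one sample's votes.
    for votes in zip(*predictions_per_model):
        ensemble.append(Counter(votes).most_common(1)[0][0])
    return ensemble


# Hypothetical reference labels and outputs from three base classifiers.
truth = ["CI", "NC", "CI", "NC"]
model_a = ["NC", "NC", "CI", "NC"]  # wrong on sample 1
model_b = ["CI", "CI", "CI", "NC"]  # wrong on sample 2
model_c = ["CI", "NC", "CI", "CI"]  # wrong on sample 4

combined = majority_vote([model_a, model_b, model_c])
```

Because the individual errors fall on different samples, each one is outvoted and the combined prediction matches the reference labels. This depends on the base models making reasonably independent errors.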

Model choice.

ML techniques are becoming widely used in healthcare, as detailed above, and therefore have considerable potential for use in detecting CI in PD. However, considerations still need to be made to implement such techniques. Choosing a suitable model depends largely on the nature of the data, problem complexity, and available resources [37]. DL models show great promise for use due to their ability in capturing intricate patterns in larger datasets [101]. However, traditional models including K-NN, SVM and RF still have potential to be effective in a number of situations, particularly when labelled data is limited and interpretability is crucial [102]. An overview of all ML models used in the discovered papers is shown in Table 1.

Table 1. Commonly used ML applications for CI detection and diagnosis in PD.

https://doi.org/10.1371/journal.pone.0303644.t001

Review protocol

No registered protocol exists for this review.

Purpose of review

The aim of this systematic review is to determine if ML approaches are effective for detection and diagnosis of CI in PD, and identify key methodologies, algorithms, and performance metrics. Whilst a number of reviews have previously been conducted to determine the feasibility of ML techniques for the detection of PD alone [103–105], systematic reviews into the area of CI detection for PD are limited, with the most recent review found being published in 2022 [106], containing only half the number of papers in this review. Additionally, ML and AI are constantly evolving areas, with newer and more advanced techniques frequently becoming available, and it is therefore important to keep abreast of the techniques currently being used within this area. This review therefore expands on previous reviews to include the most recent papers in what is a growing and emerging research area. Based on this, and following the PICO (Patient/Population, Intervention, Comparison, Outcomes) guidelines for review question formation [107], we aim to discuss and answer the following question:

In patients with Parkinson’s Disease, how effectively does machine learning-based detection and diagnosis differentiate cognitive impairment from normal cognition?

Search methodology

A number of systematic review methodologies exist, including AMSTAR [108], PICO [107] and Cochrane [109]. However, the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) methodology [110, 111] is a well-established, widely recognised approach for systematic reviews and meta-analyses, and is commonly used in both medical and computational research since it provides a structured and transparent framework for literature searches, study selection and reporting findings. Therefore, this work was based on PRISMA guidelines [110, 111]. Fig 1 shows the search, screening, eligibility and extraction steps carried out in this review.

Fig 1. PRISMA flow diagram of the literature search, screening and extraction procedures for inclusion.

https://doi.org/10.1371/journal.pone.0303644.g001

Literature sources

Whilst this review is focused on ML techniques for a particular research area, this area is largely medical in nature rather than solely computational. Therefore, there is a need to consider sources from both medical and computational viewpoints, whilst also considering sources of a generic nature that may cover any missed topic areas. In this work, we consider four databases spanning computational, medical and generic scientific groups: PubMed (pubmed.ncbi.nlm.nih.gov), IEEE Xplore (ieeexplore.ieee.org), Scopus (scopus.com), and ScienceDirect (sciencedirect.com).

Search strategy

To retrieve all relevant literature, two sets of search terms were chosen for use in searching the aforementioned databases. The primary set of terms was: (1) Parkinson, (2) cognitive impairment, (3) machine learning, (4) deep learning, (5) diagnosis, (6) detection, (7) classification, and (8) identification. A set of secondary terms was also used interchangeably with primary keywords (1)-(3) to discover additional results. These search terms were combined with Boolean operators to produce search strings tailored to each database, listed in S1 File. These search strings were then varied accordingly utilising the secondary search terms listed in S2 File. A comprehensive literature search was conducted on the PubMed, IEEE Xplore, Scopus and ScienceDirect databases with no restrictions, covering publishing dates from the beginning of each database to February 2024. The search was conducted on 25th February 2024 and resulted in a total of 1,052 available results.
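
As a rough illustration of combining terms with Boolean operators, the sketch below builds a single query string from the primary terms. The way the terms are grouped under OR and AND is an assumption made for this example; the search strings actually used for each database are those in S1 File, and the secondary terms in S2 File are not reproduced here.

```python
def or_group(terms):
    """Join quoted terms with OR inside one parenthesised group."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(groups):
    """AND together several OR-groups of search terms."""
    return " AND ".join(or_group(g) for g in groups)


# Primary terms from the text; this particular grouping is hypothetical.
groups = [
    ["Parkinson"],
    ["cognitive impairment"],
    ["machine learning", "deep learning"],
    ["diagnosis", "detection", "classification", "identification"],
]
query = build_query(groups)
print(query)
```

Individual databases differ in their field tags and wildcard syntax, so a string like this would still need tailoring per database.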

Inclusion and exclusion criteria

Based on the review objectives above, inclusion and exclusion criteria were created to ensure all literature aligns with these objectives and provides an effective overview of the scope of the research area. Therefore, for inclusion in this review, studies needed to satisfy at least one of the following criteria:

(a) Classification of Cognitive Impairment (PD-CI), Mild Cognitive Impairment (PD-MCI), and Parkinson’s Disease Dementia (PDD) from Normal Cognition (PD-NC)
(b) Classification of PD-CI, PD-MCI and PDD from other memory-based disorders (e.g. Alzheimer’s Disease (AD) or Dementia with Lewy Bodies (DLB))
(c) Prediction of conversion from PD-CI/PD-MCI to PDD
(d) Prediction of future cognitive assessment scores
(e) Identification of biomarkers for the development of CI in PD sufferers

Studies that met any of the following exclusion criteria were not chosen for inclusion in this review:

(a) Studies investigating CI present before the onset of PD symptoms
(b) Studies not conducted on human participants or secondary data gathered from humans
(c) Studies focusing on the analysis of symptoms that do not include cognitive symptoms
(d) Studies providing a limited or insufficient description of data modalities, subjects or ML methods utilised
(e) Studies conducted in a language other than English

Data extraction

The same information was extracted from each paper gathered from the sources mentioned in Literature sources, and is included in S3 File in the supplementary material:

(a) Publication Year
(b) Data Source
(c) Activity Type (diagnosis, differential diagnosis, prediction, biomarker identification)
(d) Data Modality
(e) Number of Subjects
(f) Machine Learning Method(s)
(g) Validation Strategies
(h) Associated Outcome(s)

A full description of all performances according to each data modality can be found in S4 File.

Study activities

To ensure all studies are categorised based on their different strategies and goals, each study was assigned to one of the following activities based on its objectives:

(a) Diagnosis or detection of CI in PD (Comparison of data from PD patients with CI to PD patients with NC)
(b) Differential Diagnosis (Differentiating between PD with CI, and patients with other memory disorders)
(c) Condition progression prediction
(d) Identification of biomarkers for CI in PD

Each activity category can be linked to a type of ML technique needed to conduct the activities. Activities (a) and (b) focus on classification techniques, (c) focuses on prediction techniques, whilst (d) focuses on feature identification techniques.

Study evaluation

Each study was scrutinised to identify its ML techniques, examining how they adapt to the challenge of detecting and diagnosing CI. Attention was given to how these techniques are adjusted to varying data types, to determine which methodologies provide the most effective support across data types and activities. Whilst the impact on CI in PD is of increased importance, all studies use ML techniques, and it is therefore important to consider the performance achieved by the different methods and their associated outcomes. We therefore compare the achieved performance of ML techniques through analysis of their reported performance metrics. In studies using multiple ML models for analysis, the ‘associated outcome’ of the study is identified as the highest performing ML method(s) used. In studies encompassing training and validation phases, only validation performance was considered, and where both testing and validation performance are available, only testing performance was used. In studies performing multiple classification tasks, evaluation is centred around classification tasks focusing on distinguishing PD-NC from PD-CI/PD-MCI or PDD. Certain studies prioritise using ML techniques to draw specific conclusions, rather than concentrating on performance metrics. For these studies, emphasis is placed on conclusions or findings obtained rather than numerical performance measures.
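
The selection rules described above can be sketched as a small helper. The dictionary layout, the split names and the use of a single accuracy-like score per split are assumptions for illustration, and the numbers are made up.

```python
def selected_score(scores):
    """Choose which split's score to compare: testing if present, else validation."""
    for split in ("test", "validation"):
        if split in scores:
            return scores[split]
    raise ValueError("no validation or testing score reported")


def associated_outcome(study_models):
    """Return (model name, score) for the highest-performing model in a study."""
    best = max(study_models, key=lambda name: selected_score(study_models[name]))
    return best, selected_score(study_models[best])


# Made-up scores for one hypothetical study reporting three models.
study = {
    "SVM": {"train": 0.99, "validation": 0.81},
    "RF": {"train": 0.95, "validation": 0.84, "test": 0.79},
    "K-NN": {"train": 0.90, "validation": 0.76},
}
best_model, best_score = associated_outcome(study)
```

Training scores are never compared, and the RF model is judged on its testing score (0.79) rather than its higher validation score (0.84), so SVM is returned as the associated outcome in this example.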

Assessment of risk of bias

The risk of bias of all included studies was assessed using the Prediction model Risk Of Bias ASsessment Tool (PROBAST) [112]. This tool examines four aspects of a study (participants, predictors, outcome, and analysis). Under each aspect, a number of signalling questions are answered ‘yes’, ‘no’ or ‘unclear’, and these answers contribute to an overall risk of bias rating for the study. An answer of ‘no’ indicates a high risk of bias, whereas ‘yes’ indicates a low risk. Overall risk of bias was considered low when all aspects were low and external validation was present, and high if any aspect was high, or if all aspects were low but no external validation was present.
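The rating rule described above can be expressed as a short Python sketch. The function names are hypothetical. The handling of ‘unclear’ answers (a domain with no ‘no’ but at least one ‘unclear’ is rated ‘unclear’) is an assumption, since the text above only specifies the ‘yes’ and ‘no’ cases.

```python
from typing import Dict, List

DOMAINS = ("participants", "predictors", "outcome", "analysis")


def domain_risk(answers: List[str]) -> str:
    """Rate one domain from its signalling-question answers."""
    if "no" in answers:
        return "high"      # any 'no' indicates a high risk of bias
    if "unclear" in answers:
        return "unclear"   # assumption: not specified in the text above
    return "low"           # every answer was 'yes'


def overall_risk(domain_ratings: Dict[str, str], externally_validated: bool) -> str:
    """Combine the four domain ratings into an overall rating."""
    ratings = [domain_ratings[d] for d in DOMAINS]
    if "high" in ratings:
        return "high"
    if all(r == "low" for r in ratings):
        # All domains low: still high overall without external validation.
        return "low" if externally_validated else "high"
    return "unclear"       # assumption, as above


all_low = {d: "low" for d in DOMAINS}
print(overall_risk(all_low, externally_validated=False))
```

The final call shows the case that the rule singles out: a study rated low in every domain is still rated high overall when it lacks external validation.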

Observations and findings

Literature review eligibility

Screening of all literature was performed in four stages. Based on the search criteria above, 1,052 publications were retrieved: 43 from PubMed, 252 from IEEE Xplore, 296 from Scopus and 461 from ScienceDirect. Duplicate publications were removed first, excluding 32 results, followed by all review papers, excluding a further 299. A further 643 publications were removed because their titles, abstracts and conclusions met the exclusion criteria, and one publication could not be retrieved. The remaining 77 full-text publications were then screened on their abstracts, methods, and conclusions. Seven further publications were excluded based on the exclusion criteria specified, resulting in 70 full-text articles available for analysis.
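The screening counts reported above can be checked with a short tally. The sketch below uses only the figures stated in this section; the function name `screening_flow` and the wording of the exclusion labels are illustrative.

```python
from typing import Dict, List, Tuple

# Records retrieved per database, as reported above.
retrieved = {"PubMed": 43, "IEEE Xplore": 252, "Scopus": 296, "ScienceDirect": 461}

# Exclusions in the order they were applied, as reported above.
exclusions = [
    ("duplicates", 32),
    ("review papers", 299),
    ("title/abstract/conclusion screening", 643),
    ("could not be retrieved", 1),
    ("full-text screening", 7),
]


def screening_flow(
    retrieved: Dict[str, int], exclusions: List[Tuple[str, int]]
) -> List[Tuple[str, int]]:
    """Return the number of records remaining after each screening step."""
    remaining = sum(retrieved.values())
    steps = [("retrieved", remaining)]
    for reason, n_excluded in exclusions:
        remaining -= n_excluded
        steps.append((reason, remaining))
    return steps


for step, remaining in screening_flow(retrieved, exclusions):
    print(f"{step}: {remaining}")
```

The tally starts at 1,052 records, reaches 77 before full-text screening, and ends at the 70 included studies.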

Data sources

In 50 of the 70 studies, patient data was collected from recruited participants in one or more centres [21–31, 113–151]. 16 studies used data repositories, with 13 studies using data from the Parkinson’s Progression Markers Initiative (PPMI) [152–165], and three studies using data from the National BioBank of Korea (NBBK) [84, 166–168], whilst four studies made use of data sourced from pre-existing research cohorts [169–172]. The average sample size was 184.72 participants, ranging from 17 [144] to 2482 [131].

Study activities

Study activities included diagnosis (PD vs Healthy Controls (HC), PDD vs HC, PD-NC vs PD-MCI, PD-MCI vs PDD), differential diagnosis (PD-CI/PD-MCI/PDD vs AD vs DLB), identification of biomarkers for PD detection, and the prediction of future CI states. Most studies focused on diagnostic activities (n = 48) [21–26, 28, 30, 31, 84, 113, 114, 116–119, 121, 122, 124–126, 128, 130, 133–136, 138, 142, 144–146, 148–151, 153–155, 157–161, 164, 166, 171, 172], followed by prediction (n = 12) [29, 129, 131, 132, 137, 140, 141, 152, 156, 162, 163, 170], biomarker identification (n = 6) [27, 120, 123, 143, 147, 169], and differential diagnosis (n = 4) [115, 127, 139, 167].

Data modalities

The most commonly used data modalities were imaging (n = 33) [24, 27–31, 113, 114, 117, 119, 122, 129, 130, 133, 137, 139–143, 147–152, 155, 156, 159–161, 163, 164], clinical characteristics (n = 17) [84, 118, 131, 132, 137, 140, 152, 153, 155, 156, 159, 160, 162–164, 169, 170], EEG (n = 11) [25, 26, 120, 125–128, 135, 136, 146, 151], and neuropsychological profile (n = 10) [115, 118, 132, 134, 142, 157, 158, 162, 167, 171], followed by a number of additional modalities. Fig 2 gives an overview of the number of studies using each modality. Several additional data modalities were each used in only a single study (n = 7) [116, 153, 162, 164, 170–172]: eye movement, family history, environmental factors, Intelligence Quotient (IQ) & Emotional Intelligence Quotient (EIQ), biofluid assays, electronic health records, and smartphone test scores. These modalities are therefore grouped into a single category of ‘other’. A common theme across the reviewed studies is that a data modality is rarely used alone as a predictive feature; it is usually combined with other modalities. Therefore, discussions of data modality usage and outcomes cover all studies using a particular data modality, even when it is used in combination with others.

Fig 2. Usage of data modalities across reviewed studies.

https://doi.org/10.1371/journal.pone.0303644.g002

Machine learning techniques

ML techniques used across all reviewed studies were grouped into 12 categories, some of which overlap: (1) tree based methods (n = 32) [22, 23, 26, 27, 31, 114, 116, 119, 122, 125, 126, 128–130, 132, 134, 137, 139, 141, 145, 153, 155, 157–159, 162, 164, 166, 167, 170–172], (2) Support Vector Machines (n = 30) [21, 23–25, 27, 28, 30, 113, 115, 117, 118, 122–124, 133, 134, 138, 139, 141, 142, 148–151, 153, 156–159, 161, 172], (3) ensemble methods (n = 30) [22, 23, 26, 31, 114–116, 119, 122, 125, 126, 128–130, 132, 134, 139, 141, 145, 153, 155, 158, 159, 162, 164, 166, 167, 170–172], (4) regression based methods (n = 15) [30, 113, 115, 131, 134, 140, 152–154, 158, 162, 163, 167, 169, 172], (5) ANNs (n = 13) [29, 121, 122, 135, 136, 138, 139, 143, 147, 155, 157, 160, 162], (6) neighbour based methods (n = 12) [22, 23, 25, 27, 113, 115, 122, 127, 134, 155–157], (7) NB (n = 8) [23, 115, 133, 134, 139, 156, 157, 166], (8) DA (n = 4) [134, 146, 156, 166], (9) DR (n = 3) [21, 118, 120], (10) hybrid methods (n = 1) [84], (11) GP (n = 1) [138], and (12) clustering (n = 1) [144], with most studies using at least two categories of ML model. Fig 3 gives an overview of the number of studies using each ML technique.

Fig 3. Usage of ML techniques across reviewed studies.

https://doi.org/10.1371/journal.pone.0303644.g003

As stated above, all studies discussed in this review can be categorised by the study activity they implemented: classification, prediction, or feature identification. The following sections therefore discuss how each of these activities and learning types is employed across the reviewed studies, with the most notable studies described in more detail. Not all combinations are populated, as some techniques are not present for a particular learning type, and the discussion is structured accordingly.

Performance metrics

A considerable number of metrics have been used for the assessment of ML performance, as shown in Fig 4. The most commonly used metric was accuracy (n = 46), used both as a sole performance metric (n = 6) [25, 121, 123, 154, 155, 166] and in combination with other metrics (n = 40). In studies using a combination of metrics, the most common combination was accuracy, sensitivity, and specificity (n = 17) [23, 113, 114, 116, 117, 124–126, 130, 133, 142, 149, 150, 158, 164, 167, 171], followed by accuracy, sensitivity, specificity, and Area under the ROC Curve (AUC) (n = 13) [22, 27, 31, 115, 119, 127, 135–137, 148, 151, 153, 161].
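For reference, the four most commonly combined metrics can be computed as in the following minimal sketch. It assumes a binary task in which label 1 denotes the impaired class (for example, PD-CI) and label 0 the unimpaired class (for example, PD-NC). The function names and the toy labels and scores are illustrative only and are not drawn from any reviewed study. AUC is computed here through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
from typing import Dict, Sequence


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 3 positive and 5 negative cases, thresholded at 0.5.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.1, 0.2, 0.3, 0.35, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(confusion_metrics(y_true, y_pred))
print(auc(y_true, scores))
```

On imbalanced samples such as this toy case, accuracy alone can look favourable while sensitivity remains low, which is one reason studies report it together with sensitivity, specificity and AUC.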

Fig 4. Usage of performance metrics across reviewed studies.

https://doi.org/10.1371/journal.pone.0303644.g004

Risk of bias assessment

An overall assessment of the risk of bias for all studies using PROBAST can be seen in Fig 5. Of all included studies, 91.42% were deemed to be at a high risk of bias, with the remaining 8.58% at an overall low risk of bias. In the participant domain, 63 studies were judged to be at a low risk and seven at a high risk due to limited description of study data sources or inclusion and exclusion criteria. For the predictor domain, 69 studies were marked as low risk of bias and one as high risk. In the outcome domain, 60 were marked as low risk and 10 as high risk due to factors such as the inclusion of predictors in the assignment of outcomes. For the analysis domain, 52 studies were marked as low risk of bias and 18 as high risk, mainly due to an insufficient or heavily imbalanced number of participants. Overall, a large proportion of studies were rated as high risk of bias, but for the majority of these studies the high-risk rating stems from a lack of external validation.

Fig 5. Overall risk of bias assessment of studies using PROBAST.

https://doi.org/10.1371/journal.pone.0303644.g005

ML techniques applied in CI detection and diagnosis

All reviewed studies employed supervised, unsupervised or deep learning (DL) techniques, with ensemble learning techniques falling under these banners. Most studies used supervised learning techniques in some form, with 58 studies utilising only non-DL supervised learning, two studies using unsupervised techniques alone, eight using DL techniques alone and two studies using a combination of supervised and unsupervised techniques. Since some studies used a combination of techniques from different categories for comparison, these studies are grouped according to the supervision category held by the majority of methods or by the most successful method. Accordingly, studies using combinations of supervised and unsupervised techniques are discussed under unsupervised learning. A complete description of all studies, the data modalities they used and their associated outcomes can be found in S3 File.