Machine learning and deep learning techniques to support clinical diagnosis of arboviral diseases: A systematic review

Background: Neglected tropical diseases (NTDs) primarily affect the poorest populations, often living in remote, rural areas, urban slums or conflict zones. Arboviruses are a significant NTD category spread by mosquitoes. Dengue, Chikungunya, and Zika are three arboviruses that affect a large proportion of the population in Latin and South America. The clinical diagnosis of these arboviral diseases is a difficult task due to the concurrent circulation of several arboviruses which present similar symptoms, inaccurate serologic tests resulting from cross-reaction and co-infection with other arboviruses.
Objective: The goal of this paper is to present evidence on the state of the art of studies investigating the automatic classification of arboviral diseases to support clinical diagnosis based on Machine Learning (ML) and Deep Learning (DL) models.
Method: We carried out a Systematic Literature Review (SLR) in which Google Scholar was searched to identify key papers on the topic. From an initial 963 records (956 from string-based search and seven from a single backward snowballing procedure), only 15 relevant papers were identified.
Results: Results show that current research is focused on the binary classification of Dengue, primarily using tree-based ML algorithms. Only one paper was identified using DL. Five papers presented solutions for multi-class problems, covering Dengue (and its variants) and Chikungunya. No papers were identified that investigated models to differentiate between Dengue, Chikungunya, and Zika.
Conclusions: The use of an efficient clinical decision support system for arboviral diseases can improve the quality of the entire clinical process, thus increasing the accuracy of the diagnosis and the associated treatment. It should help physicians in their decision-making process and, consequently, improve the use of resources and the patient’s quality of life.


Reviewer #1: (No Response)
Reviewer #2: The authors perform exceptional work reviewing the literature and explaining ML concepts to the non-expert audience of the journal (focused on tropical infectious diseases). I suggest including an explanation of some concepts mentioned throughout the review, which may help the reader to understand some points, e.g., grid search, hyperparameters, gamma parameters, and model overfitting. Answer: We have explained some of these concepts in sub-section 3.3. --------------------

Conclusions
-Are the conclusions supported by the data presented? -Are the limitations of analysis clearly described? -Do the authors discuss how these data can be helpful to advance our understanding of the topic under study? -Is public health relevance addressed?
Reviewer #2: The authors exposed the results obtained and formulated a reflection about needs in the area and how the application of ML would be beneficial to the study and diagnosis of arbovirus diseases. Moreover, they exposed the caveats in the area and the urgent needs to address in future studies. Answer: Thank you very much. --------------------

Editorial and Data Presentation Modifications?
Use this section for editorial suggestions as well as relatively minor modifications of existing data that would enhance clarity. If the only modifications needed are minor and/or editorial, you may wish to recommend "Minor Revision" or "Accept".

Reviewer #1: (No Response)
Reviewer #2: Given the nature of the review and the audience expected in a journal focused on infectious diseases, who are not ML experts, I suggest integrating concepts to explain the ML methods in a more organised way: first specify that the methods discussed are supervised methods, and explain what Deep Learning is. Answer: We added new introductory content about AI, ML, supervised learning and DL in sub-section 3.2 of the revised manuscript.
-In Section 3.2.3 (Neural Networks), the authors start explaining MLP with no introduction, which may seem a bit confusing. Answer: We added background text on MLP in sub-section 3.2.3 of the revised manuscript.
-Some abbreviations are missing in the list (AI, XAI, MOMO%), and others are doubly explained (SLR). Answer: We have revised the entire manuscript and added the abbreviations that were missing.

Summary and General Comments
Use this section to provide overall comments, discuss strengths/weaknesses of the study, novelty, significance, general execution and scholarship. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. If requesting major revision, please articulate the new experiments that are needed.
Reviewer #1: The authors review published studies using machine learning to predict arboviruses from clinical and laboratory data. Overall, they do a nice job of collecting the existing research and are rightfully critical of some of the major shortcomings of studies in this field, including overfitting, the selection of evaluation metrics and the poor documentation of code, data and methods. Answer: Thank you for the review and comments. We have read and addressed them carefully.
Tables 3 and 4 provide a very nice summary of the features used in the studies.
In section 3.3 I was pleased to see the authors flagging studies that appear to be overfitting or inappropriately benchmarking their models, as this practice is unfortunately common in biological applications of supervised machine learning. It would be great if they explicitly flagged these studies in Table 5. A discussion about the importance of regularization methods such as batch normalization, L1, L2 and C in SVM would also have been helpful. Many readers may not understand the importance of this for generalization. Answer: We have updated Table 5 and added a discussion about the importance of regularization methods. We have also improved the presentation of basic concepts of ML and DL to support readers who may not be as familiar with ML/DL (please see sub-section 3.3).
Throughout the paper they refer to Machine Learning (ML) and Deep Learning (DL) separately. DL is a subset of ML and only one study referred to DL, so it would be much clearer to just refer to everything as ML except when specifically talking about the Ho et al. (2020) paper. Answer: We have added new text about AI, ML, supervised learning and DL in sub-section 3.2 of the revised manuscript.
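As an illustration of why regularization matters for generalization, the following minimal sketch fits a one-feature linear model without intercept using the closed-form L2 (ridge) solution. The data and the `lam` values are hypothetical and are not taken from the manuscript or from any reviewed study; the point is only that a larger penalty shrinks the fitted weight towards zero.

```python
def ridge_weight(xs, ys, lam):
    """Closed-form L2-regularized least squares for y = w * x (no intercept).

    Minimizes sum((y - w*x)^2) + lam * w^2, which gives
    w = sum(x*y) / (sum(x^2) + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

# Hypothetical toy data generated by y = 2x
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_unregularized = ridge_weight(xs, ys, lam=0.0)   # ordinary least squares
w_regularized = ridge_weight(xs, ys, lam=14.0)    # strong L2 penalty

print(w_unregularized, w_regularized)
```

The C parameter of an SVM plays the opposite role to `lam` here: a smaller C corresponds to stronger regularization.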
The review spends a lot of time discussing the use of a convolutional neural network (CNN), DenseNet, by Ho et al. (2020). They discuss CNN for half a page on page 9 and again on page 10 and page 24. That model did not perform appreciably better than a logistic regression or random forest model. DenseNet is unique in that it connects each layer to every other layer in a feed-forward neural network and essentially concatenates features rather than summing them. But like other CNNs it relies on pooling and convolutions at different scales. While this makes sense for image data, where the proximity of pixels has meaning, it is seemingly meaningless for the 16 clinical and laboratory features placed in a random order in the Ho et al. (2020) analysis. It seems that DenseNet sort of just devolves into a multilayer perceptron or similar, since the convolutions are meaningless. It would be good to point this out. More generally, in the discussion the authors make the recommendation to invest in CNN and LSTM. The data types used in the model are not spatially or relationally structured and the amount of training data does not lend itself well to these large models. I have concerns about this recommendation and the use of recurrent models (RNN, CNN, and LSTM) generally on this kind of data. Answer: Thank you for the comment regarding the usage of CNN for solving tabular data problems. We have added a discussion in sub-section 3.3.1 of the revised manuscript about this topic. We also revised the discussion in Section 4.
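The reviewer's point about feature order can be sketched with a toy example. The code below is not the DenseNet architecture of Ho et al. (2020); it is a hand-written 1-D convolution and a single dense unit, with hypothetical feature values, kernel and weights. It shows that a convolution's output depends on which features happen to be adjacent, whereas a dense unit gives the same output for any column order once its weights are permuted accordingly.

```python
def conv1d(x, kernel):
    """'Valid' 1-D convolution (cross-correlation) with a shared kernel."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]

def dense(x, weights):
    """Single dense unit without bias: a plain dot product."""
    return sum(xi * wi for xi, wi in zip(x, weights))

# Hypothetical tabular record with four features in arbitrary column order
features = [1.0, 2.0, 4.0, 8.0]
perm = [2, 0, 3, 1]                      # a different, equally arbitrary order
shuffled = [features[i] for i in perm]

kernel = [1.0, -1.0]                     # hypothetical shared kernel
conv_original = conv1d(features, kernel)
conv_shuffled = conv1d(shuffled, kernel)

weights = [0.5, -1.0, 0.25, 2.0]         # hypothetical dense weights
dense_original = dense(features, weights)
dense_shuffled = dense(shuffled, [weights[i] for i in perm])

print(conv_original, conv_shuffled)      # different sets of local responses
print(dense_original, dense_shuffled)    # identical
```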
Our findings suggest that previous work on the classification of arboviral diseases mostly uses tabular data and is dominated by tree-based models. To make use of tabular data with DL models, the main challenge is to reshape the data to fit the specific input representation. Commonly, datasets are high-dimensional and very sparse, which magnifies the difficulty of building a good input representation [2]. Another challenge for DL models working with tabular data is related to the scale and distribution of the features present in the dataset. For tree-based models, this aspect hardly matters; DL models, on the other hand, are very sensitive to it, and poorly scaled features can cause vanishing and exploding gradient problems [2].
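The sensitivity to feature scale can be illustrated with z-score standardization, a common pre-processing step before training a neural network. This is a minimal sketch using only the Python standard library; the column names and values are hypothetical and do not come from any reviewed dataset.

```python
from statistics import mean, pstdev

def standardize(column):
    """Rescale a numeric column to zero mean and unit (population) variance."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]

# Hypothetical features on very different scales
platelets = [150000.0, 90000.0, 210000.0, 30000.0]   # cells per microlitre
temperature = [37.5, 39.0, 38.2, 40.1]               # degrees Celsius

scaled_platelets = standardize(platelets)
scaled_temperature = standardize(temperature)

# Ranking of patients by each feature is unchanged by the rescaling
order_before = sorted(range(len(platelets)), key=platelets.__getitem__)
order_after = sorted(range(len(platelets)), key=scaled_platelets.__getitem__)
```

Because the ordering of values is preserved, threshold-based splits in a tree select the same partitions before and after scaling, which is why tree-based models are largely indifferent to this step.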
Despite these challenges, we can take advantage of powerful features that DL can inherently provide. For instance, by using a DL model with convolution and max pooling layers, we can reduce manual feature engineering, a time-consuming task commonly performed when pre-processing the dataset for ML training and testing. Some works have proposed the transformation of tabular data to images (matrix)
I appreciated that the authors highlighted the issue of imbalanced datasets in the studies reviewed. I think this could bear even more discussion. It could also be expanded to the larger issue of dataset shift problems like prior probability shift and covariate shift. I think there are likely to be major issues with the ability of these models to generalize well. Answer: We have added a discussion in Section 4 of the revised manuscript about issues relating to imbalanced datasets, the dataset shift problem and generalization.
It would also be worth discussing whether the studies point to general limits of the approach. Given the small number of general features, it may be that 85% accuracy is about the best that can be expected. Given that these data predominantly come from a population where a healthcare professional suspected an arboviral infection and ordered a clinical test, a classifier with this level of accuracy may not perform better than a clinician assessing a patient with symptoms. This would suggest that it would be better to invest resources in developing low-cost clinical diagnostics with the potential to achieve much higher accuracy.
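Prior probability shift can be made concrete with Bayes' rule. The sketch below assumes, purely for illustration, a classifier with sensitivity and specificity both equal to 0.85 and two hypothetical disease prevalences; none of these numbers are results from the reviewed studies. It shows how the positive predictive value of the same classifier drops when it is deployed in a population with lower prevalence than the one it was evaluated on.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | positive prediction) from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Assumed operating point (hypothetical, for illustration only)
SENS, SPEC = 0.85, 0.85

ppv_balanced = positive_predictive_value(SENS, SPEC, prevalence=0.5)
ppv_low_prev = positive_predictive_value(SENS, SPEC, prevalence=0.1)

print(round(ppv_balanced, 3), round(ppv_low_prev, 3))
```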