Original Submission: February 4, 2020
Decision Letter - Marco Bonizzoni, Editor


LDA Filter: A Latent Dirichlet Allocation preprocess method for Weka


The authors should particularly strive to address the following issues raised in review:

  • Can the authors propose an explanation for why, across all datasets used, the LDA method systematically improved only the results of the kNN algorithm, and not those of SVM and NB?
  • More datasets should be tested (including, where possible, openly accessible ones) so that the proposed LDA filter can be shown to provide significant improvements across a broader range of datasets.
  • A broader literature survey should be provided to contextualize the problem statement and to inform a discussion of the reasons why LDA improves text classification performance.
  • In addition to the source code, the authors should share compilation and installation instructions to ensure the reproducibility of the work described here, as well as documentation on how to use their plugin.

Reviewer #1: * What are the main claims of the paper and how significant are they for the discipline?

The main objective is to create a filter for Weka that transforms text data into a low-dimensional representation using LDA, and to show that classification tasks using this LDA representation are faster without compromising accuracy.

Using LDA for text representation has been common practice for several years, so there is no new research contribution. The authors' only contribution is the creation of an LDA plugin for Weka.

* Are the claims properly placed in the context of the previous literature? Have the authors treated the literature fairly?

No. There are many papers in the literature that use LDA for information retrieval, search engines, text matching, text hashing, etc. The authors cite only the base paper by Blei and the semi-supervised extension of LDA modeling. The literature survey is insufficient in the context of the problem statement.

* Do the data and analyses fully support the claims? If not, what other evidence is required?

The results of the experiments are presented in the paper. The authors used LDA (by calling an API from the MALLET library) to build a filter for Weka. According to their own experimental results, the filter does not appear to improve classification accuracy. In all three datasets used in the experiments, the LDA method worked only for the kNN algorithm, and no reasoning is provided for this. Nor is there any explanation of why the method did not work for the other algorithms (SVM and NB). Using just three datasets seems insufficient to prove anything empirically. The authors present the "speed" improvement as a positive outcome, but this is not interesting, since a speed improvement is expected from any dimensionality reduction technique.

Yes, the creation of an LDA filter for Weka is a useful contribution, but the authors should improve the LDA method so that the filter helps improve classification accuracy. Some suggested amendments are:

+ More datasets should be tested.

+ The LDA filter should be shown to improve accuracy for the majority of the datasets.

+ A thorough literature survey should be done to find cues on how to use LDA to improve text classification performance.

+ The LDA tuning process can become costly if a grid search over the parameters is used, so a smarter tuning method should be suggested.

+ The source code is available, but the preprocessed datasets and results are not publicly available. Sufficient documentation of the source code, including compilation and installation instructions, should be provided.

No. The data are not available. The source code is available on GitHub, but there are no instructions for compiling and testing it. There is no documentation on how to tune or use the plugin. The plugin is also available on SourceForge, but again without documentation.

* Are details of the methodology sufficient to allow the experiments to be reproduced?

Yes, if we use the prebuilt plugin for Weka (

However, by following the paper alone, it is not possible to reproduce the experiments.

* Is the manuscript well organized and written clearly enough to be accessible to non-specialists?

The paper reads like a technical report rather than a research article.


