Peer Review History

Original Submission April 17, 2022
Decision Letter - Ali Safaa Sadiq, Editor

A novel customer churn prediction model for the telecommunication industry using data transformation methods and feature selection

Dear Dr. Rahman,

Thank you for submitting your manuscript to PLOS ONE.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Partly

Reviewer #4: Partly


Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: No

Reviewer #4: I Don't Know


Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: No

Reviewer #4: Yes


Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes


Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The paper proposed a prediction model for the telecommunication industry using data transformation methods and feature selection; however, the article should be revised as follows:

1. The English writing is good but could be improved by a native speaker.

2. The abstract should be rewritten; the principal research gaps and contributions are unclear.

3. Although the introduction is well-organised, the existing research gaps were not properly discussed and listed in the introduction section.

4. The statement of the work's main contributions should be rewritten to focus mostly on the novelties.

5. In the tables, the best results found should be set in bold.

6. Please develop the related works section separately, and extend the current literature review using some references about the hyper-parameter tuning of deep learning models, such as: a) A deep learning-based evolutionary model for short-term wind speed forecasting: A case study of the Lillgrund offshore wind farm. Energy Conversion and Management, 236, 114002. b) Short-term wind speed forecasting using recurrent neural networks with error correction, Energy, Volume 217, 2021, 119397. c) LSTM based long-term energy consumption prediction with periodicity, Energy, Volume 197, 2020. d) Prediction of electricity generation from a combined cycle power plant based on a stacking ensemble and its hyperparameter optimization with a grid-search method, Energy, Volume 227, 2021.

7. What are the benefits and drawbacks of the grid search method? Please add them.

8. The applied and optimised hyper-parameters should be listed in a table.

9. If possible, provide a 3-D plot of the grid search performance for hyper-parameter tuning.
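As an illustration of comments 7 to 9, the following is a minimal sketch of exhaustive grid search in plain Python, not the authors' implementation. The scoring function `toy_score`, the hyper-parameter names `n_neighbors` and `weight_power`, and the grid values are all hypothetical stand-ins for a cross-validated model score. The sketch shows the main benefit (every combination is evaluated, so the best point on the grid is found and the full result list can be tabulated or plotted) and the main drawback (the number of evaluations is the product of the grid sizes, so the cost grows quickly with each added hyper-parameter).

```python
from itertools import product


def grid_search(score_fn, grid):
    """Score every combination in `grid` and return the best one.

    Returns (best_params, best_score, results), where `results` holds one
    (params, score) pair per combination, ready to tabulate or plot.
    """
    names = list(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, score_fn(**params)))
    best_params, best_score = max(results, key=lambda item: item[1])
    return best_params, best_score, results


def toy_score(n_neighbors, weight_power):
    # Hypothetical score surface with its maximum at (7, 2); a real study
    # would return a cross-validated metric such as AUC here.
    return -((n_neighbors - 7) ** 2) - (weight_power - 2) ** 2


# Hypothetical grid: 5 x 3 = 15 combinations to evaluate.
grid = {"n_neighbors": [1, 3, 5, 7, 9], "weight_power": [1, 2, 3]}
best_params, best_score, results = grid_search(toy_score, grid)

for params, score in results:
    print(params, score)
print("best:", best_params, best_score)
```

With two hyper-parameters, the `results` list maps directly onto the requested table (one row per combination) and onto a 3-D surface with the two hyper-parameters on the horizontal axes and the score on the vertical axis.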

Reviewer #2: 1) A customer churn prediction model for telecom using machine learning techniques is not a new concept. Hence, it is not convincing that the model is novel, and the novelty of the model needs to be well demonstrated.

2) The organization of the manuscript should be described in the Introduction.

3) The literature review should emphasize both the findings and the limitations of prior work. It would be better to present these in a comparative table.

4) The optimization of the machine learning classifiers is not well demonstrated.

5) The authors should provide a more precise and critical comparison with existing related works. More details are needed on the research gaps in existing machine learning models and on possible ways to address them.

6) Please revise all of the English. It is very important that the manuscript is finally revised by a native speaker.

Reviewer #3: Article Title:

A novel customer churn prediction model for the telecommunication industry using data transformation methods and feature selection

Manuscript: PONE-D-22-11288

Reviewer's Comments:

In this article, the authors have conducted a comparative study on various data transformation methods (RAW, Log, Box-Cox, Rank, Discretization, Z-score, WOE) followed by feature selection (univariate feature selection, etc.). Further, they have performed hyperparameter tuning for various machine learning methods (KNN, NB, LR, RF, DTree, GB, FNN, and RNN). However, in its current state the article has issues with the presentation of the work and some technical issues, listed below.

• The abstract section is very generalized and cannot reveal the clear outcomes of the proposed study.

• The literature is outdated; articles from 2019 onward need to be cited.

• The study should also clearly distinguish the work presented in this article from existing work.

• The authors have used three datasets and provided the following source links:

1. (Last Access: September 29, 2019)

2. (Last Access: February 10, 2020)

3. (Last Access: February 10, 2020)

However, I have observed that URL-2 and URL-3 point to the same dataset in two versions that differ in the number of samples: one contains 3333 samples and the other 5000. I recommend considering different datasets.

• I have a few observations on Figure 1, the proposed flowchart of the optimized CCP model:

1. Could what is done during the preprocessing step also be illustrated in the preprocessing block?

2. Why was univariate feature selection specifically used?

3. Why does the process terminate straight after 10-fold validation? I would expect 10-fold validation to produce results that can be scored using evaluation measures and then used to compare the machine learning methods.

4. It would be more appropriate if you could add a statistical test or significance test, which is currently missing.
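One possible form for the significance test requested in point 4 is a paired sign-flip permutation test on per-fold scores. The sketch below is an assumption about how such a check could look, not the test the authors later used; the function name `paired_sign_flip_test` and the fold scores are hypothetical. Because fold scores from the same cross-validation run are not fully independent, the resulting p-value should be read as approximate.

```python
from itertools import product


def paired_sign_flip_test(scores_a, scores_b):
    """Two-sided paired permutation test on matched per-fold scores.

    Under the null hypothesis that the two models perform equally, the sign
    of each per-fold difference is arbitrary, so every sign pattern is
    enumerated and compared against the observed total difference.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        statistic = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point rounding.
        if statistic >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical 10-fold scores for two classifiers evaluated on the same folds.
model_a = [0.91, 0.93, 0.92, 0.94, 0.90, 0.92, 0.93, 0.91, 0.94, 0.92]
model_b = [0.89, 0.90, 0.91, 0.92, 0.88, 0.90, 0.91, 0.90, 0.91, 0.89]

p_value = paired_sign_flip_test(model_a, model_b)
print(f"p = {p_value:.6f}")
```

With 10 folds there are only 2^10 = 1024 sign patterns, so the test is exact and cheap to enumerate; when many classifiers are compared at once, a correction for multiple comparisons would also be needed.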

Reviewer #4: The contribution of this paper is good and I am happy to endorse its acceptance at some point. However, there are several major and minor comments to address. I have listed them as follows:

Please clearly state the gap targeted in this paper at the end of the introduction and list the hypotheses. In terms of research method and design, there is not much in the paper. The comparative algorithms in the experiments are not properly acknowledged and cited. I also suggest adding some figures to better articulate the content, as the paper looks very dry at the moment. Analysis of the results is missing in the paper. There is a big gap between the results and the conclusion; there should be a result analysis between these two sections. After comparing the numerical methods, you have to be able to analyse the results and relate them to their structures. It would be interesting to have your thoughts on why the method works that way. Such analyses would be the core of your work, where you demonstrate your understanding of the reasons behind the results. You can also link the findings to the hypotheses of the paper. Long story short, this paper requires a very deep analysis from different perspectives. There is no statistical test to judge the significance of the numerical methods' results. Without such a statistical test, the conclusion cannot be supported. There is no discussion of the cost-effectiveness of the proposed method. What is the computational complexity? What is the runtime? Please include such discussions. You can also use big-O notation to show the computational complexity. Some mathematical notations and lemma presentations are not rigorous enough for the contents of the paper to be understood correctly. The authors are requested to recheck all the definitions of variables and further clarify these equations.


Reviewer #1: No

Reviewer #2: Yes: Debashish Das

Reviewer #3: Yes: Adnan Amin

Reviewer #4: No

Revision 1

Decision Letter - Ali Safaa Sadiq, Editor

A novel customer churn prediction model for the telecommunication industry using data transformation methods and feature selection

Dear Dr. Rahman,

Thank you for submitting your manuscript to PLOS ONE.

Reviewer #1: All comments have been addressed

Reviewer #4: (No Response)


Reviewer #1: Yes

Reviewer #4: (No Response)


Reviewer #1: Yes

Reviewer #4: (No Response)


Reviewer #1: Yes

Reviewer #4: (No Response)


Reviewer #1: Yes

Reviewer #4: (No Response)


Reviewer #1: The authors have sufficiently addressed the reviewed issues in the manuscript and this work can be published.

Reviewer #4: Some final cosmetic comments:

* The results of your comparative study should be discussed in-depth and with more insightful comments on the behaviour of your algorithm on various case studies. Discussing results should not mean reading out the tables and figures once again.

* Avoid lumping references together, as in [x, y]. Instead, summarize the main contribution of each referenced paper in a separate sentence. For scientific and research papers, it is not necessary to give several references that say exactly the same thing. That would in any case be strange, since what then is the innovative scientific contribution of the referenced papers? For each thesis, state only one reference.

* Avoid using first person.

* Avoid using abbreviations and acronyms in title, abstract, headings and highlights.

* Please avoid having heading after heading with nothing in between, either merge your headings or provide a small paragraph in between.

* The first time you use an acronym in the text, please write the full name and the acronym in parenthesis. Do not use acronyms in the title, abstract, chapter headings and highlights.

* The results should be further elaborated to show how they could be used in real applications.


Reviewer #1: No

Reviewer #4: No


Revision 2

Decision Letter - Ali Safaa Sadiq, Editor

A novel customer churn prediction model for the telecommunication industry using data transformation methods and feature selection


Dear Dr. Rahman,

The authors have addressed all the comments given by the reviewers; hence, I am happy to recommend their paper for possible publication.

Reviewer #4: (No Response)


Reviewer #4: (No Response)


Reviewer #4: (No Response)


Reviewer #4: (No Response)


Reviewer #4: (No Response)


Reviewer #4: All comments have been addressed.


Reviewer #4: No


Formally Accepted
Acceptance Letter - Ali Safaa Sadiq, Editor


A novel customer churn prediction model for the telecommunication industry using data transformation methods and feature selection

Dear Dr. Rahman:

