User review analysis of dating apps based on text mining

With the continuous development of information technology, more and more people have come to use online dating apps, a trend accelerated in recent years by the COVID-19 pandemic. However, most user reviews of mainstream dating apps are negative. To study this phenomenon, we used a topic model to mine negative reviews of mainstream dating apps and constructed a two-stage machine learning model, combining data dimensionality reduction with text classification, to classify user reviews of dating apps. The results show that, first, the current negative reviews of dating apps mainly concern the charging mechanism, fake accounts, the subscription and advertising push mechanism, and the matching mechanism, and we propose corresponding improvement suggestions; second, using principal component analysis to reduce the dimensionality of the text vectors and then training an XGBoost model on the oversampled low-dimensional data yields better classification accuracy on user reviews. We hope these findings can help dating app operators improve their services and achieve sustainable business operations for their apps.
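The two-stage pipeline described in the abstract (dimensionality reduction of text vectors, oversampling, then a boosted-tree classifier) can be sketched as follows. This is an illustrative sketch only: the toy reviews, labels, and parameter choices are our assumptions, and scikit-learn's TruncatedSVD and GradientBoostingClassifier stand in for the paper's PCA and XGBoost.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import GradientBoostingClassifier

# Toy, imbalanced review corpus (hypothetical texts and labels, not the
# authors' dataset).
texts = (["too many ads"] * 8 + ["matching never works"] * 4 +
         ["fake accounts everywhere"] * 4 + ["subscription too expensive"] * 2 +
         ["pushy upgrade prompts"] * 2)
labels = np.array([0] * 8 + [1] * 4 + [2] * 4 + [3] * 2 + [4] * 2)

# Stage 1: vectorize the text, then reduce dimensionality. TruncatedSVD is
# the sparse-friendly analogue of PCA for TF-IDF matrices.
X = TfidfVectorizer().fit_transform(texts)
X_low = TruncatedSVD(n_components=5, random_state=0).fit_transform(X)

# Naive random oversampling: duplicate minority-class rows until every class
# matches the majority-class count.
rng = np.random.default_rng(0)
idx = [i for c in np.unique(labels)
       for i in rng.choice(np.where(labels == c)[0],
                           np.bincount(labels).max(), replace=True)]
X_bal, y_bal = X_low[idx], labels[idx]

# Stage 2: train a boosted-tree classifier on the balanced low-dim data.
clf = GradientBoostingClassifier(random_state=0).fit(X_bal, y_bal)
print(round(clf.score(X_low, labels), 2))
```

A dedicated oversampler such as imbalanced-learn's RandomOverSampler could replace the manual duplication step; the manual version is shown only to keep the sketch dependency-light.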


Comments:
Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code and data underpin the findings in the manuscript. In these cases, all author-generated code must be made available without restriction upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse. New software must comply with the Open Source Definition. If data cannot be openly shared for ethical or legal restrictions, please provide details for accessing these data from the original data holder. Response: The code we used has been posted publicly on GitHub, and its URL (https://github.com/Qian0214Shen/code-of-text-mining-paper) was entered in the relevant field at submission.

Comments:
We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form. Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript." Please include your amended statements within your cover letter; we will change the online submission form on your behalf. Response: We have removed the section on funding information from the paper.

Comments:
In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.
Upon re-submitting your revised manuscript, please upload your study's minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.
Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.
We will update your Data Availability statement to reflect the information you provide in your cover letter. Response: We have uploaded the dataset to figshare.com and made it publicly available, referenced the dataset in the Data Acquisition section of the article, and added a statement of compliance with data collection. The URL of our data is: https://figshare.com/articles/dataset/Text_of_user_reviews_of_dating_apps/21895827

Comments:
PLOS requires an ORCID iD for the corresponding author in Editorial Manager on papers submitted after December 6th, 2016. Please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to 'Update my Information' (in the upper left-hand corner of the main menu) and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager. Please see the following video for instructions on linking an ORCID iD to your Editorial Manager account: https://www.youtube.com/watch?v=_xcclfuvtxQ Response: We have added ORCID information for the corresponding author in her account.

Comments:
We note you have included a table to which you do not refer in the text of your manuscript. Please ensure that you refer to Table 1 in your text; if accepted, production will need this reference to link the reader to the

#Responses to Reviewer 1:
Thank you very much for taking the time to review this manuscript. Please find our itemized responses below. 1. Comments: It's possible that I'll suggest giving this article a presentation at a conference so that you may have an audience and continue the conversation about the issues. My primary justification is based on the fact that "users" are the "unit of analysis" linked with this study; hence, there needs to be a concrete subjective evaluation to generate an accurate construct suitable for quantifying the impact of "User Reviews." However, while employing machine learning, the assumption is that a new course of action "or knowledge" can emerge from the analysis; sadly, the work has not shown any improvement on the problems linked with "User Reviews" on dating apps. Generally speaking, the user-based aspects used in this study are faulty. Response: The starting point of our research is the perspective of enterprise information management: by mining the user reviews of the apps, we analyze how app operators can improve their apps based on users' opinions, and we try to develop a method that lets operators quickly classify the unlabeled user reviews they collect. Regarding the impact of quantifying user reviews that you mention, this would require large-scale market research to understand how users perceive these reviews on Google Play and how the reviews influence user choices. This is indeed a very interesting and exciting topic, and we hope to have the opportunity to collect more suitable data for such a study in future research.
2. Comments: You don't need any sort of conceptualised form of an advanced thought in order to gain something new through "Negative Review Mining." Instead, all you need is to have an open mind. Response: We strongly agree that an open mind is indeed necessary for accepting negative reviews, but in the current era of big data, it is difficult for app operators to mine information from a large number of reviews with an open mind alone. We therefore apply machine learning models to mine information efficiently from massive numbers of user reviews.
3. Comments: Rating is numeric and reviews are reflections. The numeric value of a rating indicates whether a dating app is good or bad, whereas reviews indicate the user's perceptions. Previous research reveals that "bad ratings are trustworthy regardless of the number of reviews", that is, users tend to believe reported bad ratings. On the other hand, "good ratings are trustworthy only when they come along with a high number of reviews". Response: In current app reviews, malicious low-scoring reviews and worthless reviews from bots are inevitable, and such reviews hardly express the general opinions of app users. Other comments, perhaps because they were published only recently, have not received enough likes, so their value is difficult to assess in bulk. Therefore, to balance the mining value of the data against keeping the dataset large enough to fit the machine learning models, we selected comments with 5 or more likes for analysis.
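The selection rule described in this response (keep only reviews with at least 5 likes) amounts to a simple threshold filter. A minimal sketch, assuming a hypothetical "thumbs_up" field that may differ from the authors' actual data schema:

```python
# Illustrative filter matching the >= 5 likes selection rule; texts and the
# "thumbs_up" field name are assumptions, not the authors' data.
reviews = [
    {"text": "fake profiles everywhere", "thumbs_up": 12},
    {"text": "ok app", "thumbs_up": 1},
    {"text": "ads after every swipe", "thumbs_up": 5},
]
kept = [r for r in reviews if r["thumbs_up"] >= 5]
print(len(kept))  # → 2
```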

Comments:
The fourth section is a routine analysis, and it does not introduce anything novel that contributes to the advancement of the research field. The XGBoost and LightGBM models are applied to an existing dataset without any additional information being connected to them. Response: We acknowledge that the fourth part of our study is an application of existing methods. However, LightGBM and XGBoost are well-established machine learning classifiers that are widely used across many fields. In our application, we obtained an accuracy of 88.3% on a five-class classification task, close to 19% higher than our baseline model, so we consider this result acceptable.
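The comparison claimed in this response (a boosted-tree model outperforming a baseline on a multi-class task) can be illustrated on synthetic data. In this sketch, GradientBoostingClassifier stands in for XGBoost/LightGBM, the baseline is a majority-class predictor, and the dataset and scores are synthetic, not the paper's results:

```python
# Synthetic 5-class comparison: boosted trees vs. a majority-class baseline.
# Everything here is illustrative; none of it reproduces the paper's 88.3%.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_classes=5, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

baseline = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
boosted = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

# The boosted model should clearly beat the ~20% majority-class baseline.
print(boosted.score(X_te, y_te) > baseline.score(X_te, y_te))
```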

#Responses to Reviewer 2:
Thank you very much for your guidance and comments on our work. Please find our itemized responses below and our corrections in the re-submitted files.