Text classification algorithm of tourist attractions subcategories with modified TF-IDF and Word2Vec

doi:10.1371/journal.pone.0305095

Fig 1.

Flowchart of the framework of text classification algorithm for subordinate classes of tourist attractions.

Fig 2.

CBOW and skip-gram models [36].

Fig 3.

Structure of skip-gram model [37].

Table 1.

Binary contingency table.

Table 2.

Evaluation indicators -1 for classification results.

Table 3.

Evaluation indicators -2 for classification results.

Fig 4.

Frequency of occurrence of different text lengths.

Table 4.

Distribution of experimental dataset categories.

Table 5.

Hyperparameter settings for Word2Vec and Doc2vec.

Table 6.

Hyperparameter settings of classification model.

Table 7.

Hyperparameter settings of BERT model.

Table 8.

Classification performance of the entire test set in the MLP classifier during the improved processes.

Table 9.

Classification performance of each category of the test set in MLP during the improved processes.

Table 10.

Classification performance of different combinations of text representation method & classifier.

Fig 5.

Difference line graph of "micro-F1 minus weighted-F1".

Fig 6.

Difference line graph of "weighted-F1 minus macro-F1".

Fig 7.

Weighted-F1 values for different combinations of text representations & classifiers.

Fig 8.

Values of each evaluation index under the optimal classification combination model.

Fig 9.

Classification results of each category in the optimal combination of different text representations & classifiers.

Fig 10.

F1-measure of each category in the optimal combination of different text representations & classifiers.

Fig 11.

Weighted-F1 of the optimal combination model with different scale text sets.

Table 11.

The optimal combination of text representations and classifiers for different-size text sets.

Fig 12.

Confusion matrix heat map of the optimum classification combination models for a text set size of 3498.

Fig 13.

F1-measure of the composite category of optimum classification combination models under different scale text sets.

Fig 14.

Quantity ratio of the true and predicted values for each category of tourist attractions.

Fig 15.

Confusion matrix heat map of the test set in Shanghai and Hunan Province.

Table 12.

Comparing true and predicted values of the top 2–3 categories in different-level attractions.

Table 13.

Quantity ratio of the true and predicted values of various attractions in the two provinces.

Table 14.

Comparing true-predicted values of the top 2–3 categories in different-level attractions in the two provinces.
