Topic modeling and evolutionary trends of China’s language policy: A LDA-ARIMA approach

doi:10.1371/journal.pone.0324644

Fig 1.

The flowchart of the proposed method.

Fig 2.

Perplexity and coherence of LDA models with different values of K^a.

^a The number of topics K was varied from 0 to 30 in steps of 5 for LDA topic extraction; perplexity and coherence were evaluated on the test dataset.

Fig 3.

The high-frequency word cloud of China’s language policy.

Table 1.

Statistical parameters of China’s language policy.

Table 2.

Terms and their relevance for theme analysis^a.

Fig 4.

Theme intensity evolution analysis^a.

^a T1 denotes 2016 and T2 denotes 2021. These time points were selected because 2016 corresponds to the first phase of China's language resources protection project, and 2021 marks the subsequent phase of that project.

Table 3.

Time series ADF test results^a.

Table 4.

ARIMA model parameter estimation and Ljung-Box test results.

Fig 5.

Results of the trend prediction of China’s language policy^a.

^a The UCL and LCL are obtained by extrapolating the sample data; UCL denotes the upper limit of the confidence interval and LCL denotes the lower limit.

Table 5.

ARIMA model predictive assessment.
