Fig 1.
The flowchart of the proposed method.
Fig 2.
Perplexity and coherence of LDA models with different values of K (note a).
a The figure varies the number of topics K from 0 to 30 in steps of 5 for LDA topic extraction and evaluates perplexity and coherence on the test dataset.
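As a rough illustration of the sweep described in this note, the sketch below builds the candidate grid for K and computes perplexity from per-token log-likelihoods. The function names `candidate_topic_counts` and `perplexity` are hypothetical, K = 0 is skipped on the assumption that an LDA model needs at least one topic, and the log-likelihood values are assumed to come from a fitted model scored on the test dataset. Coherence scoring is not shown.

```python
import math


def candidate_topic_counts(start=0, stop=30, step=5):
    """Return the K grid from start to stop (inclusive) in the given step.

    Non-positive values are dropped (assumption: LDA needs K >= 1).
    """
    return [k for k in range(start, stop + 1, step) if k > 0]


def perplexity(token_log_likelihoods):
    """Perplexity = exp(-mean per-token log-likelihood), using natural logs."""
    if not token_log_likelihoods:
        raise ValueError("need at least one token log-likelihood")
    mean_ll = sum(token_log_likelihoods) / len(token_log_likelihoods)
    return math.exp(-mean_ll)
```

Lower perplexity indicates a better fit to held-out text, so a sweep of this kind would typically compare the score across the candidate K values alongside coherence.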
Fig 3.
The high-frequency word cloud of China’s language policy.
Table 1.
Statistical parameters of China’s language policy.
Table 2.
Terms and their relevance for theme analysis.
Fig 4.
Theme intensity evolution analysis (note a).
a T1 denotes the year 2016 and T2 denotes 2021. These time points were selected because 2016 corresponds to the first phase of the language resources protection project in China, while 2021 marks the subsequent phase in the project's development.
Table 3.
Time series ADF test results.
Table 4.
ARIMA model parameter estimation and Ljung-Box test results.
Fig 5.
Results of the trend prediction of China’s language policy (note a).
a UCL and LCL denote the upper and lower limits of the confidence interval, respectively; both are obtained by extrapolating the sample data.
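For readers unfamiliar with these limits, the sketch below shows one common way to derive them from a point forecast and its standard error. It is an assumption-laden illustration, not the procedure used for this figure: the normal approximation, the 95% level, and the function name `confidence_limits` are all hypothetical choices that the source does not state.

```python
from statistics import NormalDist


def confidence_limits(forecast, std_error, level=0.95):
    """Return (LCL, UCL) for a forecast under a normal approximation (assumption)."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return forecast - z * std_error, forecast + z * std_error
```

The interval widens as the standard error grows, which is why bands of this kind usually fan out at longer forecast horizons.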
Table 5.
ARIMA model predictive assessment.