Public attitudes towards dialects: Evidence from 31 Chinese provinces

  • Tianxin Li,

    Roles Conceptualization, Writing – original draft, Writing – review & editing

    tianxinli0110@163.com

    Affiliation Department of Literature, Shaanxi Normal University, Xi’an, Shaanxi, China

  • Xigang Ke,

    Roles Methodology, Writing – original draft, Writing – review & editing

    Affiliation Department of Literature, Shaanxi Normal University, Xi’an, Shaanxi, China

  • Jin Li

    Roles Formal analysis, Writing – original draft, Writing – review & editing

    Affiliation International School of Chinese Studies, Shaanxi Normal University, Xi’an, Shaanxi, China

Abstract

Background

Dialect Attitude is conceptualized as an individual’s cognitive and affective evaluation of a dialect and its speakers. In contemporary China, dialects suffer from significant stigmatization, which produces social inequalities that hinder sustainable development. This study aims to describe Chinese public attitudes towards dialects and to identify the macro-level determinants associated with heterogeneous attitudes.

Methods

We combine crawler technology and sentiment analysis to conduct a provincial cross-sectional study. We collect 1,650,480 microblogs expressing public attitudes towards dialects from Microblog across 31 Chinese provinces. Spatial regression models are used to estimate the influence of macro-level determinants on differences in public attitudes.

Results

The present study reveals that: (1) The Chinese public generally holds positive attitudes towards dialects, with significant variation between provinces. (2) Political Resource (β = 0.076, SD = 0.036, P<0.05), Economic Development (β = 0.047, SD = 0.022, P<0.05), and Cultural Resource (β = 0.054, SD = 0.021, P<0.05) promote positive public attitudes towards dialects. (3) Political Resource and Cultural Resource have a stronger influence in relatively advantaged regions, whereas Economic Development has a stronger influence in relatively disadvantaged regions.

Conclusions

Combining crawler technology and sentiment analysis, the present study builds a comprehensive database of 1,650,480 dialect-related microblogs from 31 Chinese provinces and describes the following scenario: (1) Overall, the Chinese public holds a relatively positive attitude towards dialects, with significant variation among provinces; (2) Political Resource, Economic Development and Cultural Resource have positive effects on Chinese public attitudes towards dialects; and (3) Political Resource and Cultural Resource have a stronger influence in relatively advantaged regions, whereas Economic Development has a stronger influence in relatively disadvantaged regions.

Introduction

Dialect Attitude (DA) refers to an individual’s attitude towards a dialect and its users, which takes one of three emotional polarities: positive, neutral, or negative [1, 2]. According to the latest data from the Chinese General Social Survey (CGSS), China is a multi-dialect country with more than one hundred dialects, and the number of dialect users has reached 900 million, accounting for 64.69% of the total Chinese population [3]. Positive DA not only contributes to shaping the unique language variation of a nation, but also promotes social integration and sustainability [4, 5].

However, Chinese dialects are suffering from significant stigmatization, especially since the Chinese economic boom that began in 2000 [6]. Chinese dialect users are stereotyped as rude and vulgar, which leads to a disadvantaged socioeconomic status and places them at the bottom of the social ladder [7, 8]. A comparable case is the stigmatization of Andalusian dialects, which results from the low socioeconomic status, low educational attainment and social isolation of dialect speakers [9].

Several established studies have shown that the stigmatization of dialects stems from three main factors: (1) language policies leading to divisions within dialect communities; (2) the perception of dialects as hindrances to economic development; and (3) dialects serving as markers of cultural differences that further deepen social divisions [10–13]. Unfortunately, the present literature on DA has primarily focused on micro-level factors such as income, education, and gender, and has neglected the potential macro-level determinants of DA [14–16]. Meanwhile, limited sample sizes constrain traditional linguistic analysis, which implies the need to combine big data with linguistics. To enrich scientific knowledge, this study collects 1,650,480 dialect-related microblogs from 31 Chinese provinces through crawling techniques and applies sentiment analysis to investigate two crucial topics: (1) revealing Chinese public attitudes towards dialects and (2) clarifying the potential determinants related to heterogeneous public attitudes.

Theoretical framework and hypothesis development

There are four prominent academic theories beneficial for analyzing dialects: Systems Theory, Structuralist Theory, Planned Behavior Theory, and Social Network Theory [17–19]. Systems Theory offers a comprehensive framework for analyzing DA, emphasizing that DA is embedded in the political, economic and cultural state of a given region [20]. Structuralist Theory posits that the use of dialects is not merely a personal choice, but is also closely linked to macro-level social characteristics [21]. Planned Behavior Theory explains and predicts DA by incorporating subjective norms (social influence) and behavioral control (self-efficacy) [22].

These theories have been successfully applied in understanding public attitudes toward dialects, but each theory has its own shortcomings in explaining DA. For example, Systems Theory suffers from imprecise boundaries that make testing specific hypotheses challenging [23]. Planned Behavior Theory primarily emphasizes individual-level factors and does not adequately address the influence of external structural factors on DA [24, 25]. Therefore, we employ the Social Network Theory from Brown as our fundamental theory, which has proved useful for explaining the interaction between individual behavior and the macro environment [26]. There are two main reasons for adopting this theory: (1) It posits that social networks shape the social capital that enables individuals to establish connections with the external world and gain social support, which ultimately shapes individuals’ behavior [27]. (2) It contends that political, economic, and cultural factors systematically and profoundly shape individual behavior, aligning with our proposed mechanism for influencing public attitudes towards dialects.

We develop a theoretical framework of the mechanisms influencing public attitudes toward dialects from the following three key perspectives of Social Network Theory:

(1) Political Resource (PR) is conceptualized as the political connections that individuals can access, in either material or immaterial form, to improve their socio-economic status [28]. PR affects DA through two heterogeneous paths [29]. On the one hand, PR positively influences DA. A harmonious society promotes an inclusive social atmosphere and enhances positive public attitudes towards dialects [30–32]. For example, Social Assistance (SA) programs in developing countries effectively reduce dialect stigmatization [33, 34]. On the other hand, PR might have a negative impact on DA. In a centralized polity, a disharmonious political environment undermines the sustainability of DA [35]. Some governments enforce an official language and suppress dialect development, leading to a decline in DA. An extreme example is the political crackdown on Latin dialects by certain regimes in the United States [36].

In China, PR contributes to shaping positive public attitudes towards dialects, which is rooted in the country’s unique political environment. Since 2010, the Chinese government has been committed to promoting inclusive, harmonious policy support that endorses dialects as part of Chinese culture [37–39]. Therefore, we propose the first hypothesis.

  Hypothesis 1: Political resource positively influences dialect attitude.

(2) Economic Development (ED) refers exclusively to economic investment in cultural businesses or industries in this article [40]. The relationship between ED and DA can be explained along two paths. Firstly, a prosperous economy contributes to shaping an inclusive social climate, which in turn enhances the sustainability of DA. A typical example is the economically developed Chinese province of Shanghai, where dialect-based economic groups have formed because of the favorable economic environment [41, 42]. Moreover, frequent trade and ED facilitate intercultural interactions and the spread of dialects [43, 44]. Secondly, a strong social focus on ED could have negative consequences for DA [45]. When economic efficiency becomes the sole priority, there is a risk of excluding or marginalizing certain dialect-speaking communities, as seen in the case of the Soviet Union [46].

In China, we propose that ED has a positive influence on DA. Because the Chinese government has placed the highest priority on ED since 1970, China has developed a mature ED system and a social climate favorable to social phenomena such as DA. Accordingly, we propose the second hypothesis.

  Hypothesis 2: Economic development positively influences dialect attitude.

(3) Cultural Resource (CR) is the availability of cultural support that contributes to an individual’s socioeconomic status. It can take a material form, such as cultural industry (CI) and cultural consumption (CC), or an immaterial form, such as cultural transmission (CT) [47]. Extensive literature consistently predicts a significant relationship between CR and DA, in which more inclusive social environments provide each individual with richer social resources and accordingly foster the sustainability of dialects [48, 49]. CC encompasses activities such as the broadcasting of dialect programs and the publication of dialect literature, which affect individuals’ access to the dialect through media [50]. Furthermore, CI represents the degree of economic support for culture by the government in a given country, which contributes to the economy and to DA simultaneously. Empirical research shows that in Beijing, the most famous dialect city in China, characterized by Beijing speech, CI accounts for 15.12% of total GDP [51].

Based on the above evidence, we propose that CR positively predicts DA. Accordingly, we propose the third hypothesis.

  Hypothesis 3: Cultural resource positively influences dialect attitude.

Finally, we construct a comprehensive analytical framework and analyze the potential mechanisms influencing DA through three macro factors, Political Resource, Economic Development, and Cultural Resource, as shown in Fig 1.

Fig 1. Analytical framework and potential mechanisms of the relationship between multiple macro-level support and public attitudes towards dialects.

https://doi.org/10.1371/journal.pone.0292852.g001

Methodology

Data

The present study uses Microblog as the data source. It is the most prevalent Chinese public media platform and is known for its anonymity and authenticity. From 2021/01/01 to 2022/10/01, we employ crawler technology to collect microblogs about dialects posted on Microblog in each Chinese province. We collect three pieces of information: user IDs, microblog content, and locations [52]. This technology is particularly suitable for analyzing public attention to a given social phenomenon like DA, especially in a country like China where censorship regulations are stringent.

In total, we collect 1,690,932 related microblogs in all 31 Chinese provinces. Subsequently, we implement several filtering steps to ensure the integrity and quality of the data set: we identify and remove microblogs that lack provenance information (n = 26,227), have incomplete content (n = 14,100), or are duplicate entries (n = 125). Finally, we obtain a valid DA database of 1,650,480 microblogs from all 31 provinces in China, covering 2021/01/01 to 2022/10/01.
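The filtering steps above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the field names (`user_id`, `content`, `location`) and the rule that a duplicate is a repeated (user, content) pair are assumptions made for the example.

```python
def filter_posts(posts):
    """Drop posts without a location, without content, or duplicated.

    Field names and the duplicate rule are hypothetical; the paper does
    not specify them.
    """
    seen = set()
    kept = []
    for post in posts:
        if not post.get("location"):   # missing provenance information
            continue
        if not post.get("content"):    # incomplete content
            continue
        key = (post["user_id"], post["content"])
        if key in seen:                # duplicate entry
            continue
        seen.add(key)
        kept.append(post)
    return kept


# The counts reported in the text are internally consistent:
raw_total = 1_690_932
removed = {"no_provenance": 26_227, "incomplete": 14_100, "duplicate": 125}
valid_total = raw_total - sum(removed.values())  # 1,650,480
```

The final line reproduces the size of the valid database reported above from the raw count and the three exclusion counts.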

Sentiment analysis

To ensure the suitability of the data for sentiment analysis, a crucial step is the pre-processing of the unstructured original data into structured data that can be analyzed. The data pre-processing stage involves three key steps:

  1. Data selection. The initial step of data selection involves two interconnected word-normalization processes, stemming and lemmatization. Stemming reduces a word and its derived forms to a common base form by removing prefixes, suffixes, and other inflections. For example, the derivatives ‘compute’, ‘computer’, ‘computing’, and ‘computed’ would all be transformed into ‘comput’ by discarding their distinct endings and retaining their shared components. In contrast, lemmatization is a more refined process that uses lexical and morphological analysis to reduce a word to its base form as it appears in the dictionary. For instance, both ‘studies’ and ‘studi’ would be reduced to ‘study’. We perform stemming and lemmatization using the ‘Natural Language Toolkit’ (NLTK) package in Python.
  2. Data cleaning. In the data cleaning stage, the following operations are conducted to refine the data for sentiment analysis. Firstly, we convert all uppercase letters into lowercase letters. Secondly, we remove punctuation marks, such as the period (‘.’), comma (‘,’), question mark (‘?’), and exclamation point (‘!’), and special characters including the ampersand (‘&’) and slash (‘/’). Additionally, we eliminate all stop words, including ‘and’, ‘yes’ and ‘that’. The list of stop words is sourced from a public website (https://countwordsfree.com/stopwords).
  3. Tokenization. Following data cleaning, the tokenization process is applied. Tokenization involves segmenting the documents into smaller units, typically at the word level. We use Python’s built-in ‘split’ string method to accomplish this task.
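The cleaning and tokenization steps above can be sketched with the Python standard library alone. The stop-word set here is a tiny hypothetical stand-in for the published list, and stemming and lemmatization (done with NLTK in the study) are left out so that the example has no external dependencies.

```python
import string
from typing import List

# Hypothetical, abbreviated stop-word list used only for illustration.
STOP_WORDS = {"and", "yes", "that", "the", "is", "a"}


def preprocess(text: str) -> List[str]:
    """Lowercase, strip punctuation, tokenize on whitespace, drop stop words."""
    lowered = text.lower()
    # string.punctuation covers '.', ',', '?', '!', '&' and '/' among others.
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()
    return [tok for tok in tokens if tok not in STOP_WORDS]


tokens = preprocess("Yes, the Shaanxi dialect is warm & lively!")
# tokens == ["shaanxi", "dialect", "warm", "lively"]
```

Whitespace splitting suits space-delimited text such as this English example; text without spaces between words would need a dedicated word segmenter, which is outside the scope of this sketch.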

Sentiment analysis is a methodology used to extract and analyze the opinions and attitudes expressed by social groups, with a focus on subjective emotional polarity, namely positive and negative sentiments [53, 54]. There are two main methods: machine learning-based sentiment analysis and sentiment dictionary-based sentiment analysis [55, 56]. Machine learning-based sentiment analysis can effectively determine Microblog users’ attitudes and sentiments towards dialects, but it requires a large amount of training text, which would increase the cost of this study [57]. By comparison, sentiment dictionary-based sentiment analysis is more efficient and convenient because it uses an existing database [58, 59]. Therefore, we choose sentiment dictionary-based sentiment analysis. We use the sentiment dictionary developed by the National Research Council Canada (NRC), which encompasses a comprehensive range of language tags and eight emotional states: anger, fear, sadness, disgust, anticipation, joy, trust, and surprise [60]. The first four of these are classified as negative emotions, while the last four are classified as positive. Sentiment analysis is performed using Python to assign sentiment scores to each word in the text. Finally, we sum the scores of each dimension to obtain the total sentiment score for each microblog. The entire data pre-processing and sentiment analysis process is shown in Fig 2.
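A dictionary-based scorer of the kind described above can be sketched as follows. The four-word lexicon is a hypothetical toy, not the NRC dictionary, and the scoring rule (+1 per positive emotion tag, -1 per negative tag) is an assumption, since the text does not give the exact weighting.

```python
# Emotion groups as classified in the text.
NEGATIVE = {"anger", "fear", "sadness", "disgust"}
POSITIVE = {"anticipation", "joy", "trust", "surprise"}

# Hypothetical toy lexicon mapping words to emotion tags.
TOY_LEXICON = {
    "warm": {"joy", "trust"},
    "lively": {"joy"},
    "rude": {"anger", "disgust"},
    "vulgar": {"disgust"},
}


def score_tokens(tokens, lexicon=TOY_LEXICON):
    """Sum +1 for each positive tag and -1 for each negative tag."""
    score = 0
    for tok in tokens:
        for emotion in lexicon.get(tok, ()):
            if emotion in POSITIVE:
                score += 1
            elif emotion in NEGATIVE:
                score -= 1
    return score


def polarity(score):
    """Map a total score to one of the three polarities."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


label = polarity(score_tokens(["shaanxi", "dialect", "warm", "lively"]))
# label == "positive"
```

Words missing from the lexicon contribute nothing, so a post with no matched words is labeled neutral under this rule.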

Variables

Dependent variable.

The dependent variable is Dialect Attitude (DA). We use crawling techniques and sentiment analysis to obtain the percentage of microblogs with positive sentiment about dialects in each of the 31 Chinese provinces. DA is a continuous variable ranging from 41.3% to 89.8%. To further analyze the variations in DA across different levels, we divide this variable into five equal-width categories, referred to here as quintiles. Specifically, the range of 41.3% to 51.0% is categorized as “very low” (quintile 1), 51.1% to 60.7% as “relatively low” (quintile 2), 60.8% to 70.4% as “medium” (quintile 3), 70.5% to 80.1% as “relatively high” (quintile 4), and 80.2% to 89.8% as “very high” (quintile 5). The mean value of DA on this five-category scale in the full sample is 3.06.
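The five DA categories have equal width ((89.8 − 41.3) / 5 = 9.7 percentage points), so the coding can be sketched as an equal-width binning function. The function name and the rounding guard are illustrative choices, not part of the original method description.

```python
import math


def equal_width_category(value, low, high, bins=5):
    """Return a 1-based category for `value` within [low, high]."""
    if not low <= value <= high:
        raise ValueError("value outside the observed range")
    width = (high - low) / bins
    # Round away floating-point noise so that an upper bound such as 51.0
    # stays in the lower category, matching the ranges quoted above.
    position = round((value - low) / width, 9)
    return min(max(math.ceil(position), 1), bins)


# DA expressed in percent, using the range reported in the text.
category = equal_width_category(65.0, low=41.3, high=89.8)
# category == 3 ("medium")
```

The same function applies to any indicator coded this way, given its observed minimum and maximum.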

Independent variable.

According to the theoretical model, DA is a function of PR, ED and CR. We select the independent variables from the above three perspectives, and all the data are taken from the National Statistical Yearbook (2022), available at http://www.stats.gov.cn/sj/ndsj/2022/indexch.htm.

  1. Political Resource (PR). PR is an essential evaluation criterion that reflects political attention towards specific social phenomena such as DA. To assess PR, this article employs three commonly used indicators, namely National Financial Allocation (NFA), SA and Employment [61, 62]. NFA is the central government’s financial allocation for dialect-related work in each province, adjusted for population and area; it is a continuous variable ranging from 14.80 to 174.31. SA is operationalized as the assistance provided by non-governmental organizations for dialect work in each province, adjusted for population and area; it is a continuous variable ranging from 4.00 to 373.30. Employment represents the employment rate of each province, a continuous variable ranging from 19.30 to 703.90. We weight the three indicators to obtain a composite indicator, PR, with values ranging from 0.124 to 1.842. To distinguish the heterogeneous effect of PR on DA across levels more clearly, we group this variable into five categories (quintiles) as follows: 0.124–0.468 is classified as “very low” with a value of 1, 0.469–0.812 as “relatively low” with a value of 2, 0.813–1.156 as “medium” with a value of 3, 1.157–1.500 as “relatively high” with a value of 4, and 1.501–1.842 as “very high” with a value of 5. The mean value of PR in the full sample is 3.06.
  2. Economic Development (ED). To capture the state of economic development, we use three related economic variables: Gross Domestic Product (GDP), Consumer Price Indices (CPI), and Disposable Personal Income (DPI) [63, 64]. GDP is defined as total economic output adjusted for area and population. CPI represents the relative purchasing power of the residents of a province for necessary goods and services, while DPI denotes the disposable income of its residents. All three indicators are continuous variables: GDP ranges from 1902.74 to 110760.94, CPI from 101.50 to 103.60, and DPI from 21744.10 to 72232.40. The final ED indicator is calculated as a weighted combination of the three variables. Consistent with the previous steps for defining variables, we group ED into five categories (quintiles) as follows: 1.129–1.984 is classified as “very low” (quintile 1), 1.985–2.839 as “relatively low” (quintile 2), 2.840–3.694 as “medium” (quintile 3), 3.695–4.549 as “relatively high” (quintile 4), and 4.550–5.402 as “very high” (quintile 5). The mean value of ED in the full sample is 3.12.
  3. Cultural Resource (CR). We define CR drawing on insights from cultural sociology and history, considering three key indicators: CT, CI, and CC. CT encompasses four sub-indicators for a given province: library, museum, radio, and television coverage. CI represents the economic contribution of cultural industries in a given province, adjusted for population and area. CC is calculated as the expenditure on culture and entertainment in a given province, adjusted for population and area. All three indicators are continuous variables, with CT ranging from 1.70 to 30.80, CI from 3.20 to 992.50, and CC from 1.80 to 539.60. We weight the three sub-variables to generate a new variable, CR, which ranges from 0.330 to 5.116. We then group CR into five categories (quintiles) as follows: 0.330–1.287 is classified as “very low” with a value of 1, 1.288–2.244 as “relatively low” with a value of 2, 2.245–3.201 as “medium” with a value of 3, 3.202–4.158 as “relatively high” with a value of 4, and 4.159–5.116 as “very high” with a value of 5. The mean value of CR in the full sample is 3.06.

Control variables.

The determinants of DA are highly intricate. To isolate the net effects, several control variables are therefore included. These are gender (represented as a binary variable where 1 = male, 0 = female), years of education (the number of years of education completed by respondents, ranging from 0 to 23) and the internet access rate (calculated as the number of individuals with internet access in a given province divided by the total population). All the data are taken from the National Statistical Yearbook (2022) (http://www.stats.gov.cn/sj/ndsj/2022/indexch.htm).

More details of the variables are given in Table 1.

Spatial regression models.

To estimate the mechanisms affecting cross-provincial heterogeneity in DA, we employ a Spatial Regression Model. The methodological reason for selecting this model over others is that spatial autocorrelation, which results from geographic location, introduces significant bias into the estimation, especially in cross-provincial analysis. The first step is to verify that DA is spatially autocorrelated, as follows:

$$
\begin{cases}
I_{i1} = \dfrac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{i,j}\,(x_i-\bar{x})(x_j-\bar{x})}{S^2 \sum_{i=1}^{n}\sum_{j=1}^{n} w_{i,j}} \\[2ex]
I_{i2} = \dfrac{(x_i-\bar{x})}{S^2}\sum_{j=1}^{n} w_{i,j}\,(x_j-\bar{x})
\end{cases}
\tag{1}
$$

In Eq (1), $x$ denotes the DA of a province and $\bar{x}$ its mean over the $n$ provinces. The subscripts $i$ and $j$ index provinces, which are located by their longitude and latitude, and $w_{i,j}$ is the element of the spatial weight matrix that measures the spatial distance between regions $i$ and $j$. $S^2$ is the sample variance. The upper part ($I_{i1}$) is the global Moran index and the lower part ($I_{i2}$) is the local Moran index. These indices are used to examine spatial autocorrelation at the global and local levels, respectively. The values of both sets of Moran indices range from -1 to 1. Negative values indicate negative spatial correlation, positive values indicate positive correlation, and values close to 0 indicate a more random spatial distribution, suggesting that there is no meaningful spatial autocorrelation.
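To make the two Moran indices concrete, the sketch below computes them for a toy example in plain Python. It assumes the standard textbook forms of the global and local indices, with the variance taken as the mean squared deviation (dividing by n), and uses a hypothetical binary adjacency matrix for four regions on a line instead of the distance-based weights built from longitude and latitude in the study.

```python
def morans_i(values, weights):
    """Return (global Moran's I, list of local Moran's I) for the inputs.

    `weights` is a square matrix (list of lists) with a zero diagonal.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s2 = sum(d * d for d in dev) / n          # variance, dividing by n
    w_sum = sum(sum(row) for row in weights)
    local = [
        dev[i] / s2 * sum(weights[i][j] * dev[j] for j in range(n))
        for i in range(n)
    ]
    # With this normalization the global index is the sum of the local
    # indices divided by the sum of all weights.
    global_i = sum(local) / w_sum
    return global_i, local


# Hypothetical data: four regions in a row, neighbors share an edge.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered, _ = morans_i([1, 2, 3, 4], W)      # similar neighbors
alternating, _ = morans_i([1, 4, 1, 4], W)    # dissimilar neighbors
```

In this toy case the smoothly increasing values give a positive global index (1/3), while the alternating values give -1, matching the interpretation of positive and negative spatial correlation given above.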

After performing the Moran index test, this study examines the impact of the selected indicators on DA by employing two spatial econometric models, after balancing spatial endogeneity. The first model utilized is the Spatial Lag Model, as shown in Eq (2).

$$
y = \lambda W y + X\beta + \varepsilon \tag{2}
$$

In this model, X represents the matrix of independent variables, which captures the spatial distribution of PR, ED and CR across the 31 provinces. Similarly, y represents the vector of the dependent variable, which captures the spatial distribution of DA across the 31 provinces. W denotes the spatial weight matrix, λ the spatial autoregressive coefficient, β the vector of parameters, and ε the random disturbance term.
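Solving the Spatial Lag Model for y helps show why a province's DA can depend on conditions in other provinces. The short derivation below assumes the model takes its usual form, y = λWy + Xβ + ε, and that (I − λW) is invertible; I, the identity matrix, is a symbol introduced here and not used in the original text.

```latex
\begin{aligned}
y &= \lambda W y + X\beta + \varepsilon \\
(I - \lambda W)\, y &= X\beta + \varepsilon \\
y &= (I - \lambda W)^{-1}\,(X\beta + \varepsilon)
\end{aligned}
```

When λ ≠ 0, the matrix (I − λW)^{-1} is generally not diagonal, so each province's DA depends on the covariates and disturbances of its neighbors as well as its own.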

The second model is the Spatial Error Model, as shown in Eq (3) below.

$$
y = X\beta + \mu, \qquad \mu = \lambda W \mu + \varepsilon \tag{3}
$$

X is the matrix of independent variables, which captures the spatial distribution of PR, ED and CR. y is the vector of the dependent variable, DA. β is the vector of parameters to be estimated, W is the spatial distance matrix, λ is the spatial error coefficient, μ is the spatial error term, and ε is the regression error term.

Results

Description statistics

Fig 3 presents a comprehensive description of all indicators for the 31 provinces. Several meaningful preliminary findings are as follows: (1) Overall, the Chinese public holds a relatively positive attitude toward dialects. The mean value of DA is 3.06, which indicates that more than 60% of Chinese Internet users share a positive attitude towards dialects. However, there are significant variations in DA across provinces. (2) The western region, the less-developed part of China, exhibits a significantly lower DA than the medium and eastern regions. Specifically, the mean DA values of the eastern, medium, and western regions are 3.91, 3.50, and 2.00, respectively. (3) Provinces with higher PR, ED and CR show significantly higher DA than provinces with lower values of these indicators. Taken together, the descriptive data show that Chinese residents generally hold a positive perception of dialects, but that there are significant regional differences that may be influenced by factors such as PR, ED and CR.

Fig 3. Descriptive statistics for 31 provinces in China (n = 1,650,480).

https://doi.org/10.1371/journal.pone.0292852.g003

Multicollinearity test

Because the selected variables share similar conceptualization processes, they may introduce multicollinearity problems that bias the estimation. Table 2 reports the correlation coefficients among the selected variables. Although some variables are correlated, the variance inflation factor (VIF) values of all variables range from 1 to 5, below the conventional threshold of 10 [65]. Consequently, there is no significant multicollinearity.
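The link between correlation and VIF can be illustrated with a small sketch. It uses the definition VIF = 1 / (1 − R²) and covers only the special case of two predictors, where R² equals the squared Pearson correlation; the category scores below are hypothetical, not data from the study.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def vif_from_r_squared(r_squared):
    """Variance inflation factor for a predictor with the given R^2."""
    if not 0 <= r_squared < 1:
        raise ValueError("R^2 must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared)


# Hypothetical category scores for two predictors in five provinces.
pr_scores = [1, 2, 3, 4, 5]
ed_scores = [2, 1, 4, 3, 5]
r = pearson_r(pr_scores, ed_scores)   # 0.8
vif = vif_from_r_squared(r ** 2)      # about 2.78, below the threshold of 10
```

With more than two predictors, R² comes from regressing each predictor on all the others, which typically calls for a regression library.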

Table 2. Pearson correlation analysis for all selected variables.

https://doi.org/10.1371/journal.pone.0292852.t002

The determinants to Chinese public attitude towards dialects

Spatial auto-correlation test.

Before conducting spatial regressions, it is essential to assess the spatial autocorrelation among the selected variables. Fig 4 illustrates the results of the global Moran index test, which reveals a statistically significant positive global spatial autocorrelation (Moran’s I = 0.221, Z = 3.261), indicating that DA across the provinces of China exhibits geographic clustering.
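The significance of the reported Z-value can be checked with a short calculation. The sketch assumes a two-sided test under the standard normal approximation, which is consistent with the |Z| > 1.96 rule quoted for Fig 4 but is not stated explicitly in the text.

```python
from math import erfc, sqrt


def two_sided_p_value(z):
    """Two-sided p-value of a standard normal test statistic."""
    return erfc(abs(z) / sqrt(2))


p_moran = two_sided_p_value(3.261)     # well below 0.05
p_threshold = two_sided_p_value(1.96)  # approximately 0.05
```

Under this approximation the reported Z-value corresponds to a p-value of roughly 0.001.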

Fig 4. Global Moran index^{a,b}.

^a Moran’s I > 0 indicates a positive spatial correlation of the attribute values in the region. ^b |Z| > 1.96 (p < 0.05) indicates that Moran’s I is significant.

https://doi.org/10.1371/journal.pone.0292852.g004

Additionally, Table 3 provides local Moran index information for DA in China. The local Moran index values are statistically significant in 8 of the 31 provinces, which indicates that DA in these provinces is clustered geographically. Taken together, these findings indicate that DA in China is significantly associated with geographical location. If the bias due to spatial autocorrelation is not controlled, the estimates will be inaccurate. Accordingly, we employ a spatial regression model to estimate the characteristics of DA.

Basic spatial regression estimation

Table 4 presents the empirical results of the basic regression models. Model 1 is a linear model estimated by ordinary least squares (OLS). Model 2 employs a Heckman model to address conventional endogeneity problems. Finally, Model 3 uses the Spatial Error Model (SEM) to further improve the accuracy of the estimation by correcting the bias caused by spatial autocorrelation. Panel 2 displays the model fit information. Following the criterion that smaller AIC and BIC values indicate a better fit, we select Model 3 to report.
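The model-selection rule can be sketched with the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of estimated parameters, n the number of observations and ln L the maximized log-likelihood. The log-likelihoods and parameter counts below are hypothetical placeholders, not values from Table 4.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion."""
    return k * math.log(n) - 2 * log_likelihood


# Hypothetical (log-likelihood, number of parameters) for three models.
fits = {"OLS": (-20.0, 7), "Heckman": (-18.5, 8), "SEM": (-15.0, 8)}
n_obs = 31  # one observation per province

scores = {
    name: (aic(ll, k), bic(ll, k, n_obs)) for name, (ll, k) in fits.items()
}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
```

When the two criteria disagree, BIC penalizes extra parameters more heavily than AIC for samples of this size, because ln(31) is greater than 2.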

Table 4. Comparison of the selected models (OLS, Heckman, SEM) for the effects of PR, ED, and CR on DA.

https://doi.org/10.1371/journal.pone.0292852.t004

The following conclusions regarding the potential determinants of DA among the Chinese public are drawn: (1) PR has a statistically significant positive effect on DA (β = 0.076, SD = 0.036, P<0.05). This implies that political intervention plays a crucial role in shaping positive public attitudes towards dialects, which aligns with previous research [66]. (2) ED has a statistically significant positive effect on DA (β = 0.047, SD = 0.022, P<0.05). This suggests that regional economic growth contributes to creating a harmonious social atmosphere and fostering positive DA. (3) CR has a statistically significant positive effect on DA (β = 0.054, SD = 0.021, P<0.05). This finding is consistent with established literature, which emphasizes the importance of shaping an inclusive cultural atmosphere and promoting positive attitudes towards dialects [67]. In summary, the results confirm Hypotheses 1, 2, and 3.
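As a rough consistency check on the reported significance, approximate 95% confidence intervals can be computed from the coefficients. This sketch assumes that the reported "SD" is the standard error of each coefficient and uses the normal critical value 1.96; both are assumptions, not statements from the paper.

```python
# (coefficient, reported SD) as given in the text.
ESTIMATES = {"PR": (0.076, 0.036), "ED": (0.047, 0.022), "CR": (0.054, 0.021)}


def confidence_interval(beta, se, critical=1.96):
    """Approximate 95% interval, treating `se` as the standard error."""
    margin = critical * se
    return beta - margin, beta + margin


intervals = {
    name: confidence_interval(beta, se) for name, (beta, se) in ESTIMATES.items()
}
excludes_zero = {name: low > 0 for name, (low, high) in intervals.items()}
```

Under these assumptions all three intervals lie above zero, which agrees with the reported P<0.05 for PR, ED and CR.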

Heterogeneity analysis.

It is important to analyze how the effects differ across samples in order to draw more comprehensive conclusions across regions. Models 4–10 in Table 5 present the results for high (above average) and low (below average) levels of PR, ED and CR. Several conclusions about the heterogeneous effects are drawn: (1) The positive effect of PR on DA is more pronounced in the relatively advantaged subgroups (higher PR, ED, and CR) than in their disadvantaged counterparts. (2) The positive effect of CR on DA is more significant in the relatively advantaged subgroups (higher PR, ED, and CR) than in their disadvantaged counterparts. (3) The positive effect of ED on DA is higher in the relatively disadvantaged subgroups (lower PR, ED, and CR) than in their advantaged counterparts. Taken together, political and cultural support have become more saturated in shaping positive public dialect attitudes, meaning that these two determinants are more likely to influence the advantaged subgroups. Meanwhile, economic support is still relatively scarce, so its positive influence still yields incremental marginal benefits. The differences in effects across subgroups are visualized in Fig 5.

Fig 5. Heterogeneity analysis of different samples^{a,b}.

^a The closer the color is to black, the smaller the impact, and the closer it is to blue, the greater the impact. ^b The t-value thresholds for significance at the 5%, 1%, and 1‰ levels are 1.65, 1.96, and 2.76, respectively.

https://doi.org/10.1371/journal.pone.0292852.g005

Table 5. Heterogeneity analysis of the influence across different samples.

https://doi.org/10.1371/journal.pone.0292852.t005

Robustness.

To test robustness, we replace the sentiment dictionary with the HowNet and NTUSD dictionaries. In addition, we replace the geographic distance matrix in the spatial regression with an economic distance matrix. The results, shown in Table 6, indicate that the findings remain robust after replacing the dictionary and the spatial matrix.

Discussion

Traditionally, political scientists, economists, and sociologists have almost unanimously agreed that dialects create social divisions and exacerbate social inequality [68, 69]. However, recent linguistic research suggests that dialects foster a more inclusive, sustainable, and cohesive social environment, and accordingly enhance the well-being of local residents [14, 70]. To reveal Chinese public attitudes towards dialects and clarify the potential determinants related to heterogeneous public attitudes, we collect 1,650,480 microblogs from 31 Chinese provinces using crawler technology and analyze them using sentiment analysis. The findings of this study can be summarized as follows:

  1. Overall, the Chinese public shares a relatively positive attitude toward dialects with significant variations across different provinces [71]. Fig 3 indicates that the mean value of DA is 3.06, suggesting that more than 60% Chinese public poses positive attitudes towards dialect. The positive attitude towards dialects observed among the Chinese public reflects the government’s efforts to preserve the various dialects [72]. Meanwhile, the present study also captures the significant variations of DA across provinces. Specifically, underdeveloped regions in the west part of the country exhibit lower DA compared to more developed medium and eastern regions. The above scenario indicates that there still exists systematically social stigma against dialects in Chinese disadvantaged regions [73]. Previous literature suggests that the promotion of the official language during China’s economic boom from 2000 to 2010 had a detrimental impact on dialects [74, 75]. Nevertheless, President Xi’s proposed dialect protection policies since 2015 till now have largely mitigated the crisis surrounding dialect [76].
  2. PR, ED and CR have positive effects on DA. First, PR plays a significant positive role in shaping public attitudes toward dialects (β = 0.076, SD = 0.036, P<0.05). Political support is of the highest priority in protecting dialects, especially in a country like China, where the central government dominates [77]. For instance, political support has played a fundamental role in safeguarding Native American dialects in the United States, where they have been officially recognized as languages [78, 79]. Second, ED positively predicts DA (β = 0.047, SD = 0.022, P<0.05). According to sociologists and economists, sociocultural phenomena such as dialects are embedded in the context of economic development. In an economically developed region, the culture tends to be more inclusive, fostering the coexistence and development of multiple dialects rather than favoring a single official language [80]. Lastly, CR also contributes significantly to shaping positive public attitudes toward dialects (β = 0.054, SD = 0.021, P<0.05). Increasing cultural support for a region promotes a more inclusive social climate, which is conducive to the development of dialects, as several established studies have confirmed [81]. We extend this finding to China and show that cultural support remains an important part of efforts to safeguard dialects, despite the country’s predominant emphasis on economic advancement in the 21st century [82].
  3. PR and CR exert a more significant influence in the relatively advantaged regions, and ED exerts a greater influence in the relatively disadvantaged regions. The heterogeneity analysis shows regional heterogeneity in the effects of the selected factors on DA. Specifically, political and cultural support have a greater influence in developed regions, while economic support is more advantageous in underdeveloped regions. Taken together, political and cultural support result in diminishing marginal benefits, whereas economic support leads to incremental marginal benefits. These disparities could be attributed to the following: although the Chinese economy is booming, the government still under-invests economically in social culture (e.g., dialects), so dialect development relies more on cultural and political support. Furthermore, regions differ in their political, economic, and cultural situations. For instance, the most developed provinces have approximately 4.5 times higher GDP per capita than the least developed ones [83]. Meanwhile, underdeveloped regions are characterized by a greater diversity of dialects due to their more complex ethnic composition [84–86]. Therefore, it is important to implement region-specific strategies to shape positive public attitudes toward dialects [87, 88].

The study presents a comprehensive picture of Chinese public attitudes towards dialects, on the basis of which we propose several policy recommendations:

  1. To ensure the sustainability of dialects, it is crucial to implement policies that focus on improving dialect quality and reducing regional inequalities. Drawing inspiration from successful European approaches, the Chinese government could consider adopting policies such as Switzerland’s "multilingual parallelism policy" and "promote language attitudes policy" [89, 90]. These policies have demonstrated their effectiveness in addressing dialect inequalities and promoting positive DA.
  2. Sufficient economic assistance plays a vital role in the protection of dialects, particularly in underdeveloped regions. The government should allocate more financial resources to dialect protection initiatives, as successful cases in other countries suggest [91, 92]. Additionally, non-governmental organizations (NGOs) and enterprises should also assume responsibility for safeguarding dialects by offering financial assistance [93].
  3. Distinct strategies need to be implemented for different regions. In underdeveloped regions, it is imperative to cultivate positive public attitudes towards dialects through economic support. Conversely, in developed regions, the creation of an inclusive social environment through political and cultural support is necessary to foster public acknowledgement of dialects [94, 95].

Conclusion, limitations and implications

Based on a combination of crawler technology and sentiment analysis, the present study develops the most comprehensive database to date, comprising 1,650,480 dialect-related microblogs from 31 Chinese provinces, and reports the following findings: (1) Overall, the Chinese public holds a relatively positive attitude toward dialects, with significant variation across provinces; (2) PR, ED and CR have positive effects on DA; and (3) PR and CR exert a more significant influence in the relatively advantaged regions, and ED exerts a greater influence in the relatively disadvantaged regions.

The present study has several limitations: (1) Data limitations. The findings are based on provincial-level data, which may limit their generalizability to individual-level attitudes. (2) Method limitations. The sentiment dictionary is not sufficiently accurate; although machine learning might perform better, this study lacks a sufficiently large set of texts for training. (3) Research aim limitations. This study focuses solely on macro-level mechanisms and may therefore overlook the potential influence of micro-level factors. (4) Endogeneity limitations. While this study addresses the endogeneity problem caused by geographic location, there may be other sources of endogeneity, such as sample self-selection, omitted variables, and reverse causation.
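
To make the dictionary-based method in limitation (2) concrete, the sketch below scores pre-tokenized posts against a small sentiment lexicon and averages the scores by province. It is a simplified illustration under stated assumptions, not the study's pipeline: the word lists are hypothetical stand-ins for a full dictionary, the posts are invented, word segmentation is assumed to have been done already, and the polarity score in [-1, 1] is not the DA index used in the paper.

```python
from collections import defaultdict

# Hypothetical mini-lexicons; a real analysis would load a full dictionary.
POSITIVE = {"喜欢", "亲切", "好听"}   # like, warm/familiar, pleasant to hear
NEGATIVE = {"讨厌", "难听", "土气"}   # dislike, unpleasant to hear, rustic

def polarity(tokens, positive=POSITIVE, negative=NEGATIVE):
    """Return (pos - neg) / (pos + neg), or 0.0 when no lexicon word occurs."""
    pos = sum(t in positive for t in tokens)
    neg = sum(t in negative for t in tokens)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)

def provincial_means(posts, positive=POSITIVE, negative=NEGATIVE):
    """Average post-level polarity for each province."""
    scores = defaultdict(list)
    for province, tokens in posts:
        scores[province].append(polarity(tokens, positive, negative))
    return {p: sum(v) / len(v) for p, v in scores.items()}

# Invented, already-segmented example posts: (province, tokens).
posts = [
    ("Shaanxi", ["陕西话", "很", "亲切"]),
    ("Shaanxi", ["方言", "难听"]),
    ("Guangdong", ["喜欢", "粤语", "好听"]),
]

means = provincial_means(posts)
```

Passing different `positive` and `negative` sets to `provincial_means` mirrors a dictionary-replacement robustness check. The sketch also shows the limitation directly: any word absent from the lexicon, as well as negation and sarcasm, is ignored by this kind of scoring.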

Despite its limitations, this study provides a comprehensive depiction of Chinese public attitudes towards dialects and the macro-level determinants that influence them. Accordingly, the study has three noteworthy implications for future research: (1) More big-data-based sociolinguistic analysis is needed, such as using Twitter to analyze public attitudes toward Native American accents in the United States and to reveal their social determinants. (2) Future research needs to incorporate a more integrated framework that takes political, economic, and cultural macro factors into consideration. (3) Sentiment analysis is an important research direction in sociolinguistics that deserves attention [96].

References

  1. Cargile AC, Giles H, Ryan EB, Bradac JJ. Language attitudes as a social process: A conceptual model and new directions. Language & Communication. 1994;14(3):211–36. https://doi.org/10.1016/0271-5309(94)90001-9
  2. Montoyo A, Martínez-Barco P, Balahur A. Subjectivity and sentiment analysis: An overview of the current state of the area and envisaged developments. Decision Support Systems. 2012;53(4):675–9. https://doi.org/10.1016/j.dss.2012.05.022
  3. Du Z. The Chinese language demystified: Cambridge Scholars Publishing; 2015.
  4. Bradley D, Bradley M. Language endangerment and language maintenance: An active approach: Routledge; 2013.
  5. Dweik BS, Qawar HA. Language choice and language attitudes in a multilingual Arab Canadian community: Quebec–Canada: A sociolinguistic study. British Journal of English Linguistics. 2015;3(1):1–12.
  6. Hebl MR, Dovidio JF. Promoting the “social” in the examination of social stigmas. Personality and Social Psychology Review. 2005;9(2):156–82. pmid:15869380
  7. Wiley TG. Chinese “Dialect” Speakers as Heritage Language Learners: A Case Study. Heritage language education: Routledge; 2017. p. 91–106.
  8. Wolfram W. Dialect in society. The handbook of sociolinguistics. 2017:107–26. https://doi.org/10.1002/9781405166256.ch7
  9. Jaspal R, Sitaridou I. Coping with stigmatized linguistic identities: Identity and ethnolinguistic vitality among Andalusians. Identity. 2013;13(2):95–119. https://doi.org/10.1080/15283488.2012.747439
  10. Canagarajah S. Reconstructing heritage language: Resolving dilemmas in language maintenance for Sri Lankan Tamil migrants. De Gruyter Mouton; 2013. https://doi.org/10.1515/ijsl-2013-0035
  11. Gong Y, Chow IH-s, Ahlstrom D. Cultural diversity in China: Dialect, job embedded-ness, and turnover. Asia Pacific Journal of Management. 2011;28:221–38.
  12. Liao S, editor. A perceptual dialect study of Taiwan Mandarin: Language attitudes in the era of political battle. Proceedings of the 20th North American Conference on Chinese Linguistics (NACCL-20); 2008: Citeseer.
  13. O’Brien T, Creţan R, Jucu IS, Covaci RN. Internal migration and stigmatization in the rural Banat region of Romania. Identities. 2022:1–21. https://doi.org/10.1080/1070289X.2022.2109276
  14. Garrett P, Coupland N, Williams A. Investigating language attitudes: Social meanings of dialect, ethnicity and performance: University of Wales Press; 2003.
  15. Laube A, Rothmund J. ‘Broken English’, ‘dialect’ or ‘Bahamianese’? Language attitudes and identity in The Bahamas. Journal of Pidgin and Creole Languages. 2021;36(2):362–94. https://doi.org/10.1075/jpcl.00079.lau
  16. Yiakoumetti A. Choice of classroom language in bidialectal communities: to include or to exclude the dialect? Cambridge journal of education. 2007;37(1):51–66. https://doi.org/10.1080/03057640601179046
  17. Kingsbury N, Scanzoni J. Structural-functionalism. Sourcebook of family theories and methods: A contextual approach. 1993:195–221.
  18. Liu W, Sidhu A, Beacom AM, Valente TW. Social network theory. The international encyclopedia of media effects. 2017:1–12.
  19. Luhmann N, Baecker D, Gilgen P. Introduction to systems theory: Polity Cambridge; 2013.
  20. McGuire WJ. The structure of individual attitudes and attitude systems. Attitude structure and function. 1989:37–69.
  21. Bangeni B, Kapp R. Shifting language attitudes in linguistically diverse learning environment in South Africa. Journal of Multilingual and Multicultural development. 2007;28(4):253–69. https://doi.org/10.2167/jmmd495.0
  22. McKenzie RM. Social factors and non-native attitudes towards varieties of spoken English: a Japanese case study. International journal of Applied linguistics. 2008;18(1):63–88.
  23. Patton W, McMahon M. Career development and systems theory: Connecting theory and practice: Springer; 2014.
  24. Kaiser FG, Hübner G, Bogner FX. Contrasting the theory of planned behavior with the value‐belief‐norm model in explaining conservation behavior. Journal of applied social psychology. 2005;35(10):2150–70. https://doi.org/10.1111/j.1559-1816.2005.tb02213.x
  25. Wight D, Plummer M, Ross D. The need to promote behaviour change at the cultural level: one factor explaining the limited impact of the MEMA kwa Vijana adolescent sexual health intervention in rural Tanzania. A process evaluation. BMC Public Health. 2012;12:1–12. https://doi.org/10.1186/1471-2458-12-788
  26. Lewis TD. A way to sustained salience: cultural identity, social networking, and language attitude in Lorain Puerto Rican English. 2015.
  27. Xin L, Qin K, editors. Embeddedness, social network theory and social capital theory: Antecedents and consequence. 2011 International Conference on Management and Service Science; 2011: IEEE.
  28. Frynas JG, Mellahi K, Pigman GA. First mover advantages in international business and firm‐specific political resources. Strategic Management Journal. 2006;27(4):321–45. https://doi.org/10.1002/smj.519
  29. Derungs C, Sieber C, Glaser E, Weibel R. Dialect borders—political regions are better predictors than economy or religion. Digital Scholarship in the Humanities. 2020;35(2):276–95. https://doi.org/10.1093/llc/fqz037
  30. Blommaert J, Verschueren J. The role of language in European nationalist ideologies. In: Schieffelin BB, et al., editors. Language ideologies: Practice and theory. New York, Oxford: Oxford University Press; 1998. p. 189–210.
  31. Delhey J, Newton K. Predicting cross-national levels of social trust: global pattern or Nordic exceptionalism? European sociological review. 2005;21(4):311–27. https://doi.org/10.1093/esr/jci022
  32. Mai R, Hoffmann S. Four positive effects of a salesperson’s regional dialect in services selling. Journal of Service Research. 2011;14(4):460–74. https://doi.org/10.1177/1094670511414551
  33. Roelen K. Receiving social assistance in low-and middle-income countries: Negating shame or producing stigma? Journal of Social Policy. 2020;49(4):705–23. https://doi.org/10.1017/S0047279419000709
  34. Taylor SE. Social support: A review. The Oxford handbook of health psychology. 2011;1:189–214.
  35. Kavalski E. World politics at the edge of chaos: Reflections on complexity and global life: State University of New York Press; 2015.
  36. Spolsky B. Language policy: Cambridge university press; 2004.
  37. Kauppi N. Democracy, social resources and political power in the European Union: Manchester University Press; 2018. https://doi.org/10.7765/9781526130334
  38. Rothstein SA, Schulze-Cleven T. Germany after the social democratic century: The political economy of imbalance. German Politics. 2020;29(3):297–318.
  39. Cao Q. The language of soft power: mediating socio-political meanings in the Chinese media. Critical Arts: South-North Cultural and Media Studies. 2011;25(1):7–24.
  40. Nafziger EW. Economic development: Cambridge university press; 2012.
  41. Lameli A, Südekum J, Nitsch V, Wolf N. Same same but different: Dialects and trade. German Economic Review. 2015;16(3):290–306. https://doi.org/10.1111/geer.12047
  42. Wang D, You H. The impact of language dialect on fertility: Routledge: London; 2006.
  43. Gatlin B, Wanzek J. Relations among children’s use of dialect and literacy skills: A meta-analysis. Journal of Speech, Language, and Hearing Research. 2015;58(4):1306–18. https://doi.org/10.1044/2015_JSLHR-L-14-0311
  44. Wei Y, Kang D, Wang Y. Geography, culture, and corporate innovation. Pacific-Basin Finance Journal. 2019;56:310–29.
  45. Hara K. Regional dialect and cultural development in Japan and Europe. 2005. https://doi.org/10.1515/ijsl.2005.2005.175-176.193
  46. Roesch KA. Language maintenance and language death: The decline of Texas Alsatian: John Benjamins Publishing; 2012.
  47. Falck O, Heblich S, Lameli A, Südekum J. Dialects, cultural identity, and economic exchange. Journal of urban economics. 2012;72(2–3):225–39.
  48. Lin C-W. Examining feature economy in Arabic dialects. Perspectives on Arabic Linguistics XXVIII: John Benjamins; 2016. p. 37–62.
  49. Schneider EW. The dynamics of New Englishes: From identity construction to dialect birth. Language. 2003;79(2):233–81.
  50. Xu J, Hampden-Thompson G. Cultural reproduction, cultural mobility, cultural resources, or trivial effect? A comparative approach to cultural capital and educational performance. Comparative Education Review. 2012;56(1):98–124.
  51. Shuguang C, Yunpeng L. The study about development status, trends and paths of cultural industry in China. Energy Procedia. 2011;5:2078–81. https://doi.org/10.1016/j.egypro.2011.03.358
  52. Udapure TV, Kale RD, Dharmik RC. Study of web crawler and its different types. IOSR Journal of Computer Engineering. 2014;16(1):01–5.
  53. Zhang L, Wang S, Liu B. Deep learning for sentiment analysis: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 2018;8(4):e1253. https://doi.org/10.1002/widm.1253
  54. Mouthami K, Devi KN, Bhaskaran VM, editors. Sentiment analysis and classification based on textual reviews. 2013 international conference on Information communication and embedded systems (ICICES); 2013: IEEE.
  55. Bhonde R, Bhagwat B, Ingulkar S, Pande A. Sentiment analysis based on dictionary approach. International Journal of Emerging Engineering Research and Technology. 2015;3(1):51–5.
  56. Frankel R, Jennings J, Lee J. Disclosure sentiment: Machine learning vs. dictionary methods. Management Science. 2022;68(7):5514–32. https://doi.org/10.1287/mnsc.2021.4156
  57. Wawre SV, Deshmukh SN. Sentiment Classification using Machine Learning Techniques.
  58. Ahmed M, Chen Q, Li Z. Constructing domain-dependent sentiment dictionary for sentiment analysis. Neural Computing and Applications. 2020;32:14719–32. https://doi.org/10.1007/s00521-020-04824-8
  59. Wu J, Lu K, Su S, Wang S. Chinese micro-blog sentiment analysis based on multiple sentiment dictionaries and semantic rule sets. IEEE Access. 2019;7:183924–39.
  60. Plutchik R. A general psychoevolutionary theory of emotion. Theories of emotion: Elsevier; 1980. p. 3–33.
  61. Boulanger P-M. Political uses of social indicators: overview and application to sustainable development indicators. International journal of sustainable development. 2007;10(1–2):14–32. https://doi.org/10.1504/IJSD.2007.014411
  62. Cutright P. National political development: Measurement and analysis. Comparative Government: A Reader. 1969:29–41.
  63. Arzu A, Issa T. An effect on cultural identity: Dialect. Procedia-Social and Behavioral Sciences. 2014;143:555–62. https://doi.org/10.1016/j.sbspro.2014.07.435
  64. Tatman R, editor. Gender and dialect bias in YouTube’s automatic captions. Proceedings of the first ACL workshop on ethics in natural language processing; 2017.
  65. Senaviratna N, Cooray T. Diagnosing multicollinearity of logistic regression model. Asian Journal of Probability and Statistics. 2019;5(2):1–9.
  66. Perrino S. Veneto out of Italy? Dialect, migration, and transnational identity. Applied Linguistics. 2013;34(5):574–91.
  67. Scapoli C, Goebl H, Sobota S, Mamolini E, Rodriguez-Larralde A, Barrai I. Surnames and dialects in France: Population structure and cultural evolution. Journal of theoretical biology. 2005;237(1):75–86. pmid:15935393
  68. Harris R. Language and social class: a Rosen contribution. Changing English. 2009;16(1):81–91.
  69. Nettels E. Language, race, and social class in Howells’s America: University Press of Kentucky; 1988.
  70. Giles H, Rakić T. Language attitudes: Social determinants and consequences of language variation. The Oxford handbook of language and social psychology. 2014:11–26.
  71. Li J, editor. Dialect attitude, dialect environment and dialect degradation: Evidence from Hukou dialect in China. 3rd International Conference on Culture, Education and Economic Development of Modern Society (ICCESE 2019); 2019: Atlantis Press.
  72. Gao X. Linguistic instrumentalism and national language policy in Mainland China’s state print media coverage of the Protecting Cantonese Movement. Chinese Journal of Communication. 2017;10(2):157–75. https://doi.org/10.1080/17544750.2016.1207694
  73. Dörnyei Z, Csizér K. Some dynamics of language attitudes and motivation: Results of a longitudinal nationwide survey. Applied linguistics. 2002;23(4):421–62. https://doi.org/10.1093/applin/23.4.421
  74. Francis N. Language and dialect in China. Chinese Language and Discourse An International and Interdisciplinary Journal. 2016;7(1):136–49. https://doi.org/10.1075/cld.7.1.05fra
  75. Gao X. The ideological framing of ‘dialect’: an analysis of mainland China’s state media coverage of ‘dialect crisis’ (2002–2012). Journal of Multilingual and Multicultural Development. 2015;36(5):468–82. https://doi.org/10.1080/01434632.2014.943234
  76. Curdt-Christiansen XL, Gao X. Family language policy and planning in China: The changing langscape. Current Issues in Language Planning. 2021;22(4):353–61. https://doi.org/10.1080/14664208.2020.1819049
  77. Soroka SN, Wlezien C. Degrees of democracy: Politics, public opinion, and policy: Cambridge University Press; 2010.
  78. Clopper CG. Perception of dialect variation. The handbook of speech perception. 2021:333–64. https://doi.org/10.1002/9781119184096.ch13
  79. Clopper CG, Bradlow AR. Free classification of American English dialects by native and non-native listeners. Journal of phonetics. 2009;37(4):436–51. pmid:20161400
  80. Ng BC, Cavallaro F. Multilingualism in Southeast Asia: The Post-Colonial Language Stories of Hong Kong, Malaysia and Singapore. Multidisciplinary Perspectives on Multilingualism. 2019:27–50.
  81. Mee KH. Korean TV dramas in Taiwan: With an emphasis on the localization process. Korea Journal. 2005;45(4):183–205.
  82. Huibin X, Marzuki A, Razak AA. Protective development of cultural heritage tourism: The case of Lijiang, China. Theoretical and empirical researches in urban management. 2012;7(1):39–54.
  83. Zhou Q, Shi W. Socio-economic transition and inequality of energy consumption among urban and rural residents in China. Energy and Buildings. 2019;190:15–24. https://doi.org/10.1016/j.enbuild.2019.02.015
  84. Chinnasamy P, Suresh V, Ramprathap K, Jebamani BJA, Rao KS, Kranthi MS. COVID-19 vaccine sentiment analysis using public opinions on Twitter. Materials Today: Proceedings. 2022;64:448–51. pmid:35502322
  85. Santiago C, Centeno ZJR, Ulanday MLP, Cahapin EL. Sentiment Analysis of Students’ Experiences during Online Learning in a State University in the Philippines. International Journal of Computing Sciences Research. 2022.
  86. Xue Y, Liu H. Exploration of the Dynamic Evolution of Online Public Opinion towards Waste Classification in Shanghai. International Journal of Environmental Research and Public Health. 2023;20(2):1471. pmid:36674228
  87. Mettewie L, Janssens R. Language use and language attitudes in Brussels. Multilingual Matters. 2007;135:117.
  88. Sallabank J. Can majority support save an endangered language? A case study of language attitudes in Guernsey. Journal of Multilingual and Multicultural Development. 2013;34(4):332–47. https://doi.org/10.1080/01434632.2013.794808
  89. Errihani M. Language attitudes and language use in Morocco: effects of attitudes on ‘Berber language policy’. The Journal of North African Studies. 2008;13(4):411–28. https://doi.org/10.1080/13629380701800492
  90. Lundberg A. Multilingual educational language policies in Switzerland and Sweden: A meta-analysis. Language Problems and Language Planning. 2018;42(1):45–69. https://doi.org/10.1075/lplp.00005.lun
  91. Coluzzi P. Endangered minority and regional languages (‘dialects’) in Italy. Modern Italy. 2009;14(1):39–54. https://doi.org/10.1080/13532940802278546
  92. Vidau Z. The legal protection of national and linguistic minorities in the region of Friuli Venezia Giulia: A comparison of the three regional laws for the “Slovene linguistic minority”, for the “Friulian language” and for the “German-speaking minorities”. Razprave in gradivo: revija za narodnostna vprašanja/Treatises and documents: journal of ethnic studies. 2013;71:27–52.
  93. Crack AM. Language, NGOs and inclusion: the donor perspective. Development in Practice. 2019;29(2):159–69. https://doi.org/10.1080/09614524.2018.1546827
  94. Arsana IKS, Olilingo FZ, editors. Economic Shift And Inequality Between Provinces In Sulawesi Island, Republic Of Indonesia. Proceedings of International Interdisciplinary Conference on Sustainable Development Goals (IICSDGs); 2021.
  95. Tiryakian E, Rogowski R. New nationalisms of the developed West: toward explanation: Routledge; 2020.
  96. Vari J, Tamburelli M. Standardisation: bolstering positive attitudes towards endangered language varieties? Evidence from implicit attitudes. Journal of Multilingual and Multicultural Development. 2020:1–20. https://doi.org/10.1080/01434632.2020.1829632