Gender differences in self-view and desired salaries: A study on online recruitment website users in China

One explanation for the gender pay differences in labor markets is that women propose lower desired salaries. By using an actual job seeking resume database and applying text mining techniques, we are able to observe the extent of gender differences in both desired salaries and job-related self-view. We find gender differences in global self-view favoring females, and in some domain-specific self-view favoring males. Previous findings of disadvantaged groups having levels of self-view at least as high as those of advantaged groups lend credibility to our findings. Moreover, we argue that the differences in global self-view favoring females may be related to the theories of “belief flipping”, since women in our sample of online-recruitment markets are distinct from the general population, with on average 15.2 years of education and 8.99 years of work experience, due to self-selection. In addition, we find that women do propose lower desired salaries than men, after controlling for various factors such as human capital, marital status, and industry. We further investigate the role of self-view and find that it helps explain desired salaries, with a modest mediator effect but little moderator effect on gender differences in desired salaries.


Introduction
Around the world, women earn less than men [1]. For example, in the U.S., the median weekly earnings of women above 25 years of age were about 81.3% of those of their male counterparts in 2017 [2]. The reasons are manifold: gender differences in human capital; gender segregation in school majors and in labor markets; maternal responsibilities; gender discrimination; gender differences in preferences for competition and security, etc. [3][4][5][6].
There is another important branch of research suggesting a different determinant of gender differences in labor markets: men propose higher desired salaries and are more likely to engage in salary negotiations than women [7][8][9][10][11]. For example, [9] report survey data on 435 undergraduate students suggesting that female students have an 8.3% lower desired career-entry salary and 33% lower desired career-peak salary than male students. Relatedly, [10] observe that in a laboratory experiment on MBA students, single female students reported lower desired salaries and lower willingness to travel and work long hours on a real-stakes placement questionnaire, in order to be seen as more acceptable in marriage markets. Finally, [11] find from a natural field experiment that when there is no explicit statement that wages are negotiable, men are more likely to negotiate for a higher wage, whereas women are more likely to signal their willingness to work for a lower wage.
PLOS ONE | https://doi.org/10.1371/journal.pone.0210072 | January 10, 2019
In this study, we depart from traditional investigation by using a large database of job seeking resumes in actual labor markets. We use data from a major online recruitment website in China. There are several advantages in using actual labor market data. First, the sample size is adequate to overcome some statistical issues. Second, the sample population is from a wide range of labor forces who have already accumulated some real-world work experience, instead of student samples utilized by most previous research. Last but most importantly, subjective items such as desired salary and self-view can vary substantially under different circumstances, as documented by [10][11][12], etc. Our study on actual job-seeking resumes could offer new evidence on gender differences in labor markets.
Another contribution of this study is the investigation of the role of self-view in determining salary differences and its possible mediator and moderator effects. Thanks to text data mining techniques, we are able to extract key words and quantify the dimensions of self-view from its text-format information. The psychology and sociology literatures are rich with self-related theories and sex-based stereotypes but still lack connections from theoretical investigations to everyday problems [13][14]. In this way, our study moves the exploration of effects of self-view from the lab to actual labor markets.
Our results suggest that women indeed propose lower desired salaries than men, controlling for human capital, marital status, work status, enterprise type, and industry fixed effects. While women even have a slight advantage in education in terms of years of education, elite university degrees, and certificates held, they fall behind in work experience, with fewer years in labor markets, less willingness to change jobs for promotion, and a higher probability of compromising for marriage. These results square well with laboratory evidence [8,10,12].
We find gender differences in global self-view favoring females, and in some domain-specific self-view favoring males. To reconcile some differences with the existing literature, we compare gender differences in self-view across different groups, defined by whether they are fresh graduates, whether they are in female-dominated industries, their marital status, and their length of work experience. Finally, we find that self-view measures contribute to explaining desired salaries, with a modest mediator effect but little moderator effect on gender differences in desired salaries.
We view these findings as a new piece of evidence on gender differences in self-view and desired salaries. Beyond providing an estimate of gender differences in desired salaries caused by human capital, marital status, employment status, etc., we connect self-view to desired salaries using text data mining techniques on job seeking resumes in actual labor markets.

Data and preliminary observations
We use resume data from zhaopin.com, one of the largest online recruitment websites in China. We choose zhaopin.com because it provides information on desired salary, self-view, and the other variables we need. Zhaopin.com was established in 1994, and its business covers the vast majority of cities in China. Its resume data contain variables such as age, education experience, work experience, job intention (including desired salary), work status (working or out of work), self-view, residence city, work place, hukou affiliation, etc. For this study, we choose a random sample of job seekers of working age. Specifically, the sample consists of men between ages 18 and 65 and women between ages 18 and 60 who are looking for a full-time job and for whom the relevant variables are not missing. We randomly selected 25000 resumes in June 2017; 20593 resumes remained after cleaning, of which 10012 are from men and 10581 are from women. The quantitative self-view data are obtained by analyzing the free-style self-view texts using "Jieba", a Python module for Chinese word segmentation.
Unlike LinkedIn, resume data from zhaopin.com can only be seen by potential employers, and is not available to the public (The data is officially acquired from a database initiated by Minsheng Weekly. The database by Minsheng has a random subsample of the resume database of zhaopin.com. Minsheng Weekly is a subsidiary body held by People's Daily. The data is accessible for research purpose, but one has to apply for permission via their official website http://www.cnbo.tv or http://www.msweekydata.com, or email address cnbotv@163.com. We complied with the terms of service for the websites from which we collected data). Therefore, the platform of zhaopin.com serves merely as a job market instead of social media or social network. This guarantees that job seekers produce their resumes without concerns about signaling undesirable traits, such as ambition, to friends or acquaintances or potential dating mates. In this way, our work may add to the literature by observing job seekers in a relatively solitary and private environment, compared to in-classroom questionnaires.
Supplemental data on average housing prices by city are extracted from the Monthly Report of Housing Market in China (June 2017) published by the Chinese Academy of Social Sciences, adjusted and completed with reference to fang.com and lianjia.com, the two largest housing transaction websites in China.

A. Desired salary
Job seekers are asked to fill in an online resume when they register on zhaopin.com, including a drop-down menu named "Desired Salary" as a subsection of "Job Intention". The items in the drop-down menu include (in RMB/month): below 1000, 1000-2000, 2001-4000, 4001-6000, 6001-8000, 8001-10000, 10001-15000, 15000-25000, 25000-35000, 35000-50000, 50000-70000, 70000-100000, above 100000, and negotiation face-to-face. The distribution of the desired salaries by gender is shown in Table 1 after 98% winsorization (values below the 1st percentile and above the 99th percentile are reset to the boundary values). Overall, the majority of the desired salaries are between 4000 and 15000. With regard to gender differences, women make up more than half of the population when the desired salary is below 8000, and less than half when it is above 8000. Table 2 describes gender differences in individual characteristics. We take the mid-point value of the desired salary in each range of the drop-down menu (e.g., we take 5000 for values in the range 4001-6000), and then take the logarithm of the mid-point value to create a new variable, lnsalary. As can be seen from Table 2, women propose significantly lower desired salaries. It is worth mentioning that women spend 0.13 more years in education, have an equal proportion graduating from a "985" university and a higher proportion graduating from a "211" university, and hold on average more certificates than men (The Chinese government ranks domestic universities and classifies them as "985 Project Universities" and "211 Project Universities". As of 2018, there are 39 universities listed in "985 Project Universities", as the first tier, and 112 universities listed in "211 Project Universities", as the second tier). Overall, this means women even have a slight advantage in education. Women are 2.06 years younger than men. This is mainly due to the fact that women may quit the labor market earlier because they more often take care of the family and also have a lower legal retirement age (In China, females do most, if not all, of the housework, and currently the legal retirement age for females is 5 years lower than for males). This can be confirmed by the observation that female job seekers who have already left their job (variable: leave in Table 2), the group that might permanently quit labor markets, make up a larger proportion than their male counterparts. It is consistent with the observation that women's work experience is 2.19 years shorter than men's. As a result, their proportions of working at different types of enterprises, namely state-owned, listed, foreign, and private firms, are significantly lower than men's.
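The construction of lnsalary described above can be illustrated with a minimal sketch. The function names are hypothetical, and the way the half unit is dropped (so that 4001-6000 maps to 5000) is an assumption chosen to match the example in the text. The open-ended menu items and "negotiation face-to-face" are left out because the text does not say how they are coded.

```python
import math

def midpoint(low, high):
    """Mid-point of a closed salary range in RMB/month.

    Integer division drops the half unit, so 4001-6000 maps to 5000,
    as in the example in the text. This rounding rule is an assumption.
    """
    return (low + high) // 2

def ln_salary(low, high):
    """Natural logarithm of the range mid-point (the variable lnsalary)."""
    return math.log(midpoint(low, high))

# Closed ranges copied from the drop-down menu described in the text.
CLOSED_RANGES = [
    (1000, 2000), (2001, 4000), (4001, 6000), (6001, 8000),
    (8001, 10000), (10001, 15000), (15000, 25000), (25000, 35000),
    (35000, 50000), (50000, 70000), (70000, 100000),
]

if __name__ == "__main__":
    for low, high in CLOSED_RANGES:
        print(low, high, midpoint(low, high), round(ln_salary(low, high), 3))
```

Because the dependent variable is built from range mid-points, differences in lnsalary between two job seekers can only take a limited set of values, which is one reason a regression on the original range category is a natural robustness check.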

B. Individual characteristics
As to marital status, the proportion of married women is 24%, significantly lower than that of married men, which is 34%. There may be two reasons for this. First, married women prefer stability over job changes [15]. This can be confirmed from the observation of work status: though fresh graduates (variable: graduate) make up a larger proportion of female job seekers than of male job seekers, women who are still working (variable: working) or even satisfied with their current work (variable: satisfied) are less likely than men to seek jobs elsewhere. The second reason for the lower proportion of married female job seekers is that they may choose not to disclose their marital status, to avoid gender discrimination based on childbirth and parenting pressure, since filling in marital status is optional in our resume data.

Self-view
Earlier studies on gender differences in self-view mostly use college student samples. However, conclusions on gender differences in self-view may be a peculiarity of student samples and may not extend to other groups. A similar argument is documented by [16] on gender differences in altruistic behavior. Besides, even people of the same age group may have quite different levels of self-esteem depending on their personal attributes and the working class they are from. Our sample of working-age white-collar women may exhibit a higher level of self-view than women at large. As documented by [17], women of different attributes (masculine or feminine) are observed to show significantly different levels of altruism. In addition, gender differences in self-view may not extend from lab observations to the actual labor market. The literature abounds in work emphasizing the situational setting. For example, gender differences in moral judgement and honesty vary across situations [18][19].

A. Text quantification
Our resume data could help address the above issues. However, self-views in the resume data are textual descriptions, and how to quantify those descriptions is key in this study. We tackle this problem by adopting "Jieba", a Python Chinese word segmentation module. We then extract the high-frequency words from the pool of all self-view texts, and finally record whether each high-frequency word occurs in each resume. In detail, we first use "Jieba" to segment all the words in the pooled self-view text data and sort the words according to their frequencies. We exclude general words like "I", "work", "firm", prepositions and conjunctions like "and", "in", "of", and punctuation. The frequency ranking of the remaining words is shown in Table 3. Specifically, we classify the words into two groups, self-esteem and self-efficacy, as adopted in the psychology literature [20]. Self-esteem is a judgment of personal value [20]. Global self-esteem is one of the most widely researched psychological constructs. Previous research suggests that self-esteem can interact with gender to influence the image people have of themselves receiving pay in the future [9]. Self-efficacy is an appraisal of one's competence [21], and is also linked to gender and pay-related variables. Self-esteem and self-efficacy are distinct yet related constructs. In fact, self-esteem and self-efficacy are the two most commonly adopted concepts of self-view in the psychology literature [9,22], which supports their validity as measures of self-view. Our measures of self-esteem and self-efficacy follow the lexical tradition in the psychology literature. There are two prominent systems for measuring personal traits, one derived from the lexical tradition and one from the questionnaire tradition, namely natural language adjectives and theoretically based personality questionnaires [23]. The questionnaire method may suffer from redundant questions, and questionnaires bear little resemblance to each other because the underlying theories are remarkably diverse and there is little hope of identifying basic dimensions. In this sense, the advantage of the natural language approach is that it can identify a few hundred adjectives with some confidence that they represent ordinary social language.
Finally, we create a dummy variable for each key word to indicate whether the word occurs in a resume. Furthermore, we add up the occurrences of the words in each of the two categories as variables self-esteem and self-efficacy. Besides, the length of self-view text is recorded as variable description-length, to measure the general richness of the content. As we can see from Table 4, women score higher in both self-esteem and self-efficacy but write a shorter self-view. We further check self-view across birth cohorts. To ensure the sample size of each cohort is larger than 100, we only select the subpopulation of birth year 1972 to 1997. Fig 1 shows that women of almost all cohorts score higher in both self-esteem and self-efficacy than men.
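The quantification steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function name, the English placeholder keywords, and the whitespace tokenizer are hypothetical. For Chinese text one could pass `jieba.lcut` as the `tokenize` argument. The sketch also assumes that each category score is the sum of the keyword dummies and that description length is counted in tokens.

```python
def keyword_features(text, categories, tokenize=str.split):
    """Build self-view features for one resume text.

    text       : the free-style self-view description
    categories : dict mapping a category name to its list of key words
    tokenize   : callable returning a list of tokens; for Chinese text
                 this could be jieba.lcut (assumption, not run here)

    Returns (dummies, scores, length), where dummies[word] is 1 if the
    key word occurs in the text, scores[category] is the sum of the
    dummies in that category, and length is the number of tokens.
    """
    tokens = tokenize(text)
    present = set(tokens)
    dummies = {}
    scores = {}
    for category, keywords in categories.items():
        hits = 0
        for word in keywords:
            flag = 1 if word in present else 0
            dummies[word] = flag
            hits += flag
        scores[category] = hits
    return dummies, scores, len(tokens)

if __name__ == "__main__":
    # Placeholder key words; the actual 24 key words are listed in Table 3.
    example_categories = {
        "self_esteem": ["honest", "independent"],
        "self_efficacy": ["teamwork", "organized"],
    }
    print(keyword_features("honest and organized with teamwork spirit",
                           example_categories))
```

Keeping the tokenizer as a parameter separates the segmentation step from the counting step, so the same counting logic applies whichever segmentation tool is used.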
Although a large body of previous literature finds that girls have distressingly lower self-esteem, some meta-analyses find that the effect size for the gender difference in self-esteem is small or insignificant, and that the effect size also depends on life stage. Besides, previous research finds a clear trajectory of systematic changes in self-esteem throughout the life span [24][25]. Moreover, it is not surprising that disadvantaged groups have levels of self-view not lower than those of advantaged groups. It is documented that Blacks and Chicanos have levels of self-esteem at least as high as those of Whites in many studies [20,26,27].
Our finding that women score higher in "global" self-view may also be related to the theories of "belief flipping" or the disappearance of statistical discrimination against women who succeed in getting into career track positions [28][29]. In line with the theories of "belief flipping", our finding of differences in self-view favoring females makes sense if we note that our sample consists of adults active in the labor market with on average 15.2 years of education and 8.99 years of work experience. These women are those who have succeeded in getting into career track positions, and indeed those who have climbed to the top class, considering that the majority of the population receives no more than junior high school education in a developing country like China. It is possible that women show higher self-view than men in this sample due to self-selection. To shed light on this, we conduct sensitivity analyses based on years of work experience and along other dimensions in part C of this section.

B. Domain-specific self-view
In an effort to further reconcile the differences between our findings and the existing literature, we examine self-view across all 24 key words in Table 5. This is related to the "domain-specific" self-view literature. While most previous works examine "global" self-esteem, some other works explore "domain-specific" self-esteem. For example, [30] find that although men score significantly higher than women on physical appearance, athletic, personal self, and self-satisfaction self-esteem, women score significantly higher than men in behavioral conduct and moral-ethical self-esteem, and no significant gender differences appear in academic, social acceptance, and family self-esteem. This finding can be extended to self-efficacy as well [31][32]. Our examination of self-view across 24 key words supports the conclusion from the "domain-specific" self-view literature. As can be seen in Table 5, we find that men score significantly higher than women in teamwork and problem-solving self-efficacy, women score significantly higher than men in moral-ethical and social acceptance self-esteem, and no significant gender differences appear in independence self-esteem and organizing ability self-efficacy. In summary, although women score higher than men in "global" self-view, men score higher than women in some self-view "domains", which is consistent with the literature. There is little literature on self-view that uses measures other than self-esteem and self-efficacy. One exception is [23], which uses five factors: Extraversion, Agreeableness, Conscientiousness, Neuroticism, and Openness to Experience. However, for the purpose of producing convincing self-view measures, we believe our 24-keyword method gives a more detailed picture of self-view than the five-factor approach. Moreover, it turns out that the 24-keyword method supports the results from global self-esteem and self-efficacy, and it helps reconcile the differences with the existing literature.

C. Self-image or social signaling?
Even if it is possible that the disadvantaged group has levels of self-view at least as high as those of the advantaged group, and that our white-collar female sample fits theories of "belief flipping", one might still wonder whether the result is a story of social signaling instead of a story of self-image. One might also argue that women perhaps spend more time preparing their resumes. To address this, we calculate gender differences in self-view across different groups, defined by whether they are fresh graduates, whether they are in female-dominated industries, their marital status, and their length of work experience. Table 6 shows the results of gender differences for fresh graduates and workers with some work experience. It is clear that for fresh graduates, no significant gender differences exist in either self-esteem or self-efficacy. In comparison, significant differences in self-view favoring females exist among workers with some work experience. Table 7 further differentiates workers according to their number of previous jobs. The results support those of Table 5. The more work experience, the larger the gender differences in self-view favoring females. Tables 6 and 7 together support the view that there is systematic change in self-view throughout career age [24][25], and the theories of "belief flipping" or the disappearance of statistical discrimination against women who succeed in getting into career track positions [28][29]. In fact, we observe that women evaluate themselves more highly relative to men as they gain more work experience.
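A subgroup comparison of this kind can be sketched as a difference in mean self-view scores between women and men within one subgroup, together with a test statistic. The text does not state which test underlies Tables 6 and 7, so the Welch t statistic used here is an assumption, and the function name and example scores are hypothetical.

```python
import math
import statistics

def gender_gap(female_scores, male_scores):
    """Return (difference in means, Welch t statistic) for one subgroup.

    A positive difference means women score higher than men. Each list
    needs at least two observations so that a sample variance exists.
    """
    diff = statistics.mean(female_scores) - statistics.mean(male_scores)
    standard_error = math.sqrt(
        statistics.variance(female_scores) / len(female_scores)
        + statistics.variance(male_scores) / len(male_scores)
    )
    return diff, diff / standard_error

if __name__ == "__main__":
    # Made-up scores for illustration only, not data from the study.
    print(gender_gap([3, 4, 5, 4], [2, 3, 3, 2]))
```

Applying the same function separately to fresh graduates and to experienced workers gives one gap per subgroup, which is the kind of comparison the tables report.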
One might also argue that women spend more time preparing their resumes. It is true that significant and pervasive levels of discrimination have been found against women in labor markets [33][34]. However, there are studies that find a pro-female bias in callbacks in female-dominated or mixed occupations [35][36]. To reconcile these conflicting findings, we compare self-view levels across industries with different levels of female domination. If women do spend more time preparing their resumes because of discrimination, we would see a smaller gender difference in self-view favoring females in more female-dominated occupations. Table 8 shows the results of gender differences for workers from different industries, with IT and real estate/construction being traditionally male-dominated industries, and finance and education being more female-dominated or mixed industries. The results show that the gender differences do not vary across industries. Therefore, it does not appear that women spend more time preparing their resumes.
Finally, Table 9 shows the results of gender differences for married and unmarried subsamples. Note that our resume data from zhaopin.com can only be seen by potential employers, and is not available to the public. Therefore, no compromise needs to be made for marriage market signaling. The finding that the married subsample shows larger gender differences is consistent with the results from Tables 6 and 7. Married people are usually older and have more work experience.
In summary, our resume data from zhaopin.com can only be seen by potential employers, and is not available to the public, unlike LinkedIn. Therefore, social concerns, such as social image or social signaling, are not the underlying mechanism in this study. On the contrary, we argue that the results reflect females' true self-image. Besides, previous findings of Blacks and Chicanos having levels of self-esteem at least as high as those of Whites lend credibility to our finding that females can have higher self-view than men, at least in some domains. Moreover, we argue that the differences in self-view favoring females may be related to the theories of "belief flipping", since women in our data sample are those who have succeeded in getting into career track positions, and indeed those who have climbed to the top class, given that their average years of education are much longer than the national average in China.

Conditional association of self-view with desired salary
In this section, we first perform a baseline regression of desired salaries on the gender indicator, using as controls human capital, marital status, enterprise type, living costs, and industry fixed effects to account for different job seeking procedures across industries. Then we explore the role of self-view in explaining desired salaries and gender differences. Finally, we use all keyword components of self-view to gain more insight into the structure of self-view itself and its connection to desired salaries. Table 10 shows the results of the OLS regressions. The coefficient on the gender dummy variable, female, is significantly negative, meaning women propose lower desired salaries than men. The first column is the baseline regression according to the standard Mincer equation [37], where education, work experience and its square term are included. After adding more control variables in columns (2) and (3), we get a larger adjusted R-squared, meaning these controls add explanatory power for desired salaries. Specifically, column (2) shows that enterprise type, marital status, work status, and industry help explain the gender difference. Column (3) adds hukou locality and the average housing price of a job seeker's desired work city, to reflect local living costs. These two variables are not seen in previous literature, but they improve explanatory power in the sense that the adjusted R-squared increases substantially. Overall, columns (1)-(3) show that after controlling for variables like human capital, marital status, enterprise type, industry, and living costs, women propose about 20% lower desired salaries than men, significant at the 1% level. We further perform an Ordered Probit regression as a robustness check. We use the range category of desired salary as the dependent variable, since the range category is the original data from the drop-down menu. The results, not reported here, support that women propose significantly lower desired salaries than men under different sets of controls.
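As a sketch, the baseline specification can be written in the standard Mincer form below. The notation is an assumption introduced for illustration: edu and exp stand for years of education and years of work experience, X_i collects the additional controls added in columns (2) and (3), and the coefficient of interest is \beta_1.

```latex
\ln(\mathrm{salary}_i) = \beta_0 + \beta_1\,\mathrm{female}_i
  + \beta_2\,\mathrm{edu}_i + \beta_3\,\mathrm{exp}_i + \beta_4\,\mathrm{exp}_i^{2}
  + X_i'\gamma + \varepsilon_i
```

Under this form, a negative \beta_1 measures the gender gap in desired salary in log points, holding the other regressors fixed.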

B. Mediator and moderator effects of self-view
Results reported in Table 11 use as dependent variables three dimensions of the job seekers' self-view. Descriptive statistics of these variables are provided in Table 4. The results of Table 11 show that the coefficient of the dummy variable female is significantly different from zero in each regression. Thus, self-view can be a mediator that helps explain the gender differences in desired salary. Column (4) in Table 10 checks the mediator effect of self-view on desired salary. As we can see, including the self-view variables in column (4) reduces the gender gap in column (3) (as captured by the coefficient of the female dummy) by about 0.02 log points, or 10% in relative terms. Besides, the coefficients of the self-view variables are themselves significant, so these variables contribute to explaining desired salaries. A one-point increase in self-esteem decreases desired salary by 6.5%, with a decreasing marginal effect. A one-point increase in self-efficacy increases desired salary by 1.9%. For every 50-word increase in description length, desired salary increases by 4.4%.
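The arithmetic behind the mediator effect can be illustrated with a short sketch. The two gap coefficients used in the example (-0.20 and -0.18) are hypothetical values chosen only to be consistent with the magnitudes quoted in the text (a gap of about 20% and a reduction of about 0.02 log points); they are not taken from Table 10. The function names are also hypothetical.

```python
import math

def share_mediated(gap_without, gap_with):
    """Relative reduction in the gender gap once mediators are included.

    gap_without : female coefficient without self-view variables
    gap_with    : female coefficient with self-view variables
    """
    return (gap_without - gap_with) / gap_without

def log_points_to_percent(coef):
    """Exact percentage change implied by a coefficient in a log-linear model."""
    return (math.exp(coef) - 1) * 100

if __name__ == "__main__":
    # Hypothetical coefficients, consistent with the magnitudes in the text.
    print(share_mediated(-0.20, -0.18))
    print(log_points_to_percent(0.019))
```

For small coefficients the exact percentage change is close to the coefficient multiplied by 100, which is why log-point changes are commonly read directly as percentages.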
To check the moderator effects, we add interaction terms of gender indicator and self-view. The results are reported in column (5) of Table 10. Self-efficacy and description-length do not moderate the relationship between gender and desired salary. Although there is a statistically significant coefficient for the interaction of gender indicator and self-esteem, the magnitude is too small to affect the slope of the relationship.
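A specification with moderator terms can be sketched as follows. The notation is an assumption for illustration: S_i stacks the self-view regressors (self-esteem, self-efficacy, description-length) and X_i collects the remaining controls.

```latex
\ln(\mathrm{salary}_i) = \beta_0 + \beta_1\,\mathrm{female}_i
  + S_i'\theta + (\mathrm{female}_i \times S_i)'\delta
  + X_i'\gamma + \varepsilon_i
```

In this form, a moderator effect corresponds to nonzero elements of \delta, that is, to the association between self-view and desired salary differing by gender.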

C. Components of self-view
We further use each key word component as one dimension of self-view, to explore the inner relationships of the components and to check the robustness of previous results. S1 Table shows the correlation matrix of 24 key word components. Mostly they are weakly and positively correlated. The largest positive correlation coefficient is 0.46, and the largest negative correlation coefficient is -0.06. Other than grouping into two categories of self-esteem and self-efficacy, a natural way to deal with the 24 dimensions is to apply principal component analysis (PCA). Table 12 presents the results. The first 7 components have eigenvalues larger than 1. Moreover, 17 components are needed to meet the requirement of explaining 80% of the variance. Therefore, roughly speaking, PCA is not very suitable. To formally measure how suited the data are for component analysis, we apply the Kaiser-Meyer-Olkin (KMO) test. The KMO statistic gauges the proportion of variance among variables that might be common variance. In other words, it measures sampling adequacy. The KMO statistic is 0.76, indicating only middling sampling adequacy. Therefore, we do not use PCA when analyzing the components of self-view. Instead, we directly use each component as a regressor. Table 13 reports the results of using all key word components of self-view as regressors. The first column has the same regressors as in column (3) of Table 5 except that the variables self-esteem, self-efficacy, description-length are replaced by the 24 key word components of self-view. As can be seen, 21 out of 24 variables have non-zero coefficients with significance levels at 1% or 5%, meaning almost every component has its own merit in explaining the desired salary. The second column includes interaction terms of these components and the gender indicator. (The coefficients of the interactions are not reported to avoid the table being too lengthy.) The coefficients for the key words are generally unchanged, and 19 out of the 24 interaction terms are not different from zero at the 10% significance level. This is consistent with the results for interaction terms when self-view was measured by self-esteem, self-efficacy and description length in Table 10, indicating no moderator effects. The third and fourth columns are for the female and male samples respectively. The coefficients are not much different from those of the first column, meaning the results from the first column are quite robust.
Table 11. Regression results on self-view.
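The two retention rules mentioned for the PCA (eigenvalues larger than 1, and the number of components needed to explain 80% of the variance) can be sketched as below. The function name is hypothetical and the eigenvalues in the example are made up for illustration; they are not those of Table 12.

```python
def components_to_retain(eigenvalues, variance_target=0.80):
    """Apply two common retention rules to a list of PCA eigenvalues.

    Returns (kaiser, needed), where kaiser is the number of eigenvalues
    strictly larger than 1 and needed is the smallest number of leading
    components whose cumulative share of variance reaches the target.
    """
    ordered = sorted(eigenvalues, reverse=True)
    kaiser = sum(1 for value in ordered if value > 1.0)
    total = sum(ordered)
    cumulative = 0.0
    needed = len(ordered)
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= variance_target:
            needed = count
            break
    return kaiser, needed

if __name__ == "__main__":
    # Made-up eigenvalues for illustration only.
    print(components_to_retain([3.0, 1.5, 0.8, 0.4, 0.3]))
```

When the two rules disagree sharply, as with 7 versus 17 components in the text, the variance is spread over many weakly correlated dimensions and a low-dimensional summary loses a large share of the information.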

Conclusion
Desired salaries have the potential to crucially determine labor market outcomes, and gender differences in desired salaries can be an important cause of existing gender differences in labor market outcomes. When women propose a lower desired salary, they will be offered a lower salary: the process is self-fulfilling [38][39]. Moreover, a lower ongoing salary for females further undermines females' salary expectations, which negatively impacts lifelong earnings [40]. One major challenge to better understanding desired salaries is that they are difficult to observe in their natural environment. By using a large database of job seeking resumes in actual labor markets, we are able to explore the magnitude and the determinants of gender differences in desired salaries. We find gender differences in global self-view favoring females, and in some domain-specific self-view favoring males. In addition, we find that women do propose lower desired salaries than men, after controlling for various factors such as human capital, marital status, industry fixed effects, etc. We further investigate the role of self-view and find that it helps explain desired salaries, with a modest mediator effect but little moderator effect on gender differences in desired salaries.
Our work on self-view, which uses a lexical approach, is somewhat limited in its ability to examine in full detail the potential mechanisms underlying the phenomena. However, to reconcile some differences with the existing literature, we compare gender differences in self-view across different groups, defined by whether they are fresh graduates, whether they are in female-dominated industries, their marital status, and their length of work experience. We find that the gender differences reflect self-image, but not social signaling. In fact, our results show a very clear trajectory of increase in the self-view of women relative to that of men throughout the career span. Women score higher than men in self-view, and more so when they are married. There is also not much difference in the gender gap in self-view between the subsample from female-dominated industries and the subsample from non-female-dominated industries, meaning women polishing their resumes is not the story here. Besides, previous findings of Blacks and Chicanos having levels of self-esteem at least as high as those of Whites lend credibility to our finding that females can have higher self-view than men, at least in some domains. Moreover, we argue that the differences in self-view favoring females may be related to the theories of "belief flipping", since women in our data sample are those who have succeeded in getting into career track positions, and indeed those who have climbed to the top class, given that their average years of education are much longer than the national average in China.
Note (Table 13): inclusion of controls is the same as in column (3) of Table 10. Standard errors in parentheses. ***, **, * denote significance levels of 1%, 5% and 10%. https://doi.org/10.1371/journal.pone.0210072.t013
Our study, to our knowledge, is the first attempt to use data in actual labor markets to investigate gender differences in desired salaries, and also the first to use text data mining techniques in analyzing self-view in labor markets. The evidence on self-view in this study differs from previous lab evidence, adding to literature that emphasizes the importance of environmental setting in studying subjective views and behaviors in labor markets.
Supporting information S1