How leaders are persuaded: An elaboration likelihood model of voice endorsement

Organizations need both employee voice and managerial endorsement to ensure high-quality decision-making and achieve organizational effectiveness. However, a preponderance of voice research focuses on employee voice with little attention paid to voice endorsement. Building on the social persuasion theory of the elaboration likelihood model, we systematically examine the sender and receiver determinants of voice endorsement and how the interplay of those determinants affects voice endorsement. By empirically analyzing 168 paired samples, we find that issue-relevant information, i.e., voicer credibility, has a positive effect on voice endorsement and matters most when leaders have high felt obligation. The results also show that the peripheral cue used in the study, i.e., positive mood, has a positive effect on voice endorsement and matters most when leaders have low felt obligation or low cognitive flexibility. We discuss the contributions of these findings and highlight limitations and directions for future research.

save face for others [47]. Meanwhile, individuals in high-context cultures rely more on shared understandings (contexts) to convey information, so they tend to be more comfortable with ambiguous messages and to place less emphasis on the content of the information being exchanged [47]. Thus, compared with respondents from low-context cultures, the leader respondents in the present study may be more sensitive to employee mood and less sensitive to employee credibility, magnifying the impact of positive mood and reducing the impact of voicer credibility. Therefore, future research could be conducted in low-context cultures to examine whether our findings remain valid, or could introduce low-/high-context culture as a moderating variable to explore its impact on voice endorsement.

Responses
We thank the reviewer very much. The suggestions above have helped us improve the manuscript considerably in terms of wording, error correction, and theoretical interpretation.

Responses
Thank you for raising this question. To determine the appropriate sample size for linear regression, we read several articles and found that scholars hold different views on the rule of thumb for sample size. For example, Gefen, Straub, and Boudreau (2000, p9) suggested that linear regression supports smaller sample sizes and that the minimal sample size required is at least 30. According to Burmeister and Aitken (2012, p8), regression analysis can use the 20:1 rule, which states that the ratio of sample size to the number of parameters in a regression model should be at least 20 to 1. If we adopt this rule, the minimum sample size for our study would be 60 [3 x 20 = 60]. Similarly, Green (1991, p504) proposed an alternative sample size calculation for multiple regression, N > 50 + 8p, where p is the number of predictors. Our study has four predictors, indicating that a minimum sample of 82 is needed for the linear regression.
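As a quick illustration, the rules of thumb above reduce to simple arithmetic. The sketch below expresses the two parameter-based rules as hypothetical helper functions (the parameter counts passed in are those discussed in this response):

```python
# Hypothetical helpers illustrating the sample-size rules of thumb above.

def rule_20_to_1(n_params: int) -> int:
    """Burmeister & Aitken (2012): at least 20 observations per parameter."""
    return 20 * n_params

def green_1991(n_predictors: int) -> int:
    """Green (1991): minimum N implied by N > 50 + 8p, with p predictors."""
    return 50 + 8 * n_predictors

print(rule_20_to_1(3))  # 20:1 rule with 3 parameters -> 60
print(green_1991(4))    # Green's rule with 4 predictors -> 82
```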
Besides, we reviewed some articles in top journals and found that the sample size requirement for regression analysis is not very demanding. For example, in Ward et al.'s (2016, p8) study, 131 paired samples (131 subordinates and 131 managers) were collected to run OLS regression analyses. Similarly, in Li et al.'s (2019, p877) study, a series of linear regressions was conducted using a sample of 198 managers. As another example, the sample size Liu et al. (2015, p981) used in their OLS regression analysis was 142. Based on this evidence, we believe that a sample size of 168 can support the regression analysis in the present study.
A sample's representativeness is usually influenced by three factors: randomness, sample size, and sampling approach. When designing the study, we deliberately collected questionnaires from companies of different employee sizes, with different products, and at different stages of development. We also sought variability among questionnaire respondents (e.g., in age, gender, and years of work experience) to increase the sample's representativeness. As for sample size, the larger the sample, the more representative it is of the overall population. Although we argued above that a sample of 168 is sufficient for regression analysis, we agree with the reviewer that a larger sample size could increase our sample's representativeness.
We acknowledge some imperfections in the representativeness of our sample, but we find that our hypothesis tests remain significant even after controlling for the seven control variables. Therefore, we have reason to believe that these imperfections do not have a significant impact on our estimation results. The specific revisions are listed below or can be found on Page 9.

Specific revisions
Based on previous studies [31,32], a sample of 168 is sufficient to perform the regression analysis.

Comment 2:
You will also need to inform about the data collection process in the study? How long? How did you get those data?
Responses: This is good advice. We have expanded our earlier description to provide a more detailed account of the data collection process, particularly the time schedule and how we collected the questionnaires. The specific revisions are listed below or can be found on Pages 8-9.

Specific revisions
From April to June 2019, we contacted 200 team leaders (or department managers) working for 73 firms in Zhejiang, China. Survey data were collected mostly in the cities of Wenzhou, Yiwu, and Taizhou, where the manufacturing industry is more developed (the data used in the present study were part of a broader data collection effort). We used a paired-questionnaire survey design. During data collection, we typically invited all participants from a company (generally 4 to 6 per company) to a nearby conference room, explained how to fill out the questionnaire with explicit instructions, and guaranteed the confidentiality of all individual responses to allay concerns about information leakage. For each matched leader-employee pair, we distributed two envelopes, one containing the leader version of the questionnaire and the other containing the employee version.
We then gave all participants adequate time to complete their questionnaires, which they then placed in the sealed envelopes. Each employee completed items on his or her own voicer credibility, positive mood, and demographic information. Each employee's supervisor rated voice endorsement, felt obligation, and cognitive flexibility, and provided demographic information. In total, we sent out 400 questionnaires (200 for leaders and 200 for employees) and received responses from 173 leaders (86.5% response rate) and 182 employees (91.0% response rate). After deleting invalid or unmatched questionnaires, we obtained 168 dyads. Based on previous studies [30,31], a sample of 168 is sufficient to perform the regression analysis of the present study.

Responses:
That is an excellent question that we had not considered before. In our previous study, we simply borrowed or adapted scales developed by other researchers in the domain without considering the possible impact of the option settings on statistical results. We learned a great deal from reading the study by Dhar and Simonson (2003) recommended by the reviewer.
After careful reading and discussion, we believe that the main findings of Dhar and Simonson's (2003) work support our use of an odd-numbered Likert scale. According to them, the no-choice option is usually a better choice because it helps resolve the preference uncertainty and discomfort associated with forced choice in competition with other conflict resolutions (P148), while forced-choice procedures may often produce biased or incomplete findings that lead to incorrect conclusions (P156). Other related statements from this recommended study are listed below.

P158:
The results also apply to other choice domains. For example, polls of voting intentions are sometimes conducted using a design in which voters are given the "no opinion/don't know" option, whereas in other polls they are forced to express a preference among the available candidates (Krosnick 2000). The current results suggest that the latter forecasts may often be systematically biased.
Our results also have implications for choice-based conjoint analysis. For example, the Sawtooth software allows a researcher to include a "choose none" option when conducting a conjoint study. Our results suggest that including such an option can make the choice task more realistic and the experience more pleasant for the respondent.
In our study, the odd-numbered Likert scale we used is a type of no-choice option setting in which we provide the choice of "neither agree nor disagree/uncertain." Our participants did not have to choose between disagreeing and agreeing (1 = strongly disagree, 2 = disagree, 3 = neither agree nor disagree/uncertain, 4 = agree, 5 = strongly agree). In addition, using an odd-numbered Likert scale is a relatively common practice in psychological research. For example, Lam and Lee (2019, P652) measured their research constructs (e.g., voicer credibility) on a five-point scale ranging from 1 (strongly disagree) to 5 (strongly agree). Liang et al. (2012) used scales ranging from 1 (strongly disagree) to 5 (strongly agree) for all their variables (e.g., employee voice).
Overall, we thank the reviewer for offering such a good reference. Studying it has led us to think more deeply about the theoretical basis of the scale items' settings.

Comment 4: Is there any practical contributions that can be presented from this study? What are the benefits for the businesses and practitioners of your work?
Responses: We fully agree that we should present the practical contributions of our study. We have added the practical implications from the employee and manager perspectives in the Theoretical and Practical Contributions section. The specific revisions are listed below or can be found on Pages 26-27.

Specific revisions
Our findings also offer valuable practical implications for both employees and managers. From the employee perspective, it is essential to understand how the way they present themselves affects their leaders' cognitive or affective processing of their voice. Building credibility often takes a lengthy period and many work-based interactions, and it is difficult for employees to change their leaders' trust in them within a short time. Therefore, one strategy they can employ is to display positive emotions towards their work, colleagues, and supervisors if they expect their voice to be endorsed. Meanwhile, our results also suggest that leaders' motivation and ability are important moderators of the associations between voicer credibility (positive mood) and managerial endorsement, so employees should get to know their leaders before speaking up. For example, if an employee notices that his or her leader can handle complicated issues but lacks a sense of obligation to the organization, the employee should be aware that displaying a positive mood may have little impact on the leader's voice endorsement. In that case, the employee should attend to his or her credibility before speaking up; less credible employees, in particular, may be better off building their credibility first. From the manager or organizational perspective, voice endorsement is usually expected to be objective and fact-based rather than driven by emotions. Accordingly, organizations should strengthen managers' sense of responsibility for the organization and their cognitive ability to address complex issues, for example, through regular communication meetings on organizational goals or training activities that combine theory with company practice.

Responses
Many thanks for this valuable comment. The reviewer offers a good direction regarding which factors should be explored in future research on voice endorsement. We agree that future research could also examine the impact of employees' body gestures, linguistic traits, or word choices on their leaders' voice endorsement. The specific revisions are listed below or can be found in the limitations and future research section (Page 28).

Specific revisions
In addition, we encourage future research to explore other determinants of voice endorsement from the ELM perspective. For example, we examined only positive mood as a peripheral cue. However, negative employee mood, as another peripheral cue, could also influence leader voice endorsement because it can trigger affective responses such as defensiveness and fear [19]. Beyond employee mood, other peripheral cues such as employees' body gestures, linguistic traits, and word choices may also influence leaders' voice endorsement and are worthy of further investigation.

Reviewer #3
Comment: Quiet an interesting study. The authors have explored a very unique area and the arguments and conclusions are well noted. The limitations and theoretical contributions are well noted. The study has more room for further research on this area and scope.

Responses:
Many thanks for the recognition of our paper.

Responses:
We feel honored that the reviewer was interested in and recognized the theoretical value of our paper. However, we are truly sorry that our data analysis did not meet the reviewer's expectations. To address these concerns, all authors carefully studied and discussed each of the reviewer's comments and provided detailed explanations and substantive revisions for every question. We hope the efforts made in the new version of the manuscript address the reviewer's concerns.

Responses
As the reviewer noted, we were aware of the possible negative impact of CMB on our study in advance and took ex-ante steps (i.e., collecting data from multiple sources), which we believe is the most effective way to reduce the CMB problem. Statistical techniques for CMB testing can only check whether CMB significantly affects the results; they cannot reduce it.
However, we fully agree with the reviewer that Harman's single-factor test, while simple to perform, has been criticized by many researchers. Therefore, we added two other techniques (CFA and the unmeasured latent method construct, ULMC) to further demonstrate that common method bias in our study was not severe. In particular, the ULMC method is superior to the CFA method, which in turn is superior to Harman's single-factor test. We are very grateful to the reviewer for raising this issue, as we have learned two new CMB-checking techniques. The specific revisions are listed below or can be found on Pages 13-14.
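For readers less familiar with Harman's single-factor test, its logic can be sketched in a few lines: if a single unrotated factor accounts for the majority (conventionally more than 50%) of the total variance across all scale items, method bias is a concern. The snippet below illustrates this on simulated data (31 items matches our research model; the data themselves are random, so everything else is purely illustrative):

```python
import numpy as np

# Rough sketch of Harman's single-factor check on simulated data: the share
# of total variance captured by the first unrotated factor (approximated
# here by the first principal component) of all 31 scale items.
rng = np.random.default_rng(7)
items = rng.normal(size=(168, 31))                    # simulated item responses
Z = (items - items.mean(axis=0)) / items.std(axis=0)  # standardize the items
corr = np.cov(Z, rowvar=False)                        # ~ correlation matrix
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]     # descending eigenvalues
first_factor_share = eigvals[0] / eigvals.sum()
# A share well below 0.5 suggests no single method factor dominates.
```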

Responses
We are very grateful to the reviewer for this to-the-point comment. As the reviewer stated, we simply followed the "rules of the field" when providing this table without considering the value it would add. In our field of psychological research, most researchers provide such a "descriptive statistics" table without asking why it is done.
We agree with the reviewer that a table containing frequencies and percentages would be more valuable than one containing means and standard deviations, because the former provides more detailed information about sample attributes such as distribution type, concentration/dispersion trends, and outliers. The specific revisions are listed below or can be found on Pages 9-10. Table 1 shows basic information about the sample. As indicated below, most participants who filled out the questionnaires were male: 64.3% of employee participants and 74.4% of leader participants. In terms of organizational tenure, the majority of employee participants have been working at their companies for less than three years (54.2%), whereas the majority of leader participants have been there for more than three years (72.1%). Regarding education level, a large proportion of both employee (50.6%) and leader participants (34.5%) have a college education, although the latter group has a relatively higher percentage with education beyond college.
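A frequency/percentage table of this kind is straightforward to produce. The sketch below shows one minimal way to do it with pandas; the column name and the five example responses are hypothetical, not our actual data:

```python
import pandas as pd

# Minimal sketch: building a frequency/percentage table like Table 1.
# The data and the "gender" column are hypothetical examples.
df = pd.DataFrame({"gender": ["male", "male", "female", "male", "female"]})
counts = df["gender"].value_counts()
table = pd.DataFrame({"n": counts, "%": (100 * counts / len(df)).round(1)})
print(table)  # one row per category, with its count and percentage
```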

Comment 3-1:
The authors used CFA and path modeling to explain the causal relationship in their analysis and provide evidence for their hypothesis testing. However, they are not providing a graphical representation of these relationships. They shifted to OLS and hierarchical regression analyses. Such a method is a big assumption from their side that the response variable is continuous.

Responses
First, we would like to clarify that we used only CFA, not path modeling; the two types of analysis can be used separately. For example, Lam, Lee, and Sui (2019) used only CFA to ensure that the multi-item measures they used could be appropriately modeled as distinct constructs. Our purpose in using CFA is the same as theirs. Second, we thank the reviewer for this comment, which offers an opportunity to explain why we used linear regression rather than SEM techniques.
In recent years, the growing interest in the SEM technique and recognition of its importance in research suggests the need to compare it with other statistical techniques so that research designs can be selected appropriately. Compared with linear regression, the SEM technique does have several unique advantages in handling some complicated research models, such as models that contain multiple dependent variables. The SEM technique can also provide a holistic analysis, including structural and measurement models, to estimate a series of interrelated dependence relationships simultaneously. However, there are distinct differences between SEM and regression that make each more or less appropriate for certain analysis types. Thus, choosing an analysis method correctly based on the research objectives is crucial. Next, we will explain why regression would be a more appropriate choice than SEM in our study regarding data distribution and the sample size.
(1) Distribution issues. One primary concern when using SEM path modeling is that the hypothesized variables should be normally distributed (multivariate normal distribution). Holland et al. (2017, p717) conducted a literature analysis of 78 papers published in four top journals that include both moderation and mediation and found that only 5% of these papers were analyzed using any type of SEM. One possible reason is that the product-indicator approaches used by SEM to analyze moderation are flawed. Specifically, the product-indicator approaches use the product of the assessed independent and moderating variables as an indicator of the interaction term. However, this interaction term is non-normally distributed, which violates the data assumptions of SEM. Our study has four moderating effects; if all four moderations were included in a path model (SEM) simultaneously, this could cause significant bias in the estimation of the results.
Furthermore, the regression procedure for testing moderating effects is relatively simpler than that of SEM. One adds a new variable to the regression model, calculated as the product of the independent variables that are assumed to interact, and then reruns the regression. However, this procedure does not work well in SEM because such a calculated interaction term has a high shared residual variance with the variables from which it is derived (Gefen et al., 2000).
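The product-term procedure just described can be sketched in a few lines on simulated data. The variable names and the true coefficients below are illustrative only (they are not our study's measures or results); the sketch mean-centers the predictors before forming the product, a common practice in moderated regression:

```python
import numpy as np

# Sketch of the moderated-regression (product-term) procedure on simulated
# data; variable names and true coefficients are illustrative only.
rng = np.random.default_rng(0)
n = 168
credibility = rng.normal(size=n)     # independent variable
obligation = rng.normal(size=n)      # moderator
y = (0.4 * credibility + 0.2 * obligation
     + 0.3 * credibility * obligation + rng.normal(scale=0.5, size=n))

# Mean-center the predictors, then add their product as the interaction term
x1 = credibility - credibility.mean()
x2 = obligation - obligation.mean()
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
interaction_effect = beta[3]  # OLS estimate of the moderation coefficient
```

With a sample of this size the estimated interaction coefficient recovers the simulated value (0.3) closely, which is the behavior the regression procedure relies on.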
(2) Sample size issues. Another concern when using SEM is its requirement for a large sample. In a top management journal, Shah and Goldstein (2006) suggested that sample size is one of the main concerns in choosing between SEM and regression analysis, as it affects the reliability of parameter estimates and statistical power. They further indicated that, to reach robust results in SEM, the minimal sample size is ten times the number of items in the research model, which would be at least 310 (10 × 31 items) in our study. Our current sample size is therefore neither sufficient nor suitable for SEM path analysis.
Overall, regression better matches the characteristics of our data distribution and sample size, and its procedures for testing moderation are simpler and more effective. In recent years, research using regression analysis as the empirical approach has been published in many leading academic journals (e.g., Allison et al., 2017; Lam et al., 2019), which also provides evidence for the validity and legitimacy of this traditional statistical method. We hope that our explanation of why we use regression (instead of path modeling in SEM) as our preferred method addresses the reviewer's concerns.
Many thanks to the reviewer for raising this issue. It is true that, like many previous studies, we did not conduct ex-ante tests of whether linear regression was suitable. In the new version of the manuscript, we tested the assumptions of linear regression, and the results showed that we could use this method. Specifically, we examined four aspects before running the OLS regression: multicollinearity, homoscedasticity, normality of the residuals, and autocorrelation.
(1) Multicollinearity. When multicollinearity is present, the association between the variables leads to larger standard deviations and wide confidence intervals for the results. We can identify whether multicollinearity is severe by examining the variance inflation factor (VIF) of the variables. In our study, the maximum VIF is 1.55 (<10), suggesting that multicollinearity does not significantly influence the stability of the parameter estimates (Dielman, 1991).
(2) Homoscedasticity. Homoscedasticity is one of the essential assumptions of linear regression: the random error terms in the overall regression function should have the same variance. Generally, we can observe graphically whether there is an evident pattern in the distribution of the residual variance. If there is no obvious pattern, the homoscedasticity assumption is satisfied and the regression analysis can be performed. As shown in the figure below, there is no significant pattern in the distribution of the variance, so it is safe to run a linear regression.
(3) Normality of the residuals. The third important assumption of linear regression is that the error term should follow a normal distribution; otherwise, the confidence intervals of the estimated results can become highly unstable. In our study, we used a normal P-P plot and a histogram to assess the normality of the residuals. As shown in the two figures below, the P-P plot points fall approximately on a straight line, and the distribution of the standardized residuals is close to normal. Therefore, it is safe to run OLS linear regression.
(4) Autocorrelation. When autocorrelation occurs, the measured standard deviations tend to be smaller, leading to narrower confidence intervals. For the Durbin-Watson (DW) test, the closer the DW statistic is to 2, the greater the certainty of non-autocorrelation; if dU < DW < 4 - dU, we can conclude that there is no autocorrelation. Running the DW test in our regression analysis, we found a DW value of 2.14, with a maximum dU of 1.11. Our DW value is thus very close to 2 and lies between dU and 4 - dU, so we conclude that there is no autocorrelation in our regression analysis.
Overall, we checked our data against the OLS assumptions by testing multicollinearity, homoscedasticity, normality of the residuals, and autocorrelation. Once again, we thank the reviewer for providing us with this learning opportunity, which allowed us to develop a deeper understanding of linear regression. The specific revisions are listed below or can be found on Page 19.
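The two numeric diagnostics among these checks (maximum VIF and the Durbin-Watson statistic) can be computed in a few lines. The sketch below runs them on simulated data with four independent predictors, so it is purely illustrative; the values reported in our study (max VIF = 1.55, DW = 2.14) come from the real dataset:

```python
import numpy as np

# Illustrative computation of the VIF and Durbin-Watson diagnostics on
# simulated data (four independent predictors, n = 168).
rng = np.random.default_rng(1)
n = 168
X = rng.normal(size=(n, 4))                          # four predictors
y = X @ np.array([0.4, 0.2, 0.3, 0.1]) + rng.normal(size=n)

def vif(X: np.ndarray, j: int) -> float:
    """Variance inflation factor: 1 / (1 - R^2) of predictor j on the rest."""
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ beta
    r2 = 1 - resid.var() / X[:, j].var()
    return 1.0 / (1.0 - r2)

def durbin_watson(resid: np.ndarray) -> float:
    """DW statistic: sum of squared successive differences over the SSR."""
    return float(np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2))

A = np.column_stack([np.ones(n), X])                 # add intercept
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta
max_vif = max(vif(X, j) for j in range(X.shape[1]))  # near 1 here
dw = durbin_watson(resid)                            # near 2 here
```

Because the simulated predictors are independent and the errors are i.i.d., the maximum VIF comes out close to 1 and the DW statistic close to 2, the patterns the thresholds above are designed to detect.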

Specific revisions
We employed ordinary least squares (OLS) regression analysis to evaluate the hypotheses. Before running the regression, we checked the linear regression assumptions in terms of multicollinearity, homoscedasticity, normality of the residuals, and autocorrelation. The results showed that our data meet all the assumptions of linear regression. (1) Multicollinearity. When multicollinearity is present, the association between the variables leads to larger standard deviations and wide confidence intervals for the results. In our study, the maximum variance inflation factor (VIF) is 1.55 (<10), suggesting that multicollinearity does not significantly influence the stability of the parameter estimates. (2) Homoscedasticity. Homoscedasticity is one of the essential assumptions of linear regression: the random error terms in the overall regression function should have the same variance. In our study, there is no obvious pattern in the distribution of the residual variance. (3) Normality of the residuals. The third important assumption of linear regression is that the error term should follow a normal distribution; otherwise, the confidence intervals of the estimated results can become highly unstable. In our study, the P-P plot points fall approximately on a straight line, and the histogram of the standardized residuals is close to a normal distribution. (4) Autocorrelation. When autocorrelation occurs, the measured standard deviations tend to be smaller, leading to narrower confidence intervals. In our study, the Durbin-Watson (DW) value is 2.14 (close to the expected value of 2). A further calculation showed that it lies between dU and 4 - dU, indicating no autocorrelation in our regression analysis.

Responses
Many thanks for raising these valuable comments. We elaborated on the reviewer's concern about SEM in our response to Comment 3-1. Here, we focus on the second point of the reviewer's concerns, relating to the response variable.
In our study, the dependent variable is voice endorsement, the independent variables are voicer credibility and positive mood, and the moderating variables are felt obligation and cognitive flexibility, as can be seen more clearly in the figure below. To further clarify how we used hierarchical regression analysis to pool the results, we provide a summary of how the eight models in Table 4 were developed. The specific response variables in each model are presented below.