Abstract
Multicollinearity is widespread in empirical studies; it leads to imprecise estimates and even to endogeneity when omitted variables are correlated with any regressors. We apply an innovative strategy, different from the usual tools (instrumental variables, ridge regression, and the least absolute shrinkage and selection operator), to estimate the robust determinants of income distribution. We transform panel data into (quasi-) cross-sectional data by removing country and time effects, so that all variables have zero mean and are orthogonal to the country dummies and the time variable. With the quasi-cross-sectional data, multicollinearity becomes very low or even disappears in any specification, whether or not country dummies and the time variable are included. Our contribution is threefold. First, we build a general method to address multicollinearity in panel data, which isolates the common content of correlated variables and ensures robust estimates across specifications (dynamic or static) and estimators (within- or between-effects). Second, we find no evidence for the Kuznets hypothesis within or across countries; investment is economically and statistically the most robust determinant of income inequality; and labor income share shows robustly and consistently positive effects on income inequality, which challenges the related literature. Last, simulations with our estimates show that the total marginal effects of development (GDP, capital stock and investment) on income inequality are very likely to be positive within and between countries, except that across countries the impacts on the middle-60% and top-quintile income shares are less likely to increase income inequality.
Citation: Shao LF (2021) Robust determinants of income distribution across and within countries. PLoS ONE 16(7): e0253291. https://doi.org/10.1371/journal.pone.0253291
Editor: Dao-Zhi Zeng, Tohoku University, JAPAN
Received: January 12, 2021; Accepted: June 2, 2021; Published: July 1, 2021
Copyright: © 2021 Liang Frank Shao. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting Information files.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
I Introduction
Multicollinearity often exists among variables in panel data and among variables in structured functional forms, and it causes two main problems in empirical studies. One is imprecise estimation of the coefficients on correlated variables; the other is endogeneity when omitted variables are relevant and correlated with the included ones. Instrumental variables are often used to address endogeneity; the validity of an instrument relies on the strength of its correlation with the instrumented variables and on its lack of correlation with the error terms, which may not be fully confirmed with panel data because the estimated residuals are themselves panel data that are often highly correlated with covariates. The other strategy is either to shrink large coefficients (ridge regression) or to select the most related covariates (least absolute shrinkage and selection operator, LASSO); neither is a good solution, because either multicollinearity is not properly removed or relevant regressors are dropped from the regressions.
Panel data have been used intensively and directly in empirical studies [1–4] of income distribution. However, multicollinearity has often been ignored in the literature, so omitted variable bias and endogeneity remain major obstacles that are rarely treated well, and the robust determinants of income distribution are still an open question.
In this paper, we transform panel data into quasi-cross-sectional data by extracting country-fixed effects and time (fixed or trended) effects from the panel data, which amounts to removing the transmission channels (country-fixed effects and time effects) of multicollinearity among explanatory variables. As a result, all variables, including the GDP terms, are almost, if not perfectly, isolated from each other, and multicollinearity is dramatically reduced or even completely removed. With the quasi-cross-sectional data, the risks of omitted-variable bias and of the endogeneity caused by omitted covariates are effectively reduced or even removed.
The rationale of this method [3, 5] is that each panel variable is partitioned into two orthogonal parts, one for the variable itself and the other for the time and cross-section effects, which together preserve the full information of the panel data when both parts are included in a regression. The advantage is that we still consider all covariates, among which multicollinearity has been effectively reduced, so that estimates become more robust and precise compared with the existing tools.
We take two steps to obtain the quasi-cross-sectional data. First, we decompose GDP into two orthogonal components: explained GDP and unexplained GDP. Explained GDP is the fitted value from the ordinary least squares (OLS) regression of GDP on all known variables, including inequality variables, country dummies, and time variables. The OLS regression residuals are defined as unexplained GDP, which is orthogonal to the predicted (explained) GDP and all explanatory variables, and it is also used to explain income distribution. Second, for each variable of the panel data, we run a simple OLS regression on country dummies and time (dummy or trend) and save the residuals, which form the quasi-cross-sectional data for that variable.
All variables in the quasi-cross-sectional data set, including explained GDP, have zero mean and are orthogonal to unexplained GDP, and the bilateral correlations between explanatory variables become very low or even near zero. We consider both fixed and trend effects for the time variable, run static and dynamic specifications, and ultimately choose the one that gives the most robust and consistent results in the estimations.
We consider seven statistical measures of income distribution: the Gini coefficient (Gini), the bottom (B20) and top (T20) quintiles of income share, the middle 60% income share (MID), the income share below the median income (MES), the income share below the mean income (MIS), and the mean population share (MPS), that is, for the individuals whose income is below the mean income. MPS and MIS are used in Shao and Krause [6] to discuss if rising mean incomes had favored the middle income earners in relative terms.
We treat the time variable in the data transformation and the estimations as either fixed or trended, and we use simple least squares dummy variable (LSDV) and pooled OLS (POLS) regressions for the seven measures on the quasi-cross-sectional data.
We find three sets of variables that robustly and consistently impact income inequality differently. The first includes variables that negatively impact inequality (employment, primary education, etc.); the second includes variables that positively enhance inequality (investment, labor income share, etc.); and the third includes civil liberty and openness, whose roles rely on the development level of a country.
Our contribution is threefold. First, we build a general method to address the endogeneity issues caused by omitted variables being correlated with covariates when using panel data, which ensures robust results in different specifications and estimators (dynamic or static specifications and within- or between-effects estimators). Second, we update the literature on robust determinants of income distribution within and/or across countries; no evidence is found for the Kuznets hypothesis, labor income share (openness) is positively (negatively) associated with income inequality, and investment, tertiary education and the unexplained GDP are the three most robust determinants of increasing income inequality, and the most robust determinants of decreasing income inequality are employment, explained GDP, working hours, interactive term of openness with GDP, and primary education. Last, simulations with our estimates show that the total marginal effects of GDP, capital stock and investment on income inequality are very likely to be positive.
The paper proceeds as follows. Section II briefly reviews the literature. Section III discusses the data and shows how bilateral correlation can be effectively reduced by transferring panel data into quasi-cross-sectional data. Section IV presents a replication of three typical studies. Section V provides our unbiased, consistent and robust results, and Section VI concludes the paper.
II Literature review
2.1 The Kuznets hypothesis
Since Kuznets [7] proposed the famous hypothesis that income inequality shows an inverted U-shape in GDP per capita, this hypothesis has been debated in theoretical and empirical studies [8–12].
Some studies [1, 13–17] find strong evidence for the hypothesis using panel data. Huang [18] presents evidence for the hypothesis using a reduced functional form and a cross-country dataset. These estimations suffer from serious endogeneity caused by omitted correlated covariates and from imprecise estimates.
There is also empirical evidence [2, 19] challenging the hypothesis. Savvides and Stengos [20] support the hypothesis with the quadratic form of GDP using threshold regression and pooled OLS (POLS) estimation on panel data, but the results are not robust when country dummies are considered. Frazer [21] finds various inequality-development relationships depending on the choice of years and countries, but country- and time-fixed effects are not included in his estimation.
Kalliovirta and Malinen [22] show that USA inequality drives inequality in other developed countries, which is evidence for the structural correlation of panel data and the time correlation among all variables in the panel data. Therefore, all of the above studies based on panel data face serious endogeneity issues caused by omitted correlated covariates.
2.2 Openness
Openness to international trade has been argued to be a major determinant of income inequality [17, 23, 24], but the empirical evidence is mixed. Some authors report no significant effect of openness on inequality [2, 25], while others find a positive effect, which is stronger for poorer countries [1, 24, 26, 27]. A negative correlation between openness and the labor income share is found in Harrison (2005) [28] and Ortega and Rodriguez (2006) [29]; Higgins and Williamson (2002) [15] find only limited support for openness increasing inequality.
Milanovic (2005) [24] uses the data of Deininger and Squire (1997) [30] and finds that the interactive effect of openness and mean income is positive on the income shares of the poor and the middle class but negative on the income share of the rich. However, Milanovic’s estimations also suffer from serious omitted-variable bias because openness is highly correlated with capital stock, investment, financial development, etc., which are not considered in his regressions.
Jaumotte et al. (2013) [31] show that technological progress has more impacts on rising income inequality than globalization does, and the limited overall impacts of globalization are reflected by two offsetting forces: negative effects of trade and positive effects of financial globalization on inequality. We let explained GDP and unexplained GDP denote technological progress. Openness and financial development are also considered in our estimation. The largest bilateral correlation among all these variables in quasi-cross-sectional data is only -0.177, which is between openness and unexplained GDP.
It has also been noted [32, 33] that openness is negatively related to labor income share using the latest 5-year average panel data, which is in line with our finding.
2.3 Education
Different levels of educational attainment, primary, secondary and tertiary, may have various effects on income distribution; empirical studies using single average educational attainment (Thomas et al., 2000; Checchi and García-Peñalosa, 2010) [34, 35] cannot reveal the difference between these effects. Castello-Climent and Domenech (2017) [36] find that the wage Gini has an inverted U-shape in terms of the human capital Gini, which might imply inconsistent effects of various levels of educational attainment on income inequality.
Li et al. (1998) [2] point out that initial secondary schooling is an important determinant of inequality, which is not supported by our estimations; we find that all three levels of educational attainment do have robust and significant effects on income distribution, which are also reported in Barro (2000) [1].
Eicher and Garcia-Penalosa (2001) [37] analyze the non-monotonic relation between educational attainment and inequality. Recent studies (Erosa et al., 2010; Santos, Sequeira and Ferreira-Lopes, 2017) [38, 39] show that income inequality is significantly affected by human capital accumulation and TFP. We use three levels of educational attainment to denote human capital because tertiary education could play a different role in inequality than primary and secondary education. We do not take TFP as an explanatory variable, to avoid its high correlation with productive factors.
2.4 Labor income share
Labor income share is correlated with many factors [40, 41]. Many studies (Daudey and García-Peñalosa, 2007; Checchi and García-Peñalosa, 2010; Bengtsson and Waldenström, 2018) [35, 42, 43] have documented a strongly positive (negative) correlation between the capital (labor) income share and top personal income shares (the Gini coefficient), and a global decline of the labor share has also been noted [44]. These findings contradict our results that labor income share is positively associated with income inequality within and across countries.
The effect of capital income share on income distribution is explained by the effects of capital stock and investment in our estimation. The effect of capital stock on income inequality is negative and economically small, and the effect of investment is positive and economically large. Any changes in labor income share might be dominated by changes in the earnings of top income earners because labor income share is positively (negatively) correlated with the top (bottom) 20% income share in our regressions, and the bottom income shares (B20, MES, MIS) are relatively stable in the data.
2.5 Other factors
Some studies (Schultz, 1998; Li, Squire and Zou, 1998; Barro, 2000) [1, 2, 45] demonstrate that income inequality is explained mainly by country variations and very little by time variation, which implies that the dynamic LSDV specification might be an appropriate choice to explain within-country variations, where the country variations are explained by the country dummies and the lagged inequality.
We consider many other factors discussed in the literature, which include civil liberty, initial secondary schooling, financial depth, initial land inequality (Li, Squire and Zou, 1998) [2], democracy indices (Rodrik, 1999) [46–48], economic freedom (Carter, 2006) [49], population growth (Deaton and Paxson, 1997) [50], inflation (Bulir, 2001) [51], unemployment (Jantti and Jenkins, 2010) [52], etc. Capital stock and investment are also included in our specifications.
III Data
3.1 Data sources and variable definitions
The data of macroeconomic variables (GDP, capital stock, investment, population size, employment rate, import and export, average working hours, labor share, etc.) are retrieved from PWT 9.0. Educational attainment is defined as the average years of primary, secondary, and tertiary schooling among the population above 15 years old, and the data are from Barro and Lee (2013) [53]. Inflation is the yearly percentage change in price using year-average CPI; we retrieve the inflation and financial development indices from the IMF. We obtain civil liberty and freedom dummy data from Freedom House. As the usual strategy to reduce frequent data variations, we use five-year average data. The statistical summary of the data is presented in S1 Table in S1 Appendix.
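The five-year averaging step can be illustrated with a minimal sketch. The window alignment (periods starting in years divisible by 5) and the helper name `five_year_average` are assumptions made here for illustration; the text does not state how the windows are aligned.

```python
from collections import defaultdict
from statistics import mean

def five_year_average(rows):
    """Average annual values within non-overlapping 5-year windows.

    rows: iterable of (country, year, value) tuples; missing years are
    simply absent. Returns {(country, period_start): mean_value}.
    """
    buckets = defaultdict(list)
    for country, year, value in rows:
        period_start = year - year % 5  # 1990-1994 -> 1990 (assumed alignment)
        buckets[(country, period_start)].append(value)
    return {key: mean(vals) for key, vals in buckets.items()}

# Made-up annual observations, for illustration only
annual = [
    ("A", 1990, 2.0), ("A", 1991, 4.0), ("A", 1994, 6.0),
    ("A", 1995, 10.0), ("B", 1992, 1.0), ("B", 1993, 3.0),
]
averaged = five_year_average(annual)
```

Each country-period cell is averaged over whatever annual observations are available, so unbalanced annual data still yield one value per window.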
We collect the income distribution panel data from WIID3.4 of UNU-WIDER (https://www.wider.unu.edu/project/wiid-world-income-inequality-database). The panel data are consistent and comparable because the data are chosen with consistent statistical units, a few of which are estimated by the methodology that Shao (2017) [54] uses; for instance, the income definition is disposable income, the income unit is per household per capita, and the data survey is nationwide and covers all ages in the population. We mainly consider 7 statistical measurements of income distribution: the Gini coefficient, the bottom (B20) and top (T20) quintiles of income share, the income share of the 60% middle incomes (MID, total income share of the three middle quintiles), the income share below the median (MES), the income share below the national mean income (MIS), and the share of the population whose individual income is below the national mean income (MPS). For a more detailed discussion of MPS and MIS, refer to Shao (2017) [54].
The variable definitions are as follows (subscripts for country and year are dropped for conciseness):
GDP = log (cgdpo/pop), where cgdpo is output-side real GDP at current PPPs (in 2011 US dollars), and pop is the total population size in a country in a particular year.
HR = log(avh), where avh is the average annual hours worked by persons engaged in a country.
Emp = emp/pop, where emp is the total number of persons engaged in a job during a year in a country.
PG is the population growth rate, and pi is the inflation rate. Frdm is an indicator dummy for freedom, and CL is the civil liberty index; both are from the Freedom House (https://freedomhouse.org/report/freedom-world/freedom-world-2018). FDI is the financial development index from the IMF (http://data.imf.org/?sk=F8032E80-B36C-43B1-AC26-493C5B1CD33B).
Ksh = ck/cgdpo, where ck is capital stock at current PPPs (in 2011 US dollars).
Ish = csh_i, which is the share of gross capital formation in cgdpo at current PPPs.
Gsh = csh_g, which is the share of government consumption in cgdpo at current PPPs.
Opnsh = csh_x-csh_m, which is the ratio definition for openness.
The level definitions for capital stock, investment, government spending, exports and imports taking log per-capita form are as follows:
csh_x (csh_m) is the share of exports (imports) in output-side real GDP (cgdpo) at the current domestic price; csh_m takes a negative sign in PWT 9.0. Open is the log of per capita total trade, and Opnsh is the share of total international trade in output-side real GDP. The level definitions of these variables are needed to partition GDP into explained and unexplained components, and their ratio definitions are needed in the regressions of the statistical measures of income distribution to avoid high multicollinearity.
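The ratio definitions above can be sketched for a single country-year record as follows. The field names follow the PWT 9.0 variable names quoted in the text; the helper name `build_variables` and the numbers in `example` are hypothetical.

```python
import math

def build_variables(row):
    """Construct the Section 3.1 variables from one PWT-style record."""
    return {
        "GDP": math.log(row["cgdpo"] / row["pop"]),  # log real GDP per capita
        "HR": math.log(row["avh"]),                  # log average annual hours
        "Emp": row["emp"] / row["pop"],              # persons engaged per capita
        "Ksh": row["ck"] / row["cgdpo"],             # capital stock relative to GDP
        "Ish": row["csh_i"],                         # gross capital formation share
        "Gsh": row["csh_g"],                         # government consumption share
        # csh_m is stored with a negative sign, so subtracting it adds imports
        "Opnsh": row["csh_x"] - row["csh_m"],
    }

example = {  # made-up numbers, for illustration only
    "cgdpo": 500000.0, "pop": 50.0, "avh": 1800.0, "emp": 25.0,
    "ck": 1500000.0, "csh_i": 0.25, "csh_g": 0.15,
    "csh_x": 0.30, "csh_m": -0.28,
}
variables = build_variables(example)
```

Because csh_m is negative in the source data, Opnsh here is the sum of the export and import shares in absolute terms.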
We do not use a summary index to measure human capital in a country. However, we include the average years of primary (pry), secondary (sey), and tertiary (tey) education to indicate different levels of human capital because these three levels of education may not affect income distribution in the same fashion. We also use interactive terms to describe the nonlinear effects of inflation, civil liberty, and openness with development (GDP).
3.2 Data summary of the statistical measures of income distribution
A typical property of panel data is that within-variation and between-variation are quite different. Li, Squire and Zou (1998) [2] identify differences between within-variation and between-variation in inequality, but their specifications take no action to identify this difference. The within-country variation (standard deviation) is only approximately one-third as large as the between-country variation for the Gini coefficient, and it is approximately one-half as large as that for the MPS and MIS. These findings are shown in Table 1, which summarizes the data for the Gini coefficient, B20, T20, MES, MID, MPS, MIS, and GDP.
The other variables also show much larger between-variation than within-variation; one exception is that the within-variation of inflation is approximately 3 times greater than the between-variation. The data summary of all other variables is relegated to S1 Table in S1 Appendix. This property of panel data is very informative in forming specifications of inequality variation and choosing estimators. That is, both between- and within-variations of inequality must be properly identified, and the same consideration applies to all other explanatory variables. We take two actions to address this issue. First, we apply the simple dynamic LSDV estimator to the panel data, which ensures good efficiency of the regressions and is said to outperform GMM and system GMM (Moral-Benito, 2013) [4]. Second, we isolate the country and time components from each variable so that the between-effects and within-effects in each variable are isolated and multicollinearity can be dramatically reduced among all explanatory variables.
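For readers who want to reproduce this kind of summary, the sketch below computes between-country and within-country standard deviations in the usual way: the between figure is the standard deviation of country means, and the within figure is the standard deviation of deviations from country means. The function name and the sample values are hypothetical, and the exact convention behind Table 1 is not stated in the text.

```python
from collections import defaultdict
from statistics import mean, stdev

def within_between_sd(rows):
    """rows: sequence of (country, value). Returns (between_sd, within_sd)."""
    rows = list(rows)
    by_country = defaultdict(list)
    for country, value in rows:
        by_country[country].append(value)
    country_means = {c: mean(v) for c, v in by_country.items()}
    # Between variation: dispersion of the country means
    between_sd = stdev(list(country_means.values()))
    # Within variation: dispersion around each country's own mean
    deviations = [value - country_means[c] for c, value in rows]
    within_sd = stdev(deviations)
    return between_sd, within_sd

# Made-up Gini-like values for two countries
sample = [("A", 30.0), ("A", 32.0), ("A", 34.0),
          ("B", 50.0), ("B", 52.0), ("B", 54.0)]
between_sd, within_sd = within_between_sd(sample)
```

In this toy sample the between-country dispersion dominates, which is the pattern the text reports for most variables.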
Table 2 below shows the bilateral correlation among the variables GDP, GDP2, investment (I), employment (Emp), tertiary education (tey), average working hours (HR), Open and Opnsh.
Table 2 shows that the bilateral correlation among the variables is very strong and that the correlation of Open with other variables is much larger than that of Opnsh, and it is similar for the definitions of capital stock, investment and government spending, which is why we prefer the ratio to the log-level definition for these variables in this study. S2 Table in S1 Appendix shows the countries and years for the data of the Gini coefficient.
3.3 Transfer panel data into quasi-cross-sectional data
Bilateral correlation likely leads to high multicollinearity in estimation. Strong multicollinearity may result in biased and inconsistent estimation when relevant and correlated variables are omitted. To reduce the bilateral correlation among the covariates in the panel data, we run OLS regressions of each variable xit on the country dummies and year dummies (or the level of year for the trended effect), which is the methodology of detrending and deseasonalizing time series introduced in Wooldridge [5]. We save the residuals as the new data of the variable, denoted by nxit, which is the portion of the variable that is not explained by country and time. The country-fixed and time-effect components of explanatory variables are often the main channels of high multicollinearity, so the bilateral correlation among the new data nxit of the variables will be much smaller than that among the original data. When we run regressions with the new data nxit, multicollinearity is dramatically reduced or even removed, so that the imprecision and inconsistency issues are effectively reduced or even removed.
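The residualization step can be sketched with NumPy as below. This is a minimal illustration on simulated data, not the code used for the paper; the function name `residualize`, the `trend` flag, and the simulated panel are assumptions introduced here.

```python
import numpy as np

def residualize(x, country, year, trend=False):
    """Residuals from an OLS regression of x on country dummies and time.

    trend=False uses year dummies; trend=True uses a single linear year term.
    """
    x = np.asarray(x, dtype=float)
    country = np.asarray(country)
    year = np.asarray(year)
    country_dummies = (country[:, None] == np.unique(country)[None, :]).astype(float)
    if trend:
        time_part = (year.astype(float) - year.mean())[:, None]
    else:
        # Drop one year dummy; the country dummies already span the constant
        time_part = (year[:, None] == np.unique(year)[None, 1:]).astype(float)
    design = np.hstack([country_dummies, time_part])
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    return x - design @ coef

# Simulated panel: 6 countries observed in 5 five-year periods
rng = np.random.default_rng(0)
country = np.repeat(np.arange(6), 5)
year = np.tile(np.arange(1990, 2015, 5), 6)
x = 2.0 * country + 0.1 * (year - 1990) + rng.normal(size=30)

nx = residualize(x, country, year)                    # fixed time effects
nx_trend = residualize(x, country, year, trend=True)  # trended time effect
```

The residuals have zero mean overall and within every country, and their dispersion is much smaller than that of the original series because the country and time components have been stripped out.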
We run OLS regressions for GDP on all the new variables, including current and 5-year lagged inequality terms (MPS, MIS, MID, T20, and Gini), country dummies, and a form (fixed or trended) of the time variable. Then, we save the predictions and residuals, which form explained GDP and unexplained GDP, denoted by GDPeit and eit, respectively. To remove the country and year components of GDPeit, we also run OLS regression for it on country dummies and year (dummy or level) again and save the residuals, which are the new GDPit, nGDPit.
The above regression equations are as follows:
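$$x_{it} = \alpha_0 + \alpha_1 \cdot \text{i.country} + t(\text{year}) + nx_{it} \qquad (3.3.1)$$
$$GDP_{it} = \alpha_0 + \beta \cdot nX_{it} + \alpha_1 \cdot \text{i.country} + t(\text{year}) + e_{it} \qquad (3.3.2)$$
$$GDPe_{it} = \hat{\alpha}_0 + \hat{\beta} \cdot nX_{it} + \hat{\alpha}_1 \cdot \text{i.country} + \hat{t}(\text{year}) \qquad (3.3.3)$$
$$GDPe_{it} = \alpha_0 + \alpha_1 \cdot \text{i.country} + t(\text{year}) + nGDP_{it} \qquad (3.3.4)$$
where the residuals of (3.3.1), nxit, generate the new (quasi-cross-sectional) data of xit; the residuals of (3.3.2), eit, form unexplained GDP, and its predicted values, GDPeit, given by (3.3.3), are explained GDP; nXit is the vector of all new variables nxit; β and α (β̂ and α̂) are (estimated) coefficients; subscript i is a country index; t(year) is a function of time t, either fixed (α2 * i.year) or trended (α2 * year); i.year and i.country are year and country dummies, respectively; and the residuals of (3.3.4), nGDPit, generate the new (quasi-cross-sectional) data of GDPeit. Note that, to simplify the analysis, a constant trend α2 for all countries is implied when t(year) takes the level form α2 * year.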
The vector nXit in (3.3.2) includes the current and 5-year-lagged Gini coefficient, MIS, MPS, MID and T20. Capital stock, investment, exports and imports enter in level definitions, but we use the ratio definitions of these variables later, when we run regressions of the 7 statistical measures of income distribution, to avoid high multicollinearity in those regressions.
Since the new (quasi-cross-sectional) data are defined by the residuals of the OLS regressions (3.3.1), all variables in the new data, including nGDPit, are centered at zero. Furthermore, the bilateral correlations among nGDPit, nGDPit2, eit, and eit2 are near zero; eit and eit2 are uncorrelated with all other explanatory variables; and all new variables nxit, including nGDPit, are uncorrelated with the country dummies and the time variable.
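The decomposition of GDP into explained and unexplained parts, as described in the text around (3.3.1) to (3.3.4), can be sketched on simulated data as follows. The single covariate `x`, the simulated coefficients, and the helper names `dummies` and `ols_fit` are assumptions for illustration; the paper's regression uses many covariates and lagged inequality terms.

```python
import numpy as np

def dummies(labels, drop_first=False):
    """One indicator column per distinct label (optionally dropping the first)."""
    labels = np.asarray(labels)
    levels = np.unique(labels)
    if drop_first:
        levels = levels[1:]
    return (labels[:, None] == levels[None, :]).astype(float)

def ols_fit(design, y):
    """Return (fitted values, residuals) of an OLS regression of y on design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    return fitted, y - fitted

rng = np.random.default_rng(1)
country = np.repeat(np.arange(8), 6)
year = np.tile(np.arange(6), 8)
# Country dummies plus year dummies (fixed time effects)
fixed = np.hstack([dummies(country), dummies(year, drop_first=True)])

# Hypothetical covariate and GDP series (simulated, not the paper's data)
x = 0.5 * country + 0.2 * year + rng.normal(size=country.size)
gdp = 1.0 + 0.8 * x + 0.3 * country + rng.normal(size=country.size)

# Step as in (3.3.1): strip country and time effects from the covariate
_, nx = ols_fit(fixed, x)
# Step as in (3.3.2): regress GDP on the new covariate plus the dummies;
# fitted values are explained GDP, residuals are unexplained GDP
gdp_explained, e = ols_fit(np.hstack([nx[:, None], fixed]), gdp)
# Step as in (3.3.4): strip country and time effects from explained GDP
_, n_gdp = ols_fit(fixed, gdp_explained)
```

By construction, `e` is orthogonal to both `nx` and `n_gdp`, and `n_gdp` has zero mean, which mirrors the properties stated above for the quasi-cross-sectional data.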
Table 3 below summarizes the statistics of the selected variables corresponding to Table 1, which have been transformed into quasi-cross-sectional data. As observed, the standard deviation of each new variable is much smaller than that of the corresponding original variable because the between-country and over-time variations, which are generally large, have been removed from the original data.
Table 4 below shows the bilateral correlations among the new variables corresponding to those in Table 2. The bilateral correlations in the new (quasi-cross-sectional) data are much smaller than those in the original panel data in Table 2; for instance, the correlation coefficient between GDP and GDP2 is 1, but it is -0.007 between nGDP and nGDP2. The reason is that removing the country and time effects from the original panel data removes the correlation transmitted through country and time, and it often changes the sign of the correlation as well. What matters here is that the bilateral correlation coefficients in Table 4 are small, so that the explanatory power of a regression of one variable on all the others (that is, multicollinearity) is also small in the quasi-cross-sectional data.
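The drop in the correlation between a variable and its square can be illustrated on simulated data. The sketch assumes that nGDP2 is the square of nGDP (the text does not spell this out) and, for brevity, removes only country means rather than both country and time effects; all numbers are simulated.

```python
import numpy as np

rng = np.random.default_rng(2)
n_countries, n_periods = 100, 10
country = np.repeat(np.arange(n_countries), n_periods)

# Simulated log GDP per capita: level around 9 with large country effects
country_effect = rng.normal(scale=1.0, size=n_countries)[country]
gdp = 9.0 + country_effect + rng.normal(scale=0.2, size=country.size)

# A variable far from zero is almost collinear with its own square
corr_raw = np.corrcoef(gdp, gdp ** 2)[0, 1]

# Remove country means (a stand-in for the full residualization)
country_means = np.bincount(country, weights=gdp) / np.bincount(country)
n_gdp = gdp - country_means[country]

# A zero-mean, roughly symmetric variable is nearly uncorrelated with its square
corr_new = np.corrcoef(n_gdp, n_gdp ** 2)[0, 1]
```

The intuition is that squaring a variable centered far from zero is close to a linear transformation over the observed range, whereas squaring a variable centered at zero is not.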
3.4 An application of the quasi-cross-sectional data on inequality measures
The Gini coefficient is closely related to MPS and MIS. We run OLS regressions of the Gini coefficient on MPS, MIS and their product MDS with trended time effects on the quasi-cross-sectional data. We use three estimators, POLS, fixed effects (FE) and LSDV, with robust errors. The POLS estimator explains the correlation between countries, and the FE and LSDV estimators explain the correlation within countries. For comparison, we also run the same regressions with the original panel data. Table 5 below shows the results. The results with time dummies are similar and relegated to S3 Table in S1 Appendix.
We observe the following results from Table 5 and S3 Table in S1 Appendix:
- The estimates of nMIS and nMPS are robust for the two specifications (one includes nMDS; the other does not) in the three estimators with the quasi-cross-sectional data but not in those with the original panel data.
- The LSDV estimates of linear terms are the same for the two data sets (see the last column on the right). They are robust with the quasi-cross-sectional data (see the last two columns on the right in the upper four rows) but are not robust with the original panel data (see the last two columns on the right in the lower four rows).
- POLS estimates are robust with the quasi-cross-sectional data (see the first two columns on the left in the upper four rows) but not robust with the original panel data (see the first two columns in the lower four rows).
Therefore, the Gini coefficient between countries can be empirically described as 1.25MPS−0.962MIS from the pooled OLS estimation, and within countries it can be approximated as 1.331MPS−1.28MIS+0.45 from the LSDV estimation; both estimations use quasi-cross-sectional data. Note that FE and LSDV do not give the same estimates because robust errors are used and year is included in LSDV but not in FE.
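One way to see why LSDV estimates of linear terms coincide across the two data sets is the Frisch-Waugh-Lovell theorem: the slope from a regression that includes the dummies equals the slope from a regression on variables that have already been residualized on those dummies. The sketch below checks this numerically on simulated data with country dummies and a linear trend; all series and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
country = np.repeat(np.arange(10), 6)
year = np.tile(np.arange(6), 10).astype(float)

# Country dummies plus a linear time trend
D = np.hstack([
    (country[:, None] == np.arange(10)[None, :]).astype(float),
    year[:, None],
])

def resid(design, y):
    """Residuals of an OLS regression of y on design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef

# Simulated regressor and outcome (illustrative values only)
x = 0.4 * country + 0.1 * year + rng.normal(size=60)
y = 1.5 * x - 0.3 * country + rng.normal(size=60)

# LSDV on the original panel: slope on x with dummies and trend included
beta_lsdv = np.linalg.lstsq(np.hstack([x[:, None], D]), y, rcond=None)[0][0]

# OLS on the residualized data, without any dummies
nx, ny = resid(D, x), resid(D, y)
beta_new = (nx @ ny) / (nx @ nx)
```

The two slopes agree up to floating-point error. The same identity does not hold for POLS on the original panel, which omits the dummies.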
IV Literature replications
We replicate the estimation of three widely cited papers on this topic, namely, Barro (2000, 2008) [1, 17], Li et al. (1998) [2] and Milanovic (2005) [24], and show how their results are biased and not robust due to multicollinearity of panel data. In the next subsection, we show that LSDV and POLS estimations become robust estimators with the quasi-cross-sectional data.
There are two differences between our data and theirs. One is that we choose a comparable and consistent (disposable income) Gini coefficient, in which we estimate a few observations from the Gini coefficient of other income definitions by assuming constant linear correlation over time between the Gini coefficients of disposable income and other income definitions in a country (Shao, 2017) [54]. In contrast, the other three studies include inequality data with different income definitions and statistical units and use dummy variables to denote these different statistical dimensions. The other difference is in the treatment of data frequency. Barro (2000, 2008) [1, 17] uses 10-year average data, while Li et al. (1998) [2], Milanovic (2005) [24], and this paper use 5-year average data.
4.1 Barro’s (2000) specifications
We replicate the six specifications (Table 6 Part I, page 23, Barro, 2000 [1]) that account for the Gini coefficient (similar results for Part II of FE estimation are not listed here to save space). The time period of the data in Barro (2000) [1] is from 1960 to 1990, and Barro (2008) [17] confirms the results by extending the data to 2000. Table 6 below shows the replication results of the seemingly unrelated regression (SUR) estimation on our panel data.
The replication results show significant and negative effects of quadratic GDP in all regressions. The average turning point (8.24) of the estimated ln(GDP) is smaller than the sample mean (9.38) of ln(GDP), and its standard deviation is 0.218, which is similar to the result in Barro (2000, 2008) [1, 17], as are the estimates of educational attainment. The small differences might be caused by the data averages and specifications. Our data set is much larger, and we do not include dummies for income definitions because we estimate the Gini coefficient to make the data consistent and comparable regarding definitions and statistical issues. The VIF column shows very large values that reveal a high possibility of omitted variable bias. We list the VIF values for only one regression that includes all regressors since the VIF values are generally not much different in other regressions that drop some of the regressors; the same applies to the VIF values in other tables.
We argue that Barro’s results are distorted by high multicollinearity and are therefore biased and not robust. The multicollinearity in the estimations is caused by both structural terms, for instance, GDP and GDP2, and data issues, as all variables show country-fixed and year-fixed (or trended) effects.
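As a reminder of how such a turning point is obtained, consider a generic quadratic specification in log GDP. The symbols $b_0$, $b_1$, $b_2$, $\gamma$ and the control vector $Z_{it}$ are placeholders introduced here for illustration, not the notation or the estimates of Table 6.

$$Gini_{it} = b_0 + b_1 \, GDP_{it} + b_2 \, GDP_{it}^2 + \gamma' Z_{it} + u_{it}$$

$$\frac{\partial Gini_{it}}{\partial GDP_{it}} = b_1 + 2 b_2 \, GDP_{it} = 0 \;\Rightarrow\; GDP^{*} = -\frac{b_1}{2 b_2}$$

With $b_1 > 0$ and $b_2 < 0$, inequality rises with GDP below $GDP^{*}$ and falls above it, which is the inverted U-shape of the Kuznets hypothesis. Here GDP denotes log GDP per capita as defined in Section 3.1.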
We use the quasi-cross-sectional data obtained from the estimations by Eqs (3.3.1) ~ (3.3.4) to run the same regressions on Barro’s specifications. The results are listed in Table 7 below. The small values in column VIF show that multicollinearity is no longer a concern in the regressions.
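For reference, the variance inflation factor of a regressor is 1/(1 − R²), where R² comes from regressing that regressor on all the others. The sketch below computes it with NumPy on simulated data and compares a raw quadratic pair with a centered one; the function name `vif` and the simulated series are assumptions for illustration and do not reproduce the values in Tables 6 and 7.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (n_obs x n_vars)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        # Regress column j on a constant and all other columns
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        centered = X[:, j] - X[:, j].mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(4)
gdp = rng.normal(loc=9.0, scale=1.0, size=500)  # simulated log GDP per capita
other = rng.normal(size=500)                    # an unrelated regressor

# Raw level and square: nearly collinear, so VIFs are very large
vif_raw = vif(np.column_stack([gdp, gdp ** 2, other]))

# Zero-mean version and its square: VIFs close to 1
z = gdp - gdp.mean()
vif_centered = vif(np.column_stack([z, z ** 2, other]))
```

A VIF near 1 means the regressor is almost unexplained by the others, which is the situation the small VIF values in the quasi-cross-sectional regressions point to.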
Table 7 shows that we do not find evidence for the Kuznets hypothesis across countries. The SUR estimator assumes that the Gini coefficient must be independent across countries. This assumption is rejected by the Breusch-Pagan test after the estimation, which is not reported in the table to save space. Note that the results are still similar if the time variable is assumed to be trended in generating the quasi-cross-sectional data. Therefore, Barro’s estimation results are not robust to quasi-cross-sectional data.
4.2 Li, Squire and Zou’s (1998) specifications
Li, Squire and Zou (1998) [2] (LSZ for short) explain income inequality by a reduced form of initial mean level of secondary education, financial development, civil liberty, and initial land inequality. Table 8 below shows the replication results, which refer to the base and IV regressions in Table 6 and columns 1, 2, 3, and 5 in Table 8 of LSZ. We can see that the estimation results of civil liberty in all regressions and of secondary education and financial development in the regressions of columns OLS and IV(1) are the same as LSZ’s estimates regarding their signs and significance levels. Note that our financial development index (FDI) is a more comprehensive measure than the ratio of M2 to GDP, which LSZ uses.
We do not have data on land inequality, but our robust OLS regression without initial land inequality provides very similar results regarding the signs and significance levels of civil liberty and financial development. The estimation of initial secondary education shows the same sign as LSZ, but the significance level is only 5% (see the OLS column in Table 9 below). The replication is not exactly the same as LSZ since the estimation for financial development in the IV replications is not significant, but it is sufficient to replicate the estimates on civil liberty and secondary education.
Table 9 below lists the replication results of LSZ [2] with our quasi-cross-sectional data. All of the significant results in Table 8 have now disappeared or weakened, and the effect of investment has become significantly positive, whereas it was insignificant and negative in the LSZ regressions. The very small values in the VIF column show that multicollinearity is no longer a concern in the regressions. The results are still similar if the time variable is assumed to be trended. Therefore, LSZ’s estimation results are not robust to the specifications or quasi-cross-sectional data.
4.3 Milanovic (2005)
Milanovic (2005) [24] studies how openness and direct foreign investment affect income distribution within a country. He finds that poor people in low-income countries have less income share when the countries are more open to trade; however, as national incomes rise, the poor and middle class benefit more than the rich. We replicate Milanovic (2005) with similar variables, but we use total investment to replace FDI and the freedom dummy to replace democracy due to data availability in our data set.
Taking the usual approach in the literature (Milanovic, 2005; Barro, 2000; Ravallion, 2001; Dollar and Kraay, 2002) [1, 24–26], using the interaction of openness and income to denote their nonlinear relationship, we also consider the interactive term between income and investment share within output. The replication results are shown in Table 10 below.
The replication also shows some results similar to those of Milanovic’s Tables 3 and 4. Openness significantly decreases the income shares of the bottom first and third deciles in low-income countries (Opnsh*GDP). The effects of openness on the middle-income deciles are indeed positive, but the effects are not significant for the bottom second or fourth deciles. Additionally, the effects of the level of openness are negative and significant for the bottom 1st, 3rd, 5th and 7th deciles. Government spending also shows results similar to Milanovic’s.
We run the same specifications with our quasi-cross-sectional data, and the results are listed in Table 11 below.
Table 11 shows that, after multicollinearity has been reduced in the same specification, the instrumental GMM estimation does not present significant effects of openness or government spending on the bottom deciles in low-income countries (Opnsh*GDP). The negative R2 indicates poor power of the IV GMM estimator because endogeneity is no longer an issue in the data, and R2 becomes reasonably positive with LSDV and POLS. Note that the results are still similar if the time variable is assumed to be trended. Therefore, Milanovic’s estimation results are not robust to the quasi-cross-sectional data.
V Robust determinants of income distribution with the quasi-cross-sectional data
We use static and dynamic specifications with trend and fixed-effect time forms to explore robust determinants of income inequality across countries by POLS and within countries by LSDV on the quasi-cross-sectional data.
Many studies (e.g., Rodriguez-Pose and Tselios, 2009; Teulings and Rens, 2008) [55, 56] use a robust system GMM estimator on a dynamic panel model to address heteroskedasticity and endogeneity, but it is difficult to rule out the over-instrumentation issue that makes the estimation biased. We apply the simple LSDV regression on the dynamic panel model, where inequality at year t−5 is hypothesized to be correlated with inequality at year t and the 5-year average data of explanatory variables. Generally, the cross-sectional dependence issue in panel data makes the fixed-effects estimation inconsistent when annual data are used. However, we use 5-year average panel data for explanatory variables, and the time series of all countries barely share common years, which may dramatically reduce, if not entirely eliminate, the inconsistency issue and serial correlation in the error terms. Moral-Benito (2013) [4] simulates a set of estimators in a small time horizon and with a large observation size for various settings. The author finds that the fixed-effects (or the LSDV) estimator outperforms the first-difference GMM and system GMM estimators in terms of Nickell bias [57] and nonstationarity in the lagged and predetermined regressors. Therefore, we will apply the simple fixed-effects LSDV method to dynamic models.
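To make the estimator concrete, the following is a minimal sketch of an LSDV regression on a dynamic panel with one lagged dependent variable and one explanatory variable. It is not the paper's specification: the data are simulated, the parameter values are hypothetical, time effects are omitted, and the country dummies are handled by within-country demeaning, which gives the same slope estimates as including the dummies explicitly.

```python
import random
from statistics import mean

def within(series):
    """Subtract the unit's own mean (equivalent to including a unit dummy)."""
    m = mean(series)
    return [v - m for v in series]

def lsdv_dynamic(y, x):
    """y[i], x[i]: time series for unit i. Returns (rho_hat, beta_hat) for
    y_it = rho * y_i,t-1 + beta * x_it + a_i + e_it."""
    dep, lag, reg = [], [], []
    for yi, xi in zip(y, x):
        dep += within(yi[1:])
        lag += within(yi[:-1])
        reg += within(xi[1:])
    s_ll = sum(a * a for a in lag)
    s_xx = sum(a * a for a in reg)
    s_lx = sum(a * b for a, b in zip(lag, reg))
    s_ly = sum(a * b for a, b in zip(lag, dep))
    s_xy = sum(a * b for a, b in zip(reg, dep))
    det = s_ll * s_xx - s_lx * s_lx          # 2x2 normal equations
    rho_hat = (s_ly * s_xx - s_lx * s_xy) / det
    beta_hat = (s_ll * s_xy - s_lx * s_ly) / det
    return rho_hat, beta_hat

# Simulated panel with hypothetical parameters (not estimates from the paper).
random.seed(1)
rho_true, beta_true = 0.5, 1.0
y, x = [], []
for _ in range(300):
    a_i = random.gauss(0, 1)
    xi = [random.gauss(0, 1) for _ in range(11)]
    yi = [a_i / (1 - rho_true) + random.gauss(0, 1)]
    for t in range(1, 11):
        yi.append(rho_true * yi[-1] + beta_true * xi[t] + a_i + random.gauss(0, 1))
    y.append(yi)
    x.append(xi)

rho_hat, beta_hat = lsdv_dynamic(y, x)
print("rho_hat:", round(rho_hat, 3), "beta_hat:", round(beta_hat, 3))
```

With a short time dimension the coefficient on the lagged dependent variable is expected to be biased downward (the Nickell bias discussed above), while the coefficient on the exogenous regressor is much less affected in this simulated setting.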
To avoid multicollinearity caused by the inclusion of both openness and international trade (exports and imports), we will let either one of the two variables, not both, be included in our specifications. In particular, exports and imports, not openness, are used to estimate explained GDP, and openness but not trade is used to explain inequality. Both current and 5-year lagged educational attainment, employment and labor share may be included in explaining inequality if these variables show significant effects in the regressions or in making the specifications with trended and fixed time forms comparable.
We consider many explanatory variables used in the literature, consider both trended and fixed time forms for all variables, and consider both static and dynamic specifications, and we apply POLS to the estimation across countries and LSDV to the estimation within countries. Hence, there are eight regressions for each measurement of income distribution, in which the four POLS regressions are for between-country estimation that does not include country or time terms in the regressions, and the other four LSDV regressions are for within-country estimation that includes country and time terms in the regressions. We consider seven statistical measurements of income distribution: Gini, MES, MID, MIS, MPS, B20, and T20.
We proceed with the estimation in five steps. First, we run the regression (3.3.1) for each variable xit in the set of original explanatory variables X = {xit} to obtain the new data nxit; nX = {nxit} is the set of explanatory variables with the new data.
Second, we estimate explained GDP, which is the fitted GDP, and unexplained GDP, which is the residual, by running an OLS regression on the following specification (5.1):
(5.1)
The explained and unexplained GDP are the fitted values and residuals of the above regression, respectively. nX includes nEmp, nG, nI, nK, nHR, nXT, nMT, nOpen, nPG, nFDI, ntey, nsey, npry, ncl, npi, and nlabsh and inequality measures (nGini, nMID, nT20, nMIS and nMPS) in the current and last periods. t(year) takes the form of either the dummy i.year or the level year, and the same is true for Eqs (5.2) ~ (5.4).
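The split into explained and unexplained GDP can be sketched as follows. Specification (5.1) is not reproduced here, so the example uses a single hypothetical regressor and made-up numbers; it only illustrates that OLS fitted values and residuals add up to the original series and are orthogonal to each other by construction.

```python
def ols_fitted_and_residuals(y, x):
    """Simple OLS of y on a constant and one regressor x."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * a for a in x]
    residuals = [b - f for b, f in zip(y, fitted)]
    return fitted, residuals

# Hypothetical values standing in for transformed GDP and one regressor.
n_gdp = [0.3, -0.1, 0.4, -0.5, 0.2, -0.3]
n_inv = [0.2, 0.0, 0.3, -0.4, 0.1, -0.2]

explained, unexplained = ols_fitted_and_residuals(n_gdp, n_inv)
inner_product = sum(f * r for f, r in zip(explained, unexplained))
print("inner product of explained and unexplained parts:", inner_product)
```

The inner product is zero up to floating-point error, which is the sense in which the unexplained part carries no linear information already contained in the explained part.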
Third, we run OLS on the following regression (5.2) for […] to obtain its new data […], the residual of the regression, which has been netted out of the country and year effects. Then, […] are orthogonal to […], […] and all other regressors nxijt; the bilateral correlation and multicollinearity among […], […], […] and nxijt will be dramatically reduced, even close to zero, because the variables with new data have zero mean. We run LSDV regressions on the following two specifications (static and dynamic, each with trended and fixed time variables) to estimate within-country effects:
(5.3)
(5.4)
(5.5)
Note that the dependent variable Ineqit uses the original data (e.g., Gini rather than nGini) because country and time terms are included in the equations, and capital stock, investment, government spending and openness in nX take a ratio definition, which differs from (5.1). The GDP term is the quadratic function of the explained and unexplained GDP, and nX includes all new variables nxijt and the interactive terms, which are defined as follows:
Finally, we run POLS on the following two specifications (static and dynamic, each with trended and fixed time variables) to estimate between-country effects:
(5.6)
(5.7)
Note that the dependent variable nIneqit uses the quasi-cross-sectional data (e.g., nGini rather than Gini), and country and time terms are not included in the Eqs (5.6) and (5.7).
Simultaneity between inequality and the explained and unexplained GDPs is not a concern in the regressions (5.6) and (5.7) because, on the one hand, the main channels (country and year effects) of correlation among the variables have been removed and, on the other hand, the unexplained GDP is orthogonal to the explained GDP. Meanwhile, there could be some correlation between GDP and income distribution; hence, it is reasonable to include measures of income inequality to explain GDP, or to let them go into the unexplained GDP. Furthermore, if we drop all the measures of income distribution from the specifications explaining GDP, the R-squared becomes much smaller than when the income terms are included. The bilateral correlation coefficients between our income distribution measures and the explained (or unexplained) GDP are very small, which is what we expect to observe and might also indicate little simultaneity.
5.1 Robust determinants of income distribution within countries
We run regressions (5.3)~(5.4) for seven dependent variables: Gini, MES, MID, MIS, MPS, B20, and T20. Table 12 below summarizes the LSDV regression results for Gini with two specifications: one includes e2 and nGDP2, and the other does not. The regression results of the other 6 statistics are presented in S4~S6 Tables in S1 Appendix.
We have three blocks of variables in the table. The lower block includes all GDP terms, none of which, except nGDP2, has significant estimates in at least three of the four regressions of each specification. The middle block is for the variables that have significantly positive estimates in at least 3 of the 4 regressions from column (1) to column (4). The upper block is for the variables that have significantly negative estimates in at least three of the four regressions for each specification.
The estimates of nGDP2 are significantly negative from columns (5) to (8), but the estimates for GDP are insignificant from columns (5) to (7), and the negative turning point of GDP in column (8) lies outside the data sample. Furthermore, the regressions excluding nGDP2 and e2 show large differences; in the middle block, ntey, npi, and nOpnsh have significant effects in three of the four regressions from columns (1) to (4), but none of them do from columns (5) to (8).
Therefore, we do not find evidence for the Kuznets hypothesis within countries in the panel data. We will consider only the specifications from columns (1) to (4), which do not include the quadratic terms nGDP2 or e2; the same rule applies to the regressions of the other 6 statistical measures.
We observe from columns (1) to (4) in Table 12 that in at least three of the four regressions, all explanatory variables in the middle block show significantly positive effects, and the variables in the upper block show significantly negative effects on the Gini coefficient, while the significant estimates of each variable are consistent in sign.
Note that using the same estimator, the regression results are robust for each dependent variable if any of the variables are dropped; these variables are not listed here for conciseness. It might be too strong an assumption to consider estimates robust only when they are significant in all four regressions (fixed or trended time, static or dynamic); hence, we propose the following definition.
Definition 5.1 A variable is called a robust determinant of income distribution if it shows significant and consistent estimates in at least three of the four regressions (static and dynamic specifications, fixed and trended time) by LSDV or POLS.
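Definition 5.1 can be read as a simple decision rule. The sketch below is one possible reading, under two assumptions that the definition itself does not state: significance is judged at a 5% level, and "consistent" means that all significant estimates share the same sign. The function name and the example estimates are hypothetical.

```python
def is_robust_determinant(estimates, alpha=0.05, min_count=3):
    """estimates: list of (coefficient, p_value), one pair per regression
    (static/dynamic specification crossed with fixed/trended time)."""
    signs = [1 if coef > 0 else -1 for coef, p_value in estimates if p_value < alpha]
    # At least min_count significant estimates, all with the same sign.
    return len(signs) >= min_count and len(set(signs)) == 1

# Hypothetical estimates from four regressions for three variables.
investment = [(0.21, 0.001), (0.18, 0.004), (0.25, 0.000), (0.09, 0.210)]
mixed_sign = [(0.05, 0.030), (-0.04, 0.020), (0.06, 0.010), (0.05, 0.040)]
weak = [(-0.02, 0.200), (-0.03, 0.040), (-0.01, 0.500), (-0.02, 0.030)]

for name, est in [("investment", investment), ("mixed_sign", mixed_sign), ("weak", weak)]:
    print(name, is_robust_determinant(est))
```

Under this reading, a variable that is significant in all four regressions but changes sign is not counted as robust, while one insignificant estimate out of four is tolerated.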
We choose the most robust of the four regressions for each dependent variable if most of the robustly significant and consistent estimates are shared by the other regressions. Therefore, we choose columns (1) and (4); both have 10 robustly significant estimates in the middle and lower blocks. Considering the significant estimates in the lower block, regression (4) outperforms (1).
We choose the most robust estimator for other dependent variables in the same way, and we may end up with 2 or 3 estimators having the same number of robust estimates for each dependent variable, which can be either static or dynamic specifications. The difficulty of choosing the most robust of the four regressions arises when one estimate is insignificant in one regression and significant in the other three regressions. The related variables are nEmp, npi, nGsh and nOpnsh*nGDP (for details, refer to S4~S6 Tables in S1 Appendix).
There are three possible choices for a unique robust estimator for each dependent variable: the first is static, the second is dynamic, and the last has exactly the same number of robust estimates in two of the four regressions. However, there could be one insignificant estimate among nEmp, npi, nGsh and nOpnsh*nGDP, which may be significant in the other three regressions.
Table 13 below summarizes the robust determinants, in static specification, of the seven statistics of income distribution, where the estimates of npi for Gini and nEmp for MID are not significant but they are significant in the other three regressions. The nonrobust estimates are not listed to make the table concise (for details, refer to column (1) in Table 12 for Gini and S4~S6 Tables in S1 Appendix for other statistics).
Summarizing Table 13, we have obtained the following evidence for the robust determinants of income distribution within countries:
- Unexplained GDP has robustly negative effects on the bottom 20% and median income shares, a positive effect on the top 20% income share, and a positive (slightly significant) effect on the Gini coefficient, which implies that relatively low income people could not deal well with economic uncertainty, but top-income people could.
- The overall effects of openness on inequality are robustly positive in underdeveloped countries but ambiguous in developed countries, while it has negative effects on the mean income share and mean population share and (weakly) positive effects on the middle 60% income share.
- Labor income share has robustly and consistently positive effects on income inequality, which is contradictory to the literature but presumably convincing because the estimates on different statistics of income distribution are consistent. A possible explanation is that the variation in labor income share is dominated by that of top skilled workers’ earnings, which are counted in the top 20% income share.
- Investment share to GDP shows a robustly and consistently positive effect on income inequality, while capital stock share to GDP has a strongly negative effect on mean-income population share but not much effect on other statistics. Therefore, capital formation rather than capital stock (development) raises income inequality within countries.
- Educational attainment plays a complex role in income distribution. Primary education has robustly negative effects on income inequality. Secondary education has robustly negative effects on the mean population share and could have neutral impacts on the median income share. However, tertiary education has robustly positive effects on income inequality.
- The government spending share to GDP does not show consistent signs on MPS and B20 in the regressions. Gsh has a robustly positive effect on the Gini coefficient but a negative effect on both MPS and B20.
The signs on the interactive terms of openness with Gini, MIS and MPS are considered to be consistent because the coefficient size of MPS is larger than that of MIS so that the negative effect on Gini is dominated by the fall of MPS. Five-year lagged primary educational attainment shows robustly negative effects on the Gini coefficient, which is in line with the negative effect on mean population share and positive effect on the middle 60% income share, even though the effect on median income share is very significant and negative because the effect is economically the least (-0.013) among the other three (-0.014, 0.025, -0.018). Note that negative impacts on inequality here imply either negative effects on the Gini coefficient, T20, and MPS or positive effects on B20, MIS, MID, and MES. The opposite applies to the meaning of positive impacts on inequality.
5.2 Robust determinants of income distribution across countries
We run POLS regressions on the quasi-cross-sectional data by Eqs (5.6) and (5.7). Table 14 below summarizes the regression results of nGini with the specification without the term nGDP2.
Similar to Table 12, we group all explanatory variables into 3 blocks, where the upper and middle blocks are for the variables having robust estimates in at least 3 of the 4 regressions, and the lower block includes the estimates of GDP terms and the variables that are significant in exactly 2 of the 4 regressions. Columns (1) and (2) have the same number of robust determinants and column (2) has a higher explanatory power since the specification is dynamic. However, column (1) has two more significant estimates in the lower block than column (2). We choose the most robust estimator by looking at the number of significant estimates first and the adjusted R2 second; therefore, we choose column (1), the static and trended regression, as the most robust estimator for nGini. Similar to the LSDV regression results, the most robust estimator for a dependent variable may not be unique in terms of the number of robust estimates.
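The selection rule described above (number of significant estimates first, adjusted R2 second) amounts to a lexicographic comparison. A minimal sketch with hypothetical counts and adjusted R2 values, which are not taken from Table 14:

```python
# Hypothetical summary of four candidate regressions (columns).
candidates = [
    {"column": 1, "n_significant": 12, "adj_r2": 0.41},
    {"column": 2, "n_significant": 10, "adj_r2": 0.55},
    {"column": 3, "n_significant": 9, "adj_r2": 0.40},
    {"column": 4, "n_significant": 10, "adj_r2": 0.52},
]

# Tuples compare element by element, so adjusted R2 only breaks ties
# in the number of significant estimates.
best = max(candidates, key=lambda c: (c["n_significant"], c["adj_r2"]))
tie_break = max(candidates[1:], key=lambda c: (c["n_significant"], c["adj_r2"]))
print("chosen column:", best["column"], "| among the rest:", tie_break["column"])
```

In this illustration column 1 is chosen despite its lower adjusted R2, and among the remaining columns the tie between columns 2 and 4 is resolved by the adjusted R2.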
Table 15 below summarizes the static regression results of all 7 dependent variables by POLS. For more detailed results for nGini, refer to Table 14; the detailed results of the other statistics are presented in S7~S9 Tables in S1 Appendix.
Our regressions (see S7~S9 Tables in S1 Appendix) consistently show that labor income share is positively associated with income inequality across countries. Specifically, increasing labor income share is significantly associated with an increasing top 20% income share and Gini coefficient, while the lower income shares (B20, MIS, MES, MID) are consistently (but insignificantly) decreasing. This is reasonable because labor income share includes the labor incomes of the top income earners, which dominate both labor income share and the top 20% income share, while the lower labor incomes change relatively little compared to the changes in top labor incomes. This finding is contradictory to the related studies in the literature, all of which suffer from the endogeneity issue of panel data, which is often caused by omitted variables that are highly correlated with some explanatory variables.
Comparing Tables 13 and 15, we can see that
- Some variables are robust determinants of income distribution for both within and across countries, among which investment and GDP are still the two factors impacting income distribution more robustly than other factors do, labor income share is still positively associated with income inequality, investment but not capital stock is significantly and positively associated with income inequality, etc.
- Some variables are robust determinants of a statistic within but not across countries, and vice versa.
- In particular, some variables (secondary education, financial development, freedom, government spending share to GDP, and inflation) do not show robustly significant effects on any income statistics across countries.
- Combining Tables 13 and 15 (see S10 Table in S1 Appendix) and dropping the estimates and/or variables that are not robust to the regressions in both tables, we are left with the robust determinants of income distribution across and within countries:
- The last period of employment, working hours, and explained GDP, interactive term of openness with GDP, and last period’s primary education are the five most robust determinants of decreasing income inequality, each of which shows robustly significant effects on three of the seven measures of income distribution; the other four factors (current employment, population growth, capital stock, and civil liberty with GDP) show the effects of decreasing income inequality on one or two of the seven measures of income distribution.
- Investment, current tertiary education and unexplained GDP are the three most robust determinants of increasing income inequality, in which investment shows the effects on five of the seven measures and its effect economically outperforms all other factors, unexplained GDP and current tertiary education show the effects on three of the seven measures; the other three factors (last period’s tertiary education, last period’s labor income share and civil liberty) show the effects on one or two of the seven measures of income distribution.
- There are some findings that differ from the related literature. For instance, investment is economically and statistically the most robust determinant (the economic size of the estimate of investment is shown to be dominant in the simulation of subsection 5.3); last period’s labor income share shows robustly significant and positive effects on the Gini coefficient and top 20% income share; primary and tertiary education, rather than secondary education, show robustly significant effects on income distribution.
5.3 Is it capitalism that has led to rising income inequality?
Piketty (2014, 2015) [58, 59] states that a higher rate of return on capital than growth rate could lead to a permanent increase in income inequality, which has been harshly criticized by mainstream economists (Acemoglu, 2015; Mankiw, 2015; Ray, 2015) [60–62]. We discuss this topic by simulating the total marginal effects of GDP, capital stock and investment on income distribution.
We do not take GDP growth and the rate of return on capital as explanatory variables in our regressions, but we have the GDP (linear and quadratic) terms, capital stock (share to output), investment (share to output), and the interactions of GDP with civil liberty, openness and inflation to account for the total effects of capitalism. GDP and its interactive terms explain growth effects, and capital stock and investment explain the effects of the rate of return on capital. We also consider unexplained GDP as one of the factors of capitalism since it is part of the total output.
Table 16 below summarizes the regression results of the GDP terms, capital stock and investment by LSDV and POLS using static specification for the Gini coefficient, the bottom and top 20% income shares and the mean income share, which are from the regressions in subsections 5.1 and 5.2. The regression results of the other three statistics (MID, T20 and MES) for the simulation are relegated to S11 Table in S1 Appendix.
The regression results in Table 16 show that investment share to GDP is the most robust and strongest factor of all explanatory variables, and capital stock alone does not show significant effects on inequality; in particular, the effects of openness on inequality are negative in developed countries and positive in developing countries. It is reasonable to assume that the growth of GDP (ΔGDP) can be much smaller than the growth of investment share (ΔIsh) to GDP. Therefore, investment strongly dominates all six other explanatory variables regarding their marginal effects on income distribution.
We perform experimental simulations using the results in Table 16 and take the total derivative of the dependent variable with respect to the GDP terms, capital and investment. Eq (5.3.1) below shows the total marginal effects of nGini by Table 16:
(5.3.1)
While calculating the total marginal effects for each dependent variable, we consider multiple growth scenarios which are combinations of value assignment for the changes in GDP, capital and investment terms and the levels of the three explanatory variables (nCL, nOpnsh, npi). Table 17 below shows the statistics of these variables.
Putting the assigned values of the related variables into Eq (5.3.1) yields the two values of ΔGini and ΔnGini. We perform the same simulation for the other dependent variables using LSDV and POLS estimators. We run the simulation experiments one million times and count the times that the total marginal effects are positive or negative to obtain the probability of the event. For each statistic, the simulation results are very similar for different growth scenarios; hence, we report only the results of two growth scenarios. In the first, the changes in capital variables (e, GDP, K, I) take 0.1 times the standard deviation of their data, and the levels of civil liberty, openness and inflation take a value in an interval centered at zero with a length of one standard deviation of their data; the second uses 2 and 3 standard deviations for the two sets of variables, respectively. The second scenario covers almost all of the observations of the dataset. We also consider the negative growth cases of capital terms, which generate exactly the opposite results to the positive growth cases, and the results are not reported here to be concise. Table 18 below summarizes the experimental results.
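The counting exercise can be sketched as follows. This is an illustration only: the coefficients are hypothetical placeholders rather than the estimates in Table 16, all standard deviations are normalised to one, changes are assumed to be drawn uniformly between zero and the stated multiple of a standard deviation, and the form of the total marginal effect (linear terms plus interactions of GDP with civil liberty, openness and inflation) is an assumption, since Eq (5.3.1) is not reproduced here. The number of draws is reduced from one million to keep the example fast.

```python
import random

# Hypothetical coefficients (NOT the estimates reported in Table 16).
COEF = {"e": 0.02, "gdp": -0.05, "k": -0.01, "i": 0.50,
        "gdp_cl": 0.02, "gdp_open": -0.02, "gdp_pi": 0.02}

def total_marginal_effect(d, level, coef):
    """Linear terms plus GDP interactions evaluated at the given levels."""
    interaction = (coef["gdp_cl"] * level["cl"]
                   + coef["gdp_open"] * level["open"]
                   + coef["gdp_pi"] * level["pi"])
    return (coef["e"] * d["e"] + coef["k"] * d["k"] + coef["i"] * d["i"]
            + (coef["gdp"] + interaction) * d["gdp"])

def share_positive(n_draws, change_scale, level_width, coef, seed=0):
    """Fraction of random scenarios with a positive total marginal effect."""
    rng = random.Random(seed)
    positive = 0
    for _ in range(n_draws):
        d = {k: rng.uniform(0.0, change_scale) for k in ("e", "gdp", "k", "i")}
        level = {k: rng.uniform(-level_width / 2, level_width / 2)
                 for k in ("cl", "open", "pi")}
        if total_marginal_effect(d, level, coef) > 0:
            positive += 1
    return positive / n_draws

# Scenario 1: changes up to 0.1 sd, levels within an interval of length 1 sd.
p_small = share_positive(50_000, 0.1, 1.0, COEF)
# Scenario 2: changes up to 2 sd, levels within an interval of length 3 sd.
p_large = share_positive(50_000, 2.0, 3.0, COEF)
print("share positive:", round(p_small, 3), round(p_large, 3))
```

With a dominant positive investment coefficient, as assumed here, most draws produce a positive total effect, which is the kind of probability statement that Table 18 reports for the actual estimates.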
Table 18 shows that the total marginal effects of GDP, capital stock and investment on income inequality are very likely to be positive; that is, the effects on all statistics are in line with increasing income inequality within and between countries, except the effects on the middle 60% and top quintile income shares between countries. In particular, the bottom quintile and mean income shares were negatively affected and the mean population share was positively affected by growth within and across countries. Note that for both between and within countries the total marginal effects of growth on both MIS and MPS are very likely to be negative, which is in line with the regression results in Shao and Krause [6].
The story behind this conclusion might be that investment is the main driver of growth and that its profits have been mostly collected by capital holders and executive officers: within countries, the top 20% income share is very likely to increase with growth, while the bottom 20%, mean, median, and middle income shares have rarely increased. Between countries, the estimates differ for the middle 60% and top 20% income shares, which might be explained by the skilled labor premium due to migration between countries and by capital incomes facing higher uncertainty than within countries.
Therefore, the development of capitalism rarely favored most of the population in relative terms within a country, even though there are exceptions across countries, where the middle 60% income share was very likely to be increased.
VI Concluding remarks
Empirical studies on income distribution with panel data have not discussed the issues caused by multicollinearity, namely imprecise estimates and endogeneity when relevant variables are omitted and correlated with the covariates concerned. We deal with these issues by transforming panel data into quasi-cross-sectional data, which either completely removes or dramatically reduces multicollinearity among all explanatory variables and allows our estimates to be unbiased and consistent. We consider both fixed and trended time forms for the data and both static and dynamic specifications; therefore, we have four specifications to explore the robust determinants of income distribution either within countries by LSDV or across countries by POLS.
Our findings update the related literature on the main determinants of income distribution, which include development level, openness, educational attainment and labor income share, among which investment ratio to GDP and employment are found to be the most robust determinants within and across countries. Our estimates on labor income share show robustly and consistently positive effects on income inequality, which challenges the related literature. A possible explanation is that top wage earnings dominate the variation of labor income share, while increasing top wage earnings imply increasing the top income share and labor income share as well so that both labor income share and income inequality increase. Using the estimates, our simulation shows that in relative terms, the total marginal effects of capitalism development did not favor most of the people within a country.
This study tries to reduce the data correlation among variables by changing the data structure, but the structural correlation among productive factors has not been addressed, which deserves future exploration.
Supporting information
S1 File. This is the Word file to explain the files of supporting information.
https://doi.org/10.1371/journal.pone.0253291.s002
(DOCX)
S2 File. This is the Excel file for raw annual data.
https://doi.org/10.1371/journal.pone.0253291.s003
(XLSX)
S3 File. This is the dta file for the 5-year averaged data of all variables.
https://doi.org/10.1371/journal.pone.0253291.s004
(DTA)
S4 File. This is the do file for the regressions with the data treated by trended time effects.
https://doi.org/10.1371/journal.pone.0253291.s005
(DO)
S5 File. This is the do file for the regressions with the data treated by fixed time effects.
https://doi.org/10.1371/journal.pone.0253291.s006
(DO)
Acknowledgments
I thank Professor Peter Edger and Fabio Canova for their helpful comments; I also thank the participants at the China Camphor 2020 Nan Chang meeting and Econometric Association 2020 Asia conference for their comments. Dr. Tsun-Feng Chiang helped to calculate the data of MPS and MIS.
References
- 1. Barro Robert. J. Inequality and Growth in a Panel of Countries. Journal of Economic Growth. 2000, 5, 5–32. https://link.springer.com/article/10.1023/A:1009850119329
- 2. Li Hongyi, Squire Lyn and Zou Hengfu. Explaining International and Intertemporal Variations in Income Inequality. Economic Journal. 1998, 108, 26–43.
- 3. Makram El-Shagi and Liang Shao. The Impact of Income Inequality and Redistribution on Growth. Review of Income and Wealth. 2019, June. Vol. 65(2), pp. 239–263.
- 4. Moral-Benito Enrique. Likelihood-Based Estimation of Dynamic Panels with Predetermined Regressors. Journal of Business and Economic Statistics, 2013, 31, 451–72. https://doi.org/10.1080/07350015.2013.818003
- 5.
Wooldridge , Jeffrey M. Introductory Econometrics: a Modern Approach. 7th edition. Cengage Learning. 2018.
- 6. Shao , Liang F. and Krause Melanie. Rising Mean Incomes for Whom? PLOS ONE. 2020, 15(12): e0242803. pmid:33326451
- 7. Kuznets Simon. Economic Growth and Income Inequality. American Economic Review. 1955. 45, 1–28.
- 8. Aghion Phippe and Bolton Patric. Distribution and Growth in Models of Imperfect Capital Markets. European Economic Review.1992. 36, 603–11.
- 9. Galor , Oded and Tsiddon Daniel. Income Distribution and Growth: The Kuznets Hypothesis Revisited. Economica. 1996, 63, s103–s117.
- 10. Acemoglu Daron and Robinson James A. The Political Economy of the Kuznets Curve. Review of Development Economics. 2002, 6(2), 183–203.
- 11. Anand Sudhir and Kanbur Ravi. The Kuznets Process and the Inequality—Development Relationship. Journal of Development Economics. 1993, 40, 25–52. https://doi.org/10.1016/0304-3878(93)90103-T
- 12.
Fields, Gary S. and George H. Jakubson. New Evidence on the Kuznets Curve. Unpublished working paper, Department of Economics, Cornell University. 1993.
- 13. Eusufzai Zaki. The Kuznets Hypothesis: An Indirect Test. Economics Letters. 1997, 54, 81–85.
- 14.
Barro, Robert J. Inequality and Growth Revisited. ADB Working Paper Series on Regional Economic Integration, No. 11. 2008. http://hdl.handle.net/11540/1762.
- 15. Higgins Matthew and Willianson Jeffrey. Explaining Inequality the World Round: Cohort Size, Kuznets Curves, and Openness. Southeast Asian Studies. 2002, Vol. 40(3), pp. 268–302.
- 16. Thorton John. The Kuznets Inverted-U Hypothesis: Panel Data Evidence from 96 Countries. Applied Economics Letters. 2001, 8, 15–16.
- 17. Barro Robert and Lee Jong-Wha. A New Data Set of Educational Attainment in the World, 1950–2010. Journal of Development Economics. 2013, vol. 104, pp.184–198.
- 18. Huang , Ho-Chuan R. A Flexible Inference to the Kuznets Curve. Economics Letters. 2004, 84, 289–296.
- 19. Anand Sudhir and Kanbur Ravi. Inequality and Development: a Critique. Journal of development economics. 1993, 41, 19–43.
- 20. Savvides Andreas and Stengos Thanasis. Income Inequality and Economic Development: Evidence from the threshold Regression Model. Economics Letters. 2000, 69, 207–212.
- 21. Frazer Garth. Inequality and Development Across and Within Countries. World Development. 2006, Vol. 34, No. 9, pp. 1459–1481
- 22. Kalliovirta Leena and Malinen Tuomas. Non-linearity and cross country dependence of income inequality. The review of income and wealth. 2020, 66(1), 227–249. https://doi.org/10.1111/roiw.12377
- 23. Wood A. Openness and Wage Inequality in Developing Countries: the Latin American Challenge to East Asian Conventional Wisdom. The World Bank Economic Review. 1997, 11(1), pp. 33–57.
- 24. Milanovic Banko. Can We Discern the Effects of Globalization on Income Distribution? Evidence from Household Surveys. World Bank Economic Review. 2005, Vol. 19, No. 1, pp.21–44.
- 25. Dollar David. & Kraay Aart. (2002). Growth is Good for the Poor. Journal of Economic Growth, 7, 195–225. https://doi.org/10.1023/A:1020139631000
- 26. Ravallion Martin. Growth, inequality and poverty: looking beyond averages. World Development. 2002, 29(11), pp. 1803–1815.
- 27. Lundberg Mattias and Squire Lyn. The Simultaneous Evolution of Growth and Inequality. The economic journal. 2003, 113(487), 326–344.
- 28.
Harrison, Ann. Has Globalization Eroded Labor’s Share? Some Cross-Country Evidence. MPRA Paper 39649, University Library of Munich, Germany. 2005.
- 29.
Ortega, Daniel and Francisco Rodriguez. Are Capital Shares Higher in Poor Countries? Evidence from Industrial Surveys. Wesleyan Economics Working Papers 2006–023, Wesleyan University, Department of Economics. 2006
- 30. Deininger Klaus and Squire Lyn. A New Data Set: Measuring Income Inequality. The World Bank economic review. 1996, 10(3), 565–91.
- 31. Jaumotte Florence, Lall Subir and Papageorgiou Chris. Rising Income Inequality: Technology, or Trade and Financial Globalization? IMF economic review. 2013, 61, No. 2, 271–309.
- 32. Young Andrew T. and Tackett Maria Y. Globalization and the Decline in Labor Shares: Exploring the Relationship beyond Trade and Financial Flows. European Journal of Political Economy. 2018, Vol. 52, pp. 18–35. http://dx.doi.org/10.1016/j.ejpoleco.2017.04.003.
- 33. Wood Adrian. How Trade Hurt Unskilled Workers. Journal of Economic Perspectives. 1995, 9 (3): 57–80.DOI: 10.1257/jep.9.3.57
- 34.
Thomas, Vinod & Wang, Yan & Fan, Xibo. Measuring Education Inequality: Gini Coefficients of Education. World Bank. Policy research working paper series 2525, 2001.
- 35. Checchi Daniele, and Cecilia . Labor Market Institutions and the Personal Distribution of Income in the OECD. Economica, 2010, 77, no. 307: 413–50.
- 36.
Castello-Clement, Amparo and Rafael Domenech. Human Capital and Income Inequality: New Facts and Some Explanations. BBVA working paper. 2017.
- 37. Eicher Theo S. and Cecilia Garcia-Penalosa. Inequality and Growth: the Dual Role of Human Capital in Development. Journal of Development Economics. 2001, Vol. 66, 173–197.
- 38. Erosa Andres; Tatyana Koreshkova and Diego Restuccia. How Important is Human Capital? a Quantitative Theory Assessment of World Income Inequality. The review of economic studies, 2010, 77 (4), 1421–1449.
- 39. Santos Marcelo, Sequeira Tiago N., and Alexandra Ferreira-Lopes. Income Inequality, TFP, and Human Capital. Economic Record, 2017, 93, 89–111. http://onlinelibrary.wiley.com/doi/10.1111/1475-4932.12316/abstract
- 40. Daudey Emilie, and Cecilia García-Peñalosa. The Personal and the Factor Distributions of Income in a Cross-Section of Countries. Journal of Development Studies, 2007, 43, no. 5: 812–29.
- 41.
Jaumotte, Florence, & Tytell, Irina. How Has the globalization of Labor Affected the Labor Income Share in Advanced Countries?” IMF Working Paper WP/07/298. 2007.
- 42.
Lawrence, L.; and Slaughter, M. International Trade and American Wages in the 1980s: Giant Sucking Sound or Small Hiccup? Brookings Papers on Economic Activity, Microeconomics, 1993, 2:161 226.
- 43. Bengtsson Erik, & Waldenström Daniel. Capital Shares and Income Inequality: Evidence from the Long Run. The Journal of Economic History, 2018, 78(3), 712–743.
- 44. Karabarbounis L. and Neiman B. The Global Decline of the Labor Share. The Quarterly Journal of Economics, 2013, 129(1), 61–103.
- 45. Schultz T. Paul. Inequality in the distribution of personal income in the world: How it is changing and why. J Popul Econ., 1998, 11, 307–344. https://doi.org/10.1007/s001480050072
- 46. Chong Albert. Inequality, Democracy, and Persistence: Is There a Political Kuznets Curve? Economics & politics, 2004, 16, 2, 189–212.
- 47. Chong Albert & Gradstein Mark. Inequality and Institutions. The Review of Economics and Statistics, MIT Press, 2007, vol. 89(3), pages 454–465.
- 48. Rodrik Dani. Democracies pay higher wages. The quarterly journal of economics, 1999, 114, 3, 707–738
- 49. Carter John R. An Empirical Note on Economic Freedom and Income Inequality. Public Choice, 2007, 130, 163–177. https://doi.org/10.1007/s11127-006-9078-0
- 50. Deaton Angus. S. and Christina Paxson. Intertemporal Choice and Inequality. Journal of Political Economy, 1994, 102 (3): 437–467.
- 51. Bulir Ales. Does inflation matter? IMF staff papers, 2001, Vol. 48, #1, 139–159
- 52. Jantti Markus and Jenkins Stephen P. The impact of macroeconomics on income inequality. Journal of economic inequality, 2010, 8, 221–240.
- 53. Barro Robert J., Jong Wha Lee. A new data set of educational attainment in the world, 1950–2010. Journal of Development Economics, 2013, Volume 104, Pages 184–198.
- 54. Shao Liang F. How Do Mean Division Shares Affect Development and Growth? Panoeconomicus, 2017, vol. 64, issue 5, 525–545. https://doi.org/10.2298/PAN150520033S
- 55. Andrés Rodríguez‐Pose, Vassilis Tselios. Education and Income Inequality in the Regions of the Europe Union. Journal of Regional Science, 2009, 49: 411–437. https://doi.org/10.1111/j.1467-9787.2008.00602.x
- 56. Teulings Coen & Thijs van Rens. Education, Growth, and Income Inequality. The Review of Economics and Statistics, MIT Press, vol. 90(1), pages 89–104, 2008, February.
- 57. Nickell Stephen. Biases in Dynamic Models with Fixed Effects. Econometrica, 1981, 49(6), 1417–1426.
- 58.
Piketty Thomas. The Capital in 21st Century. Harvard University Press.2014.
- 59. Pikety Thomas. Putting Distribution Back at the Center of Economics: Reflections on Capital in the 21st Century. Journal of economic perspectives, 2015,, Vol. 29, #1, pp. 67–88.
- 60. Acemoglu Daron and Robinson James A. The Rise and Decline of General Laws of Capitalism. Journal of economic perspectives, 2015, 29, 1, 3–28.
- 61.
Mankiw, N. Gregory. Yes, r>g, So What? American Economic Review (Papers and Proceedings), 2015, Vol. 105, No. 5, pp.43-47.
- 62. Ray Debraj. Nit-Piketty: A Comment on Thomas Piketty’s Capital in the Twenty First Century. CESifo Forum, 2015, 16 (1), 19–25.