The Emergence of Three Human Development Clubs

We examine the joint distribution of levels of income per capita, life expectancy, and years of schooling across countries in 1960 and in 2000. In 1960 countries were clustered in two groups: a rich, highly educated, high longevity “developed” group and a poor, less educated, high mortality “underdeveloped” group. By 2000, however, we see the emergence of three groups: one underdeveloped group remaining near 1960 levels, a developed group with higher levels of education, income, and health than in 1960, and an intermediate group lying between these two. This finding is consistent both with the idea of a new “middle income trap” that countries face even if they escape the “low income trap”, and with the notion that countries which escaped the poverty trap form a temporary “transition regime” along their path to the “developed” group.


Introduction
There is strong evidence that countries are clustered into at least two groups in terms of income per capita. Quah ([1], [2], [3]) finds evidence of twin peaks in the distribution of income, with a cluster of rich countries and a cluster of poor countries. One possible explanation of this clustering into income groups is that countries differ in their underlying characteristics. Bloom, Canning and Sevilla [4] reject this hypothesis in favor of a model where countries that are fundamentally the same may either be rich or be caught in a self-reinforcing poverty trap from which it is difficult to escape. There is a range of theoretical models consistent with two distinct equilibria and associated poverty traps (e.g. Galor and Zeira [5], Banerjee and Newman [6], Kremer [7]). Whatever the explanation, the fact that there are two “clubs” changes the way we think about economic development. Rather than being a continuous process, economic growth may require disproportionate effort or a “big push” to escape from a poverty trap (Murphy, Shleifer and Vishny [8]).
Similarly, Mayer-Foulkes [9] and Bloom and Canning [10] argue that there are two clubs in terms of life expectancy, with one group of countries clustered around a low level of life expectancy and another clustered around a high level. Again, this is evidence against a smooth progression from low to high life expectancy. We are not aware of similar evidence for education.
In this paper we focus on the joint distribution of income per capita, life expectancy and schooling. We focus on these three variables, adding schooling to the established focus on the distribution of income and health, because they have been identified as fundamental determinants of human welfare (e.g. Sen [11]), as reflected, for example, in UNDP's Human Development Index. In addition to being important for welfare, these three variables are causally interlinked. High income provides resources that can be invested in education and health, while health and education are forms of human capital that may lead to high income (e.g. Barro [12], Pritchett and Summers [13]).
We look at the number of clusters in the data graphically using a non-parametric kernel density estimator and also test formally for the number of clusters. We assume that life expectancy, income, and schooling of countries in a cluster have a joint trivariate normal distribution around a common cluster mean. We use a likelihood ratio test for the components in a finite multivariate normal mixture using the parametric bootstrap, which allows for the fact that the distribution of the test statistic in this case is quite non-standard.
We find that in 1960 there are only two clubs in terms of income, health and education. One club has high income, high life expectancy and high education while the other has lower levels of all three variables. By 2000 the picture has changed and we find evidence of three components. The same two clubs as before remain: a club with high income, high life expectancy and high education, and a club with lower levels of all three variables. The high income group has advanced on all three indicators relative to 1960, while the low income, low health and low education group has scarcely improved. However, we also see the emergence of a third, middle group with income and education levels clustered around a point between those of the two extreme groups but with life expectancy that is only slightly below that of the “developed” club.
Our approach allows us to assign countries to high, middle and low levels of development based on Bayesian posterior probabilities that they are in a group given their observed data on income, health, and education, and therefore we do not have to rely on arbitrary cutoffs to determine group membership. The countries with high probability of membership in the high income, high life expectancy and high education group in 2000 are largely the same as those in this group in 1960. However, the group of countries that had low levels of all three variables in 1960 has split in two, allowing some countries to move up from the low to the middle group.

Data
For income we use GDP per capita, at purchasing power parity, based on 2005 constant prices, calculated using a chain index method. This is the “rgdpch” series from the Penn World Tables Version 7.0 (Heston, Summers and Aten [14]) in log terms. Education is measured using the years of schooling of the population aged 15-64 who are not in school. This is the variable “ty1564” from Cohen and Soto [15]. For health we use life expectancy at birth from the United Nations World Population Prospects: The 2008 Revision (United Nations [16]).
The data on income per capita are annual, while the data on life expectancy are for 5-year intervals. The data on education are available for 1960, 1970, 1980, 1990 and 2000. We average the income and health data to match the education data. For example, we use the average of the GDP per capita observations from 1955 to 1965 as the income measure for 1960, and the average of life expectancy from 1955-1960 and 1960-1965 as the health measure for 1960. Our data set includes 84 countries covering about 90% of the world's population.
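The averaging step can be sketched as follows. This is a minimal illustration only: the function names `window_mean` and `interval_mean` are hypothetical, and all numbers are made-up placeholders rather than values from the data set. The text does not state whether the logarithm of income is taken before or after averaging, so the sketch leaves that step out.

```python
def window_mean(annual, center, half_width=5):
    """Average an annual series over center - half_width .. center + half_width."""
    values = [v for year, v in annual.items() if abs(year - center) <= half_width]
    if not values:
        raise ValueError(f"no observations within {half_width} years of {center}")
    return sum(values) / len(values)


def interval_mean(by_interval, center):
    """Average the 5-year intervals that end or start at `center`."""
    values = [v for (start, end), v in by_interval.items() if center in (start, end)]
    if not values:
        raise ValueError(f"no interval adjacent to {center}")
    return sum(values) / len(values)


# Made-up placeholder values for one hypothetical country (illustration only).
gdp_pc = {year: 1000.0 + 10.0 * (year - 1950) for year in range(1950, 1971)}
life_exp = {
    (1950, 1955): 50.0,
    (1955, 1960): 52.0,
    (1960, 1965): 54.0,
    (1965, 1970): 56.0,
}

income_1960 = window_mean(gdp_pc, 1960)      # uses the years 1955..1965
health_1960 = interval_mean(life_exp, 1960)  # uses 1955-1960 and 1960-1965
```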

Methods
Gaussian mixture models are often used for cluster analysis, see e.g. Fraley and Raftery [17]. One approach is to choose the number of clusters that best fits the data. Several criteria for goodness of fit have been proposed, including the Bayesian Information Criterion (BIC) and the Integrated Completed Likelihood (ICL) (e.g. Biernacki et al. [18], Fraley and Raftery [17]) and a globally optimal BIC with a potentially restricted covariance matrix (Fraley and Raftery [19]). While we report results for the BIC and ICL selection criteria, our preferred approach is to use a classical testing framework where we test the null hypothesis of $K_0$ clusters against the alternative of $K_0 + 1$ clusters, for each $K_0$, and only reject $K_0$ clusters if we reject the null against the alternative at the 5% significance level. This is a conservative approach, which implies that we only accept a larger number of clusters if the data clearly reject a smaller number.
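We test for the number of components in the normal mixture models by using the parametric bootstrap. Given data $x$ with independent observations $x_1, \ldots, x_n$, the log-likelihood for a $d$-variate Gaussian mixture model with $K$ components is
$$\ell(\psi_K; x) = \sum_{i=1}^{n} \log \left( \sum_{k=1}^{K} \alpha_k \, \phi(x_i; \mu_k, \Sigma_k) \right),$$
with $\phi(\cdot\,; \mu_k, \Sigma_k)$ being the density of a $d$-variate normal distribution with mean $\mu_k = (\mu_{k1}, \ldots, \mu_{kd})'$ and covariance $\Sigma_k = (\sigma_{kij})_{1 \le i,j \le d}$, and $\psi_K = (\alpha_1, \ldots, \alpha_{K-1}; \mu_{k1}, \ldots, \mu_{kd}; \sigma_{kij})$ with $1 \le i \le j \le d$ and $1 \le k \le K$. The mixing proportions $\alpha_k$ are non-negative and sum to one, so that $\alpha_K = 1 - \sum_{k=1}^{K-1} \alpha_k$.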
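A minimal sketch of how such a mixture log-likelihood can be evaluated for given parameter values is shown below, using only the Python standard library. The same weighted component densities, normalised over components, also give the posterior membership probabilities used to classify countries. All function names and the two-component parameter values are hypothetical and chosen for illustration; they are not estimates from the paper, and the sketch does not fit the model.

```python
import math


def cholesky(S):
    """Lower-triangular L with L L^T = S, for a symmetric positive-definite S."""
    d = len(S)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def log_normal_pdf(x, mu, S):
    """Log density of a d-variate normal N(mu, S) at the point x."""
    d = len(x)
    L = cholesky(S)
    # Forward substitution solves L z = x - mu; z.z is the squared Mahalanobis distance.
    z = []
    for i in range(d):
        r = x[i] - mu[i] - sum(L[i][m] * z[m] for m in range(i))
        z.append(r / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(d))
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))


def log_weighted_densities(x, alphas, mus, sigmas):
    """log(alpha_k * phi(x; mu_k, Sigma_k)) for every component k."""
    return [math.log(a) + log_normal_pdf(x, m, S)
            for a, m, S in zip(alphas, mus, sigmas)]


def mixture_log_likelihood(data, alphas, mus, sigmas):
    """Sum over observations of the log of the mixture density (log-sum-exp)."""
    total = 0.0
    for x in data:
        terms = log_weighted_densities(x, alphas, mus, sigmas)
        top = max(terms)
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total


def posterior_probabilities(x, alphas, mus, sigmas):
    """Posterior probability of each component given x (Bayes' rule)."""
    terms = log_weighted_densities(x, alphas, mus, sigmas)
    top = max(terms)
    weights = [math.exp(t - top) for t in terms]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical two-component, trivariate example (made-up parameters).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
alphas = [0.6, 0.4]
mus = [[0.0, 0.0, 0.0], [3.0, 3.0, 3.0]]
sigmas = [identity, identity]

post = posterior_probabilities([1.5, 1.5, 1.5], alphas, mus, sigmas)
assigned = max(range(len(post)), key=post.__getitem__)  # argmax over components
```

The point `[1.5, 1.5, 1.5]` is equidistant from both means, so its posterior probabilities reduce to the mixing proportions and it is assigned to the first component.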
We use the resampling approach introduced by McLachlan [20] for the assessment of the true null distribution of the likelihood ratio test in testing $H_0: K = K_0$ against $H_1: K = K_0 + 1$. We generate 1,000 bootstrap samples from the mixture model fitted under the null hypothesis of $K_0$ components. That is, the bootstrap samples are generated from the mixture model with $\psi_{K_0}$ replaced by its estimate $\hat{\psi}_{K_0}$, computed from the log-likelihood formed from the original data under $H_0$. The value of the likelihood ratio test statistic (LRT) is computed for each bootstrap sample after fitting mixture models for $K = K_0$ and $K = K_0 + 1$ in turn to it. The replicated values of the LRT formed from the bootstrap samples provide an assessment of the bootstrap, and therefore of the true, null distribution of the LRT. The test thus rejects $H_0$ if the LRT for the original data is greater than $\lceil (1-\alpha) B \rceil$ of the LRT values for the $B$ bootstrap samples, where $\alpha$ is a prespecified significance level (e.g. $\alpha = 0.05$).
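The rejection rule at the end of this procedure can be sketched as below. The function name `bootstrap_lrt_decision` is hypothetical, and the sketch assumes that the observed LRT and the bootstrap replicates have already been computed by fitting the mixture models, which is not shown. The p-value formula is one common convention and is an assumption here; the text does not say how the reported p-values were calculated.

```python
import math


def bootstrap_lrt_decision(lrt_observed, lrt_bootstrap, alpha=0.05):
    """Return (reject_H0, p_value) from an observed LRT and bootstrap replicates."""
    B = len(lrt_bootstrap)
    if B == 0:
        raise ValueError("need at least one bootstrap replicate")
    # Number of bootstrap LRT values that the observed statistic exceeds.
    n_below = sum(1 for v in lrt_bootstrap if lrt_observed > v)
    # Reject if the observed LRT exceeds ceil((1 - alpha) * B) replicates;
    # rounding first guards against floating-point noise in (1 - alpha) * B.
    threshold = math.ceil(round((1.0 - alpha) * B, 9))
    reject = n_below >= threshold
    # Assumed convention: add-one bootstrap p-value estimate.
    n_at_least = sum(1 for v in lrt_bootstrap if v >= lrt_observed)
    p_value = (1 + n_at_least) / (B + 1)
    return reject, p_value
```

With `B = 1000` and `alpha = 0.05`, the rule rejects when the observed statistic exceeds at least 950 of the bootstrap values.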
When determining the number of components, we successively apply this testing procedure for increasing values of K 0 until the hypothesis can no longer be rejected. In order to double-check the conclusions, we also determine the number of components chosen by the model selection criteria BIC (Fraley and Raftery [17]) as well as the ICL (Biernacki et al. [18]).
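The sequential procedure and the BIC cross-check can be sketched as follows. The callable `p_value_for` is a hypothetical stand-in for the full bootstrap test of $K_0$ against $K_0 + 1$ components, and the p-values in the example are made up for illustration; they are not the values reported in Table 1. The BIC is written here in the "smaller is better" form, which is an assumption, since sign conventions differ across software. The parameter count is that of an unrestricted mixture: $K - 1$ mixing proportions, $Kd$ means and $Kd(d+1)/2$ covariance terms.

```python
import math


def select_num_components(p_value_for, alpha=0.05, max_components=6):
    """Increase K0 from 1 until H0: K = K0 is no longer rejected at level alpha."""
    k0 = 1
    while k0 < max_components and p_value_for(k0) < alpha:
        k0 += 1
    return k0


def n_free_parameters(K, d):
    """Free parameters of an unrestricted K-component, d-variate Gaussian mixture."""
    return (K - 1) + K * d + K * d * (d + 1) // 2


def bic(log_likelihood, K, d, n):
    """BIC in the 'smaller is better' convention (assumed; some software flips the sign)."""
    return -2.0 * log_likelihood + n_free_parameters(K, d) * math.log(n)


# Made-up p-values for K0 = 1, 2, 3 (illustration only).
illustrative_p = {1: 0.001, 2: 0.020, 3: 0.400}
chosen = select_num_components(lambda k0: illustrative_p[k0])
```

With $d = 3$, a three-component model has 29 free parameters, against 19 for two components, which is worth keeping in mind with a sample of 84 countries.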
Once we have fitted a finite mixture model with an appropriate number of components to the data, each observation can be assigned posterior probabilities of belonging to each of the components in the mixture model given the data. For a three-component normal mixture, the posterior probability $p(k; x_i)$ that an observation $x_i$ belongs to the $k$th component is equal to
$$p(k; x_i) = \frac{\alpha_k \, \phi(x_i; \mu_k, \Sigma_k)}{\sum_{l=1}^{3} \alpha_l \, \phi(x_i; \mu_l, \Sigma_l)},$$
with $k = 1, \ldots, 3$, where $\alpha_k$ is the mixing proportion of component $k$ and $\phi(\cdot\,; \mu_k, \Sigma_k)$ is the normal density with mean $\mu_k$ and covariance $\Sigma_k$, all evaluated at the fitted parameter values. We cluster the data by assigning each observation $x_i$ to the component $j$ of the mixture to which it has the highest posterior probability of belonging, that is $j = \arg\max_k p(k; x_i)$.

Results
Figure 1 shows a kernel density estimate for the distribution of income per capita in 1960 and 2000. In 1960 we see a unimodal distribution with a single peak. However, the distribution does have a “shoulder” to the left, with a mass of low-income countries. If countries are clustered into two groups, and the means of the clusters are far apart, the result will be a twin-peaked distribution. However, if the means of the two clusters are close together, the result will be a shoulder in the data, as seen in Figure 1 for 1960 income per capita. In general, twin-peaked distributions represent at least two clusters (see Vollmer, Holzmann and Schwaiger [21] for a discussion of the relationship between the number of clusters and the number of peaks in the data). By 2000, however, we see three peaks in the income per capita distribution.
The graph in Figure 1 for education shows a single peak with a high education shoulder for 1960. For 2000 there is a peak with two shoulders, one above and one below the peak. For life expectancy we see twin peaks in 1960, a tall peak above 40 years and a shorter peak around 70 years. In 2000, we see a single peak around 75 years with a broad shoulder to the left.
We test for the number of normal components in the trivariate mixture distribution. Table 1 shows the bootstrapped p-values (based on 1,000 bootstrap repetitions) for the likelihood ratio statistic for one versus two, two versus three, and three versus four components in the distribution for each decade of data. For all decades we reject one component against two. For 1960 we do not reject two components against three. It appears that for 1960 the data can be described as a mixture of two trivariate normal distributions. For 1990 and 2000, however, we reject two components against three at the 5 percent level, but we do not reject three against four components. It appears that we need three components to describe the 1990 and 2000 data. The values of the model selection criteria BIC and ICL, displayed in Tables 2 and 3, confirm these findings. Using restricted covariance models as proposed in Fraley and Raftery [17], followed by appropriate merging algorithms as in Baudry et al. [22], supports this conclusion, so that the finding of three components in the 2000 data is robust to the method used to identify them.
Tables 4 and 5 show the average characteristics of countries in each cluster, assigning each country to the cluster of which it is most likely a member (based on the posterior probability). In 2000, the “developed” group has advanced in terms of the levels of all three indicators relative to 1960, while the low income, low health and low education group has scarcely improved: average education and health levels are slightly higher, but income levels are actually substantially lower. We also see the emergence of a third, middle group with income and education levels clustered around a point between those of the two extreme groups but with average life expectancy that is less than 10% below that of the high level club. The countries that moved up to this middle group had already started from higher levels of all three indicators in 1960.
When examining the development of the indicators for the countries that moved up, one notes a remarkably steady rate of progress in health and education, with life expectancy advancing by about 5 years per decade and education by 1 year per decade. In contrast, GDP growth varies much more (with the 1980s being a particularly low growth period, and the 1960s and 1990s being high growth periods). Those that remained in the poor group developed quite differently over time. After some modest progress in all indicators in the 1960s and 1970s, income stagnated and health improvements slowed down dramatically; only years of education continued to rise largely unabated. This suggests that these two groups of countries were really on different trajectories, leading them to separate into two components. It also suggests that the linkages between the three indicators are not as close as one might surmise. In particular, education improvements seem possible without much income growth, and the relation between health and income improvements is also not as close, with income fluctuating much more.
Figures 2 and 3 contain the contour plots of kernel density estimates for the joint distributions of health with income, education with income, and education with health. The country observations are colored based on their component assignment in the joint trivariate distribution of education, health and income. In Figure 2, countries that leave the “underdeveloped” group by 2000 are symbolized by upward triangles. On average, these countries have higher levels of all three variables in 1960 than countries that stay in the “underdeveloped” group.

Discussion
We document the emergence of a third development club. In 1960 countries were clustered in two groups: a rich, highly educated, high longevity “developed” club and a poor, less educated, high mortality “underdeveloped” club. By 2000 we see the emergence of three clubs: one underdeveloped group remaining near 1960 levels, a developed group with higher levels of education, income, and health than in 1960, and an intermediate group lying between these two.
This sheds some light on the issue of convergence in development. There is a group of poor countries that are stagnating, and a group of rich countries that are forging ahead, leading to increasing worldwide income inequality. However, about half of the countries that were poor in 1960 have been very successful and have seen substantial improvements in income, health and schooling. These countries were already better off in 1960, and they were able to steadily raise their income, education and health levels, which allowed them to escape from the low development group.
Our results raise the issue of what lies behind the move from a simple “poverty trap” setting of two clusters in 1960 to the three clusters we see in 2000. They emphasize the disparate experience of the underdeveloped countries, with one sub-group having done remarkably well while another has largely failed. The emergence of a middle group is consistent with two fundamentally different interpretations. One interpretation is the idea of a new “middle income trap” that countries face even if they escape the “low income trap” (Griffith [23]); evidence in favor of this view is that it appears hard to break into the top development group, which was achieved by only one country in the sample. Inspection of Figure 3 and Table 5 suggests that the income gap remains massive (with no overlap between the groups) and is not easy to close, particularly in a situation where incomes in the high component also continue rising.
Another interpretation is that the large number of countries that escaped the poverty trap form a temporary “transition regime” along their path to the “developed club” (Galor [24]). If this interpretation is correct, it implies that the transition does not happen very quickly, as only one country moved to the developed club and the gaps (particularly in incomes) remain large. But of course high growth and further rapid improvements in education and health may over time enable the countries of the middle group to transition to the developed group.