Ecological Footprint Model Using the Support Vector Machine Technique

The per capita ecological footprint (EF) is one of the most widely recognized measures of environmental sustainability. It aims to quantify the Earth's biological resources required to support human activity. In this paper, we review the relevant literature and identify five factors that influence per capita EF: national gross domestic product (GDP), urbanization (independent of economic development), distribution of income (measured by the Gini coefficient), export dependence (measured by exports as a percentage of total GDP), and service intensity (measured by services as a percentage of total GDP). We then built a new EF model based on a support vector machine (SVM), a machine-learning method grounded in the structural risk minimization principle of statistical learning theory, and used it to calculate the per capita EF of 24 nations using data from 123 nations. Calculation accuracy was measured by the average absolute error and the average relative error, which were 0.004883 and 0.351078%, respectively. Our results demonstrate that the SVM-based EF model performs well.


Introduction
The ecological footprint (EF) approach was developed by Wackernagel and Rees [1]. The EF is calculated as the total area of bioproductive land and water required to continuously produce all the resources consumed, and to assimilate all the wastes generated, by a defined population in a specific location [2]. The approach provides a comprehensive unit of measurement that allows various types of consumption-based impacts to be compared [3]. As a result, since its development it has become the most widely used measure of environmental sustainability [4]. The EF approach aggregates complex resource-use patterns into a single number [5]. The per capita EF traces the average amount of resources consumed, and waste generated, by a person in a given country; its validity is supported by its significant correlation with important environmental impacts, for example national emissions of ozone-depleting substances and nuclear power generation [6].
The EF considers six resource types: cropland and pasture land for the production of goods and food, built land for construction, forest for the production of wood products, fossil energy (to account for carbon dioxide emissions from fuels), and fisheries for food production. All of these are measured in global hectares (gha); a global hectare is a hectare of land with world-average bioproductivity. Social scientists and policymakers can compare the per capita EF of various nations with the per capita biocapacity available on Earth. For example, in 1996 the per capita EF ranged from 0.35 hectares to more than 16 hectares, and the majority of the estimated per capita EFs were higher than the Earth's biocapacity per capita [7]. According to McDonald and Patterson [8], the global EF is at least 30% larger than the Earth's biocapacity, illustrating the severity of resource overuse. EF figures can also serve as benchmarks for assessing sustainability at the national level; for example, nations with an EF at or below 1.8 hectares per capita have a global impact that could be replicated by other nations without threatening long-term sustainability [2]. Although the EF model has been applied at various levels, including global [9], municipal [10], national [9], city [11], and individual [12], no previous studies have applied a support vector machine (SVM) to predict national EF. In this paper, we seek to fill this research gap by calculating the EF of 24 nations using SVM techniques. The countries analyzed in this study are listed in Appendix S1. More specifically, the purpose of this research is twofold: first, to determine the major factors influencing national EFs, and second, to build an SVM model based on these factors to calculate EF.

Materials
Previous research provides a wealth of evidence that a variety of factors influence EF. Cross-sectional analyses consistently show that national per capita ecological footprints are largely a function of gross domestic product (GDP) [13,14,15]. A negative relationship between per capita EF and export dependence (measured as the proportion of total GDP generated by exports) has also been identified [15]. According to Jorgenson and Burns [16], nations with greater service-sector intensity experience larger increases in per capita EF. Some evidence also suggests that domestic income inequality is negatively related to the relative size of a nation's per capita EF [16]. Jorgenson (2003) found that urbanization has a positive impact on EF [13]. Taken together, these studies indicate that the factors influencing EF can be characterized as affluence (measured by GDP), export dependence, service intensity, domestic income inequality, and urbanization.

Methodology
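The SVM is a machine-learning method based on the structural risk minimization principle of statistical learning theory. It maps the input data $x$ into a higher-dimensional feature space through a nonlinear mapping $Q(\cdot)$ and then solves a linear regression problem in that feature space [17]. The regression problem is to estimate a function from a given training set $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i$ denotes the input vector, $y_i$ the output value, and $n$ the total number of data patterns. In SVM, the regression function is

$$f(x) = v \cdot Q(x) + d, \tag{1}$$

where $d$ is a scalar threshold, $v$ is the weight vector, and $Q(x)$ is the image of the input $x$ in the high-dimensional feature space under the nonlinear mapping.
Support vector regression (SVR) performs linear regression in the high-dimensional feature space using the ε-insensitive loss, which ignores errors smaller than the tube half-width $l$:

$$L_l\big(y, f(x)\big) = \max\big(0,\ |y - f(x)| - l\big).$$

To prevent over-fitting and thereby improve generalization, SVR minimizes a regularized functional that sums the empirical risk and the complexity term $\frac{1}{2}\lVert v \rVert^2$. The coefficients $v$ and $d$ are therefore estimated by minimizing the regularized risk function

$$\min_{v,\,d}\; R(v, d) = \frac{1}{2}\lVert v \rVert^2 + r \sum_{i=1}^{n} L_l\big(y_i, f(x_i)\big). \tag{2}$$

Introducing slack variables, the regression problem is transformed into the following constrained form:

$$\begin{aligned}
\min_{v,\,d,\,\xi,\,\xi^*}\quad & \frac{1}{2}\lVert v \rVert^2 + r \sum_{i=1}^{n} (\xi_i + \xi_i^*) \\
\text{s.t.}\quad & y_i - v \cdot Q(x_i) - d \le l + \xi_i, \\
& v \cdot Q(x_i) + d - y_i \le l + \xi_i^*, \\
& \xi_i \ge 0,\ \xi_i^* \ge 0, \quad i = 1, \dots, n,
\end{aligned} \tag{3}$$

where the constant $r$ sets the penalty on samples whose error exceeds $l$, and the two non-negative slack variables $\xi_i$ and $\xi_i^*$ measure the distance from the actual values to the corresponding boundaries of the $l$-tube.
A dual problem can then be derived with the Lagrange multiplier method by maximizing

$$\begin{aligned}
\max_{u,\,u^*}\quad & \sum_{i=1}^{n} y_i (u_i - u_i^*) - l \sum_{i=1}^{n} (u_i + u_i^*) - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (u_i - u_i^*)(u_j - u_j^*)\, K(x_i, x_j) \\
\text{s.t.}\quad & \sum_{i=1}^{n} (u_i - u_i^*) = 0, \qquad 0 \le u_i,\ u_i^* \le r, \quad i = 1, \dots, n,
\end{aligned} \tag{4}$$

where $u_i$ and $u_i^*$ are the Lagrange multipliers and $K(x_i, x_j)$ is the kernel function defined below.
Solving this maximization problem gives the SVM regression function

$$f(x) = \sum_{i=1}^{n} (u_i - u_i^*)\, K(x_i, x) + d. \tag{5}$$

In Equation 5, the sample points with non-zero coefficients $(u_i - u_i^*)$ are the so-called support vectors. The kernel function $K(x_i, x_j) = Q(x_i) \cdot Q(x_j)$ satisfies Mercer's conditions and performs the nonlinear mapping implicitly.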
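As an illustration of Equation 5, the sketch below evaluates the ε-insensitive loss and the fitted regression function from a given set of support vectors and dual coefficients. It is a minimal sketch under stated assumptions, not the authors' implementation: the section does not name the kernel, so a Gaussian (RBF) kernel with a hypothetical width parameter `gamma` is assumed, and all function names are illustrative. Solving the dual problem for the coefficients is not shown.

```python
import math
from typing import Sequence


def eps_insensitive_loss(y: float, f_x: float, tube_width: float) -> float:
    """SVR loss: zero inside the tube |y - f(x)| <= l, linear outside it."""
    return max(0.0, abs(y - f_x) - tube_width)


def rbf_kernel(a: Sequence[float], b: Sequence[float], gamma: float) -> float:
    """ASSUMED kernel K(a, b) = exp(-gamma * ||a - b||^2); it satisfies Mercer's conditions."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def svr_predict(
    x: Sequence[float],
    support_vectors: Sequence[Sequence[float]],
    dual_coefs: Sequence[float],
    d: float,
    gamma: float,
) -> float:
    """Equation 5: f(x) = sum_i (u_i - u_i*) K(x_i, x) + d.

    dual_coefs[i] holds the difference (u_i - u_i*); points whose difference
    is zero are not support vectors and contribute nothing.
    """
    total = 0.0
    for x_i, coef in zip(support_vectors, dual_coefs):
        if coef != 0.0:
            total += coef * rbf_kernel(x_i, x, gamma)
    return total + d


# Hypothetical example: two support vectors in a 5-feature input space
# (e.g. scaled ln GDP, urbanization residual, Gini, exports %, services %).
svs = [[0.1, 0.2, 0.3, 0.4, 0.5], [0.5, 0.4, 0.3, 0.2, 0.1]]
coefs = [0.8, -0.8]  # (u_i - u_i*); they sum to zero, as the dual constraint requires
print(svr_predict([0.1, 0.2, 0.3, 0.4, 0.5], svs, coefs, d=1.0, gamma=0.5))
```

Because only the support vectors enter the sum, the cost of a prediction grows with the number of support vectors rather than with the full training set.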

Preliminary data analysis
Per capita EF data were taken from White [18,19], the most recent national-level per capita EF data available. GDP, export, service, income inequality, and urbanization data were all taken from the World Bank [20]. To correct for excessive skewness, we use the natural logarithm of GDP. Exports as a percentage of total GDP are used as a measure of export intensity and export dependence, and services as a percentage of total GDP as an indicator of service intensity. Domestic income inequality is expressed as the Gini coefficient, which measures the distribution of income within a country: a Gini index score of 0 indicates perfect equality, and a score of 100 indicates perfect inequality. Urbanization is measured as the percentage of the total population living in cities, representing a country's relative level of urbanization. Following Jorgenson and Burns [16], we regress the urbanization data on per capita GDP and use the residuals as the measure of urbanization, to minimize collinearity with GDP. Table 1 provides descriptive statistics for all variables used in the analysis, and Table 2 reports the product-moment correlations between them. Although correlations do not establish causation, they can help generate hypotheses; Table 2 shows that most of the correlations are significant and in the expected direction.

SVM analysis
We used data from 123 countries (listed in Appendix S1) to establish and test the SVM-based model: data from 99 countries were used to build the model, and the remaining 24 countries were used for testing. Following the method of Liu, Zhuang, and Liu [17], we used the particle swarm optimization technique to choose the optimal parameters for the SVM model. The optimal parameters are r = 1000, d = 513, and l = 0.001. The EF model was then determined by these three parameters and the data of the 99 training countries, and we used it to calculate the EF of the other 24 countries. Model accuracy was measured by absolute and relative error. The calculation performance is displayed in Figure 1, and the calculation results are presented in Table 3. Figure 1 and Table 3 show that the SVM-based EF model reproduces the observed EF values very closely: the average absolute error is 0.004883, and the average relative error is only 0.351078%. We therefore succeeded in establishing an EF model that can calculate a nation's EF from only five nation-specific variables.
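Particle swarm optimization searches a parameter space with a population of candidate solutions that move toward the best positions found so far. The sketch below is a generic global-best PSO minimizer with commonly used constriction-style coefficients; it is not the specific procedure of Liu, Zhuang, and Liu [17], whose fitness function and settings are not described in this section. The objective is a hypothetical quadratic stand-in. In the setting described above it would be replaced by an error measure of the SVR trained on the 99 training nations, as a function of (r, d, l), for example searched in log10 space. All names, bounds, and coefficient values are assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    objective: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_particles: int = 20,
    n_iter: int = 150,
    w: float = 0.7298,    # inertia weight (constriction-style value)
    c1: float = 1.49618,  # pull toward each particle's own best position
    c2: float = 1.49618,  # pull toward the swarm's best position
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize `objective` over a box with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for k in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][k] = (
                    w * vel[i][k]
                    + c1 * r1 * (pbest[i][k] - pos[i][k])
                    + c2 * r2 * (gbest[k] - pos[i][k])
                )
                lo, hi = bounds[k]
                # Clamp so every candidate stays inside the allowed parameter range.
                pos[i][k] = min(max(pos[i][k] + vel[i][k], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical stand-in objective with its minimum at (3.0, 2.71, -3.0), i.e.
# log10 of (1000, about 513, 0.001); chosen only for illustration.
target = (3.0, 2.71, -3.0)


def toy_objective(p: List[float]) -> float:
    return sum((pi - ti) ** 2 for pi, ti in zip(p, target))


best, best_val = pso_minimize(toy_objective, [(0.0, 5.0), (0.0, 4.0), (-5.0, 0.0)])
print([round(v, 3) for v in best], best_val)
```

Clamping positions keeps every candidate parameter set valid, and the fixed seed makes runs reproducible.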
According to Table 2, the product-moment correlation between GDP (ln) and EF is 0.860. We therefore constructed a single-variable least-squares regression model of EF on GDP (ln) and obtained the following equation:
The average absolute error of this least-squares regression is 0.7620, and its average relative error is 44.66%. Both are much larger than the errors of the SVM-based EF model with five variables, which suggests that the four additional variables are useful for calculating EF.

Implications, limitations, and future research
Our results demonstrate that national-level per capita EF is influenced by a nation's GDP, urbanization level, distribution of income (measured with the Gini coefficient), export dependence (exports as a percentage of total GDP), and service intensity (services as a percentage of total GDP). Using these five variables, we established an SVM model to calculate EF. Compared with the traditional technique, the SVM model requires fewer variables and calculates faster, which makes it easy to apply. Despite these contributions, this study has several limitations. First, it used a cross-sectional rather than a longitudinal design, so it emphasized national-level EFs at a single point in time rather than changes in global EF. Future research should place more emphasis on longitudinal analysis to observe how EF changes over time. Second, we considered only five factors influencing per capita EF; in the future, we will explore other factors.
As a new approach to measuring sustainability, EF analysis has been more successful than other approaches. Inevitably, the approach is not without its flaws [21,22]. However, its theory and application will improve with continued study and with refinements to the methodology used by organizations responsible for environmental reporting and management.

Supporting Information
Appendix S1 Countries analyzed in the study. (DOC)