A global prediction model for sudden stops of capital flows using decision trees

Capital flows are an important aspect of the international monetary system because they provide substantial direct and indirect benefits while also carrying risks of vulnerability for countries with an open economy. Numerous works have studied the behavior of these flows and have developed models to predict sudden stop events. However, the existing models have limitations, and the literature demands more research on the subject, given that the accuracy of the models is still poor and they have only been developed for emerging countries. This paper presents a new prediction model of sudden stop events of capital flows for both emerging and developed countries, with the ability to accurately estimate future sudden stop scenarios globally. A sample of 103 countries was used, including 73 emerging countries and 30 developed countries, which allowed the use of sample combinations that consider the regional heterogeneity of the warning indicators. A decision tree method was applied to this sample and provided excellent prediction results, given its ability to learn characteristics and create long-term dependencies from sequential data and time series. Our model has great potential impact on the adequacy of macroeconomic policy against the risks derived from sudden stops of capital flows, providing tools that help to achieve financial stability at the global level.


Introduction
Sudden Stop (SS) is a sharp contraction of international capital flows. SS have significant negative effects on the global economy and as such have received special attention in the existing literature. [1] finds that SS lead to a drop in GDP growth of approximately 4%. [2] and [3] show that SS are accompanied by significant drops in production and employment. The financial crises caused by SS have a significant negative impact on output growth compared to currency crises. A currency crisis usually reduces output by about 2-3%, while a SS reduces output by an additional 6-8% in the year of the crisis [4]. countries have not been immune to the phenomenon, having experienced more than ten crises in the last decade.
The recent literature contains a number of models for predicting SS [5][6][7]. These models have shown a preceding capital boom to be a good predictor of SS [8][9][10], together with increases in capital inflows accompanied by weak economic data and appreciated real exchange rates or high current account deficits [5,11,12]. However, the explanatory power of these models notwithstanding, there remain specific limitations in terms of their level of accuracy and their focus on groups of emerging countries. Hence, there is a need for further research on the problem, specifically in terms of new models that provide a better fit and extend the research problem beyond emerging countries to encompass developed countries [7,13].
To help make models for SS prediction more robust, this study has developed a new global model for predicting SS. The model has predictive capacity for all countries, with accuracy levels above 85%. The model was developed using a sample of 103 emerging and developed countries and uses artificial intelligence (AI) techniques applied to decision trees, a transparent decision-making mechanism. As such, our new model incorporates one of the requirements of the European Union recommendation on trustworthy AI, namely that technologies are robust, safe and transparent [14]. The guidelines recommend the incorporation of requirements to ensure trustworthy AI from the earliest design phase. These requirements are accountability, data governance, design for all, governance of AI autonomy (human oversight), non-discrimination, respect for human autonomy, respect for privacy, robustness, safety and transparency. The results obtained provide a significant improvement on the accuracy of previous models and contribute to the literature on SS, providing evidence for developed countries.
This study is structured as follows: Section 2 provides a literature review of empirical research on SS. Section 3 sets out the methodology used. Section 4 provides details of the data

Literature review
Existing literature on SS has three main lines of research. Firstly, there is analysis of the impacts of SS in different countries [15,16]. Secondly, there is the work that establishes a connection between increasing capital flows and SS [9,17,18]. Finally, there are studies that have developed models for predicting SS [5,6,7].
Studies that have made the connection between increased capital flows and SS include [17], which analysed whether countries that limited international capital inflows were less likely to experience SS using a Probit model. The study found that high capital mobility is statistically significant and positive, with a small direct effect on the likelihood of a country experiencing a sharp drop in net capital inflows. In contrast, [18] found that countries that trade less with other countries have a greater propensity to SS and a currency collapse. In a study of 38 emerging economies covering the period 1990-2003, [5] found that a wave of capital inflows significantly increases the probability of SS. Moreover, the effect tends to be stronger if there is a large current account deficit or the real exchange rate has appreciated. [9] evaluated the effect of the "overreaction" in the stock market, showing that financial markets reacted excessively to new information or unexpected events. One of the main conclusions is that an upward overreaction subsequently causes a dramatic downward adjustment, in addition to an inexplicable high level of volatility in emerging countries. Subsequent research has also shown that after increased investment in emerging markets due to a significant rise in capital flows, numerous countries suffered a crisis with a large and unexpected reversal of capital flows [21].
The studies that have developed predictive models include [5]. After analysing 38 emerging markets in the period 1990-2003 using a Probit model, the study concluded that the relevant variables for predicting SS were Current Account to GDP and Real Exchange Rate. It obtained an accuracy of around 72%. [6] used a Logit model based on a sample of 43 countries for the period 1970-2009, separating the capital flows of countries into four components. They found that this separation helped better understand recent financial crises and improved prediction of SS compared to the standard two-way breakdown, with an accuracy of around 68%. The authors also concluded that the most significant variables for prediction were Current Account to GDP and Domestic Credit to GDP. Subsequently, [7] proposed a new model combining the two conventional approaches (signal extraction and logistic regression) to predict SS in emerging countries. The study identified the phenomenon of SS with Capital Flows to GDP Ratio, based on a sample of capital flows from 48 emerging countries for the period 1971-2014. The results showed that the model significantly improves predictive capacity and found that Current Account to GDP, External Debt to Exports Ratio, Terms of Trade, Real Exchange Rate and M2-International Reserves Ratio were significant variables. Despite its results, the study found that the accuracy of the model (70%) could be significantly improved.

Methodology
Decision Tree and C5.0 algorithm
A decision tree (DT) is a graphical and analytic technique for classifying data in terms of different possible paths. Each node of the tree represents the different attributes of the data. The branches of the tree represent the possible paths to follow to predict the class of a new example. Finally, the terminal nodes or leaves establish the class of the test example in line with the branching in question. The notation used for describing the DTs is disjunctive normal form (DNF). Hence, if we have three attributes (A, B and C), each with two possible values $x_i$ and $\neg x_i$, where $i = 1, 2, 3$, there are $2^n$ possible combinations in DNF ($n$ is the number of attributes). Each of the DNF combinations describes a part of the tree, giving the disjunctive forms expressed in Eq (1) for the tree.
$$(x_2 \wedge \neg x_3) \vee (x_2 \wedge x_3) \vee (\neg x_2 \wedge x_1) \vee (\neg x_2 \wedge \neg x_1) \quad (1)$$
These disjunctions are descriptors of the tree that has been built. Thus it is possible to form $2^{2n}$ possible descriptions in DNF. Given that the order of DTs is extremely large, it is not possible to explore all the descriptors to identify the most adequate. Instead, heuristic search techniques are used to do this easily and quickly. The majority of the algorithms for constructing DTs are based on the Hill Climbing strategy. This is an AI technique used to find the maximums or minimums of the function via a local search. The algorithms begin with an empty tree, which is then segmented into sets of examples, in each case choosing the attribute that best discriminates between the classes until completing the tree. A heuristic function is used to find the best attribute and the choice is irrevocable, meaning it is important to ensure it is as close as possible to the optimal. The main advantage of using this type of strategy is the low computational cost.
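As an illustrative check of Eq (1), the following Python sketch enumerates all $2^3 = 8$ truth assignments of the three attributes and records which conjuncts each one satisfies. It is not part of the original study, and the names `PATHS`, `matching_paths` and `enumerate_assignments` are hypothetical. Because each conjunct corresponds to one root-to-leaf path of the tree, every assignment is expected to satisfy exactly one of them.

```python
from itertools import product

# Each conjunct of Eq (1) is one root-to-leaf path, written as
# {attribute index: required truth value}. Illustrative encoding only.
PATHS = [
    {2: True, 3: False},   # x2 AND NOT x3
    {2: True, 3: True},    # x2 AND x3
    {2: False, 1: True},   # NOT x2 AND x1
    {2: False, 1: False},  # NOT x2 AND NOT x1
]


def matching_paths(assignment):
    """Return the indices of the paths satisfied by a full truth assignment."""
    return [
        k for k, path in enumerate(PATHS)
        if all(assignment[attr] == value for attr, value in path.items())
    ]


def enumerate_assignments(n=3):
    """Yield all 2**n truth assignments as {1: bool, ..., n: bool}."""
    for values in product([False, True], repeat=n):
        yield dict(zip(range(1, n + 1), values))


# Map each assignment (x1, x2, x3) to the list of paths it follows.
coverage = {tuple(a.values()): matching_paths(a) for a in enumerate_assignments()}
```

Inspecting `coverage` shows which leaf each combination of attribute values reaches, which is the sense in which the disjunction describes the whole tree.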
There are various algorithms for building DTs. [22], which develops the so-called ID3 algorithm, is regarded as the seminal work in the field. The algorithm uses the notion of entropy to check the randomness of the distribution of a set of examples over the classes to which they belong. The C4.5 algorithm is an extension of ID3 and has the advantage of extracting hidden information from large datasets and providing classification rules with a high level of accuracy [23]. It builds a DT using partitions together with data. This construction is carried out using a "depth-first" strategy (all possible tests are used to divide the available data, selecting the one with the largest information gain). For each discrete attribute, a test with n possible results is considered. In contrast, if the attributes are continuous, a single binary test is used for each of the values that the attribute can take. Every time a node is generated, the algorithm chooses the test as a function of the information gain provided, as expressed in Eq (2).
$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{i=1}^{n} \frac{|S_i|}{|S|}\,\mathrm{Entropy}(S_i) \quad (2)$$
where $S$ is the set of cases, $A$ is the attribute, $n$ is the number of partitions of attribute $A$, and $|S_i|$ is the number of cases in the $i$-th partition $S_i$. The entropy value is determined using Eq (3).
$$\mathrm{Entropy}(S) = -\sum_{i=1}^{n} p_i \log_2 p_i \quad (3)$$
where $n$ is the number of partitions of $S$ and $p_i$ is the proportion of $S$ in the $i$-th partition.
The C5.0 algorithm used in this study is a new-generation machine learning algorithm (MLA) based on DTs [24]. This means that DTs are built from the list of possible attributes and the set of training cases. The DTs can then be used to classify the remaining sets of test cases. The C5.0 algorithm offers a number of significant advantages over C4.5, since the rules generated are more accurate and it takes less time to generate them [25].
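The entropy and information gain criteria referred to in Eqs (2) and (3) can be illustrated with a minimal Python sketch. This is an assumption-laden illustration rather than the C5.0 implementation used in the study: entropy is computed over class proportions (the usual reading of the criterion), the attribute names `credit_boom` and `high_vix` are hypothetical discretised indicators, and the four toy observations are invented purely to show the mechanics of a greedy split choice.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy of the full set minus the weighted entropy of its partitions."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(p) / len(labels) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Greedy (hill-climbing) step: pick the attribute with the largest gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Toy, invented data: 1 = sudden stop, 0 = no sudden stop.
rows = [
    {"credit_boom": "yes", "high_vix": "yes"},
    {"credit_boom": "yes", "high_vix": "no"},
    {"credit_boom": "no", "high_vix": "yes"},
    {"credit_boom": "no", "high_vix": "no"},
]
labels = [1, 1, 0, 0]
chosen = best_attribute(rows, labels, ["credit_boom", "high_vix"])
```

In this toy set the first attribute separates the classes perfectly while the second carries no information, so the greedy step selects the first; on real data the gains are typically much closer and the irrevocable choice matters more.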

Sensitivity analysis
Despite the significant explanatory capacity of DTs, when a large number of variables are used, it is also necessary to quantify their impact. This is done via sensitivity analysis, which aims to determine the relative importance of the independent variables in relation to the dependent variable [26]. It seeks to reduce the models to the most important variables and ignore or eliminate the least important. One variable is considered more important than another if it contributes more to the variance of the output, relative to the set of variables of the model. The Sobol method [27] is used to decompose the total output variance $V(Y)$, as expressed in Eq (4).
$$V(Y) = \sum_{i} V_i + \sum_{i} \sum_{j>i} V_{ij} + \dots + V_{1,2,\dots,k} \quad (4)$$
where $k$ is the number of independent variables, $V_i = V(E(Y \mid X_i))$ and $V_{ij} = V(E(Y \mid X_i, X_j)) - V_i - V_j$. The sensitivity indexes are determined by $S_i = V_i/V$ and $S_{ij} = V_{ij}/V$, where $S_{ij}$ indicates the effect of interaction between two factors. The Sobol decomposition allows the estimation of a total sensitivity index $S_{Ti}$, which measures the sum of all the sensitivity effects involved in the independent variables.
The DT methods used in this study are an appropriate measure of the sensitivity of the variables and are shown in S1 Table.
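To make the first-order index $S_i = V_i/V$ concrete, the sketch below estimates it for a toy additive function $Y = 2X_1 + X_2$ with independent uniform inputs, for which the analytic values are $S_1 = 0.8$ and $S_2 = 0.2$. It uses a simple binning estimator of $V(E(Y \mid X_i))$ rather than the Monte Carlo scheme usually associated with the Sobol method, and it is not the procedure applied in the study. The function name `first_order_indices`, the toy model and the sample size are all assumptions chosen for illustration.

```python
import random


def first_order_indices(sample_inputs, outputs, bins=20):
    """Estimate S_i = V(E[Y|X_i]) / V(Y) by averaging Y within equal-count bins of X_i."""
    n = len(outputs)
    mean_y = sum(outputs) / n
    var_y = sum((y - mean_y) ** 2 for y in outputs) / n
    indices = []
    for i in range(len(sample_inputs[0])):
        # Sort observation indices by the i-th input, then cut into bins.
        order = sorted(range(n), key=lambda r: sample_inputs[r][i])
        v_i = 0.0
        for b in range(bins):
            chunk = order[b * n // bins:(b + 1) * n // bins]
            mean_b = sum(outputs[r] for r in chunk) / len(chunk)
            # Weighted variance of the conditional means across bins.
            v_i += len(chunk) / n * (mean_b - mean_y) ** 2
        indices.append(v_i / var_y)
    return indices


# Toy additive model (assumption): Y = 2*X1 + X2 with X1, X2 ~ U(0, 1).
rng = random.Random(0)
X = [(rng.random(), rng.random()) for _ in range(20000)]
Y = [2 * x1 + x2 for x1, x2 in X]
S = first_order_indices(X, Y)
```

Because the toy model has no interaction term, the two estimated indices should sum to approximately one; with interacting inputs the shortfall from one would reflect the $S_{ij}$ terms.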

Sample and data
The chosen sample period is 1960-2016 for each of the three SS definitions specified above. Annual capital flow data has been used to identify SS events in 103 countries (73 emerging countries and 30 developed countries), permitting the construction of nine SS prediction models. Data from the IMF International Financial Statistics (IFS) and the World Bank has been used to classify countries and obtain information on the independent variables. Sudden stop events by country (global, emerging and developed) are shown in S1-S4 Figs, respectively. Sudden stop events by year (global, emerging and developed) are shown in S5-S7 Figs, respectively.
The sample data set has been divided into three mutually exclusive groups: one for training (70% of the data), another for validation (10% of the data) and a third for testing (20% of the data). The validation data is used to evaluate the decision tree during training and to detect overfitting: if the validation error grows over a certain number of training iterations, training is stopped. The testing data is used to evaluate the built model and make predictions. The percentage of correctly classified cases (accuracy) and the root mean square error (RMSE) have been used for the evaluation. Furthermore, for the treatment of each of the three groups, the 10-fold cross-validation procedure has been applied with 500 iterations [28,29].
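The partitioning and evaluation scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the helper names (`split_indices`, `k_fold`, `accuracy`, `rmse`), the random seed and the interleaved fold assignment are all hypothetical choices, and the 500 repetitions are omitted for brevity.

```python
import random
from math import sqrt


def split_indices(n, train=0.70, validation=0.10, seed=0):
    """Shuffle 0..n-1 and cut into disjoint train / validation / test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * train)
    n_val = round(n * validation)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def k_fold(indices, k=10):
    """Yield (fit, held_out) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit, held_out


def accuracy(y_true, y_pred):
    """Share of correctly classified cases."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_score):
    """Root mean square error between observed dummies and predicted scores."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_score)) / len(y_true))


train_idx, val_idx, test_idx = split_indices(1000)
folds = list(k_fold(train_idx, k=10))
```

A purely random split is used here for simplicity; with country-year panel data, a split that respects time ordering or country membership may be preferable, and the source does not state which was used.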

Variables
The dependent variable used in this study is sudden stops of capital flows. It is a dummy variable, $S_{j,t}$, that takes a value of one when an SS event occurs and zero otherwise, for country $j$ ($j = 1, \dots, J$) at time $t$, as expressed in (5).
where $\Delta CF_{j,t}$ denotes the Capital Flows to GDP Ratio, $\overline{\Delta CF}_{j,t}$ indicates the historic average and $\sigma_{\Delta CF_j}$ the standard deviation. We used 36 independent variables as possible predictors of SS (Table 1). These are standard variables used in the existing literature [7,9,30,31], and they are classified according to their attributes (macroeconomic, financial, external, global and cross-country). The predictors are measured in period $t-1$, whereas the SS measure refers to period $t$.
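A threshold rule of the kind suggested by these definitions can be sketched in Python. The exact rule is an assumption: the sketch flags an SS event when the capital flow measure falls more than `k` standard deviations below its average, with `k`, the use of the full-sample mean and population standard deviation, and the function names `sudden_stop_dummy` and `lag_predictors` all being hypothetical choices for illustration. The input series is invented.

```python
from statistics import mean, pstdev


def sudden_stop_dummy(delta_cf, k=1.0):
    """Return 1 where delta_cf is more than k standard deviations below its mean.

    Assumption: full-sample mean and population standard deviation per country;
    the threshold multiplier k is hypothetical.
    """
    avg = mean(delta_cf)
    sd = pstdev(delta_cf)
    return [1 if x < avg - k * sd else 0 for x in delta_cf]


def lag_predictors(values):
    """Shift a predictor by one period so that x[t-1] lines up with S[t]."""
    return [None] + list(values[:-1])


# Invented series of changes in the capital flows to GDP ratio for one country.
delta_cf = [0.5, 0.4, 0.6, 0.5, -3.0, 0.4, 0.5, 0.6]
ss = sudden_stop_dummy(delta_cf, k=1.0)
```

Varying `k` is one way to obtain stricter or looser event definitions from the same series, and `lag_predictors` shows how a predictor dated $t-1$ can be aligned with the dummy dated $t$.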

Descriptive statistics
The SS transition matrices in Figs 3 and 4 show the number of SS events in the three scenarios (SS1, SS2 and SS3) for each country category. In scenario SS2, emerging countries have experienced more SS events. Developed countries have experienced more SS events in scenario SS3. Across the samples of countries, the proportion of SS events in the three scenarios is in the range 3.56-18.48%. Tables 2, 3 and 4 provide a statistical summary of the independent variables for emerging countries, developed countries and the overall sample (global). The average values for emerging countries are generally higher than those for developed countries. For example, the Real GDP Growth of emerging countries is 6.258%, compared to 3.131% for developed countries. This suggests that the economies of emerging countries are experiencing full economic development and, as such, grow faster than in the developed world. Similarly, the average External Debt to Export Ratio shows that emerging countries have a limited capacity to fund their external debt with exports. Furthermore, among the cases of negative values, the average value of the variable Capital Control in emerging countries is noteworthy, reflecting the fact that the quantity of products exported often fails to cover the quantity of products imported, in monetary terms.

Correlations
Tables 5, 6 and 7 provide the correlations between variables (dependent and independent) for emerging countries, developed countries and the overall sample (global). Correlations varied within a range between -0.312 and 0.322 in emerging countries, -0.328 and 0.332 in developed countries, and -0.228 and 0.262 in the global sample. For example, SS3 has a high correlation with EXDEBT, CREDIT and GDEBT in emerging countries, developed countries and the global sample, respectively.

Empirical results
Tables 8, 9 and 10 and Figs 5, 6 and 7 show the accuracy level, root mean square error (RMSE), the model selection criteria, the ROC curve value and the variables with the greatest sensitivity for each of the models produced. In all cases, accuracy is higher than 87.12% and both the RMSE levels and ROC values are adequate. The model with the highest accuracy value (93.89%) is for developed countries in SS1, followed by the model for emerging countries in SS1 (93.02%). As a whole, these results provide a much higher level of accuracy than previous research. The accuracy in the study by [7] is around 70% and the figure for the study by Janus and Riera-Crichton [6] is around 68%. Figs 8 and 9 provide additional information on the greater sensitivity variables (see also S1 Table). The variables M2 and VIX are significant in seven models (twice in SS1, three times in SS2 and twice in SS3). The variables STOCK and CREDIT, both from the subgroup of financial variables, are repeated in six models (STOCK once in SS1, three times in SS2 and twice in SS3; and CREDIT twice in SS1 and twice again in both SS2 and SS3). The variables EXREG and GDEBT appear in five models (EXREG three times in SS1 and once in both SS2 and SS3; and GDEBT once in SS1 and twice in SS2 and SS3). Other variables (DRINT, CA, RGDP,    The variable DRINT appears once in SS1 and SS3, and twice in SS2. The variable CA appears three times in SS2 and once in SS3. Finally, the variable RGDP is significant once in SS1 and SS2 and twice in SS3. The results show that for emerging countries, the significant variable that appears in the three SS scenarios is EXREG (subgroup of cross-country variables). Moreover, the variables INFLA, WDGP, STOCK, CA, EXDEBT, FRES and VIX are repeated twice. Compared to other previous research, the variables CA, EXDEBT, TOT, RER and FRES are also significant in the study by [7]. Similarly, CA and CREDIT were significant in the study by [6], and CA and RER in the study by [5]. 
This shows that our research has validated new significant variables in the macroeconomic, financial, global and cross-country subgroups (EXREG, INFLA, WDGP, STOCK and VIX) and thus identifies a new set of significant variables that differ from previous research.
The results for the three models for developed countries show that DRINT, M2, STOCK, VIX, RGDP, GDEBT and CREDIT are the variables with greatest sensitivity for predicting SS. Hence, developed countries must be alert to the behaviour of these variables, since high real interest rates, public debt to GDP, M2 growth, the level of domestic credit and the volatility index are all linked to a higher probability of an SS event. Conversely, higher GDP growth and the performance of the stock index are negatively related to the possibility of an SS event. Since there is no previous research on forecasting specifically for developed countries, the results of this research represent an innovative contribution to the literature on SS.
Finally, regarding the results of the global models, it can be deduced that the variables with greatest sensitivity for predicting SS in the three scenarios considered are GDEBT, M2,

Post-estimations
For multiple-step-ahead prediction, we have used the iterative strategy, in which models trained for 1-step-ahead prediction are developed [32]. At time t, a prediction is made for moment t+1, and this prediction is used to make the prediction for moment t+2, and so on. This means that the predicted data for t+1 are treated as real data and are added to the end of the available data [33]. Table 11
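The iterative strategy can be sketched in a few lines of Python. This is an illustration only: `iterative_forecast` is a hypothetical helper, and `last_change_model` is a deliberately trivial stand-in for a trained 1-step-ahead model, not the DT models of this study.

```python
def iterative_forecast(history, one_step_model, horizon=3):
    """Forecast `horizon` steps by feeding each 1-step prediction back as data."""
    extended = list(history)
    predictions = []
    for _ in range(horizon):
        next_value = one_step_model(extended)
        predictions.append(next_value)
        extended.append(next_value)  # treat the prediction as observed data
    return predictions


def last_change_model(series):
    """Hypothetical stand-in for a trained 1-step model: repeat the last change."""
    return series[-1] + (series[-1] - series[-2])


forecast = iterative_forecast([1.0, 2.0, 3.0], last_change_model, horizon=3)
```

One consequence of this design is that any error in the prediction for t+1 is carried into the inputs for t+2 and t+3, so accuracy at longer horizons depends on the quality of the 1-step model.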

Conclusions
This study has developed new models for predicting SS for emerging countries, developed countries and a global sample of countries. It has applied DTs as an innovative method not used in previous research in the field. Specifically, the goal has been to improve on the predictive accuracy of previous studies that used different methodologies and to extend the sample to a global set of countries. The accuracy obtained in this research is significantly higher than that reported in the existing literature, with a range of 87.12-93.89%. Our improvement in accuracy may also be due to the greater coverage of years and countries in our sample compared with previous works, and this should also be considered in future work. The study has also detected new significant variables in these SS prediction models, allowing a high level of stability in the models developed over forecasting horizons t+1, t+2 and t+3. In contrast to previous research, this study has been able to expand predictions of SS events beyond emerging countries to the global level. The results have identified different significant variables for emerging countries and developed countries, as well as at the global level. This makes an essential contribution to the field of international finance. The conclusions are relevant to agents responsible for economic policy in any country in the world, since our study suggests new significant explanatory variables that allow political agents to predict SS phenomena. This research has also provided a new SS forecasting model developed using DTs, thus contributing to existing knowledge in the field of AI. This new model can be used as a reference for setting macroeconomic policy and improving decision-making.
In summary, this study provides a significant opportunity to contribute to the field of finance, since the results obtained have significant implications for the future decisions of political agents, making it possible to avoid SS events and the potential associated costs. It also helps these agents send warning signals to financial markets and avoid financial crises derived from the phenomenon of SS.
A limitation of this study is that some countries may have moved from emerging to developed status during our sample period. We have not taken such changes into account, for the sake of greater homogeneity, but they could be addressed in future research.
Opportunities for further research in this field include developing predictive models taking into account political factors that evaluate the possible influence of the management and effectiveness of economic policy on the phenomenon of SS.