The Asian Correction Can Be Quantitatively Forecasted Using a Statistical Model of Fusion-Fission Processes

The Global Financial Crisis of 2007-2008 wiped out US$34 trillion across global financial markets, a value equivalent to the combined GDPs of the United States and the European Union in 2014. The defining moment of this crisis was the failure of Lehman Brothers, which precipitated the October 2008 crash and the Asian Correction (March 2009). Had the Federal Reserve seen these crashes coming, it might have bailed out Lehman Brothers, and prevented the crashes altogether. In this paper, we show that some of these market crashes (like the Asian Correction) can be predicted, if we assume that a large number of adaptive traders employ competing trading strategies. As the number of adherents of some strategies grows, that of others declines in the constantly changing strategy space. When a strategy group grows into a giant component, trader actions become increasingly correlated, and this is reflected in the stock price. The fragmentation of this giant component then leads to a market crash. We also derive a mean-field market crash forecast equation based on a model of fusions and fissions in the trading strategy space. By fitting the continuous returns of 20 stocks traded on the Singapore Exchange to the market crash forecast equation, we obtain crash predictions ranging from end October 2008 to mid-February 2009, with early warning four to six months prior to the crashes.


Introduction
Today, the financial markets have become globally interconnected. Market players frequently maintain portfolios over many asset classes as insurance against idiosyncratic risks. This raises concerns about susceptibility to systemic risks, and consequently the sustainability of financial systems, which is worrisome particularly for central banks, regulators, and financial institutions. The Global Financial Crisis of 2007-2008 wiped out 34 trillion US dollars of value across financial markets around the world [1]. This is equivalent to the combined GDPs of the United States (US$17.42 trillion [2]) and the European Union (US$18.45 trillion [3]) in 2014. The defining moment of this crisis was the failure of Lehman Brothers in September 2008, which sent shock waves through financial markets around the world in the form of global market crashes, namely the October 2008 crash and the Asian Correction (March 2009).
Had the Federal Reserve seen these crashes coming, it might have worked harder to bail out Lehman Brothers, and prevented the crashes altogether. Financial economists largely believe that financial crises are due to exogenous factors, and are hence not predictable. We feel that this opinion reflects more on the limitations of the models chosen by the economic community than on the task being fundamentally impossible. For instance, statistically stationary models [4][5][6] cannot produce booms and busts, whereas time series models with stochastic jumps [7][8][9] produce unpredictable booms and busts. Beyond statistical modeling, Palmer et al. introduced the Santa Fe Stock Market Model in 1994 [10], and showed that the artificial market can crash in the absence of exogenous shocks. We expect precursory signatures to exist for the endogenous booms and busts in the Santa Fe Stock Market Model, but no one has looked into this systematically. Alternatively, we can also try model-free forecasting of extreme market events using only empirical data [11]. The most successful of these is the Log Periodic Power Law (LPPL) method [12,13], which was originally developed by Didier Sornette in 1996 [14] to forecast earthquakes. As many have likened financial crashes to earthquakes [15][16][17], Johansen and Sornette fitted the LPPL to stock market indices, and found that the real market crash times are very close to the finite-time singularities of the fitted LPPL. More importantly, they concluded that 25 out of the 49 crashes identified exhibit LPPL signatures and can thus be classified as endogenous crashes [18].
Inspired by a Soup-of-Groups (SOG) model description of statistical fusion-fission processes in fault planes, Cheong et al. were able to forecast the timings, magnitudes, and locations of large Taiwanese earthquakes [19]. Furthermore, in recent papers [20,21], we tracked robust clusters of stocks in the Singapore Exchange over 2008 and 2009, and found that these routinely merged to form larger clusters, and also disintegrated into smaller clusters. In particular, one of the clusters grew steadily into a giant cluster as we approached the October 2008 crash. As stock price movements are governed by market participants, this picture of trading strategy clusters is similar to the "opinion convergence and divergence" proposed by Lin et al. [22]: market participants read and react to market signals; during normal periods they have diverse opinions, but they behave coherently when there are large price swings. These findings demonstrate that a financial market naturally self-organizes into clusters of strongly-correlated stocks (and trading strategies with similar opinions) that undergo fusions and fissions. Since fusion-fission processes occur in both fault planes and the trading strategy space, we wanted to duplicate our success with forecasting earthquakes by forecasting financial crashes using the SOG model.

The Soup-of-Groups Model
The SOG model was first introduced by Neil Johnson et al. to study emergent behavior in vastly different complex systems. It was first applied to model the underlying dynamics of human insurgency [23], where terrorists are assumed to form groups that merge with each other in preparation for attacks, and disintegrate after a successful attack or to avoid detection by security forces. By comparing the model against a database of past terrorist attacks, Johnson et al. found that the terrorist groups are equally effective, and that the number of casualties depends only on the group size, which is distributed as a power law with exponent α = 2.5. In a follow-up paper [24] they suggested that the escalation rate and timing of fatal attacks by terrorists follow naturally from SOG dynamics. In addition, Johnson et al. also tried to explain contagion dynamics on social networks in terms of the SOG model, particularly in the intermediate regime that relates individual behaviors to social group structures [25].
The SOG model, as the name suggests, consists of a "soup" (system) of "groups" (clusters) of various sizes. The essence of SOG dynamics is the fusion-fission processes among clusters. For instance, a cluster of size s_i can merge with a cluster of size s_j to give a cluster of size s_i + s_j at a rate of ν_p, or a cluster of size s_k can fragment completely into s_k clusters of size 1 at a rate of ν_b; see Fig 1(a) for an illustration and refer to the S2 File for more details. In an infinite-element equilibrium system, Neil Johnson et al. discovered that the equilibrium distribution for the SOG model is an Exponential Truncated Power Law (ETPL) distribution [26],

f(s) = A s^(−α) e^(−βs),

with power-law exponent α = 2.5.
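The fusion-fission dynamics of Fig 1(a) is simple enough to sketch as a short Monte Carlo simulation. The sketch below is only an illustration, not the implementation behind the results in this paper; the size-proportional selection of clusters, the number of agents, and the value of ν_b are assumptions chosen for the example.

```python
import random

def sog_step(clusters, nu_b, rng):
    """One fusion-fission step on a list of cluster sizes. A cluster is
    selected with probability proportional to its size; with probability
    nu_b it fragments completely into singletons, otherwise it merges
    with a second, independently selected cluster."""
    total = sum(clusters)

    def pick():
        r = rng.uniform(0, total)
        acc = 0.0
        for idx, size in enumerate(clusters):
            acc += size
            if r <= acc:
                return idx
        return len(clusters) - 1

    i = pick()
    if rng.random() < nu_b:
        s = clusters.pop(i)
        clusters.extend([1] * s)   # complete fragmentation into singletons
    else:
        j = pick()
        if j != i:                 # fusion of two distinct clusters
            hi, lo = max(i, j), min(i, j)
            clusters[lo] += clusters.pop(hi)

rng = random.Random(42)
clusters = [1] * 500               # 500 agents, all starting as singletons
for _ in range(5000):
    sog_step(clusters, nu_b=0.05, rng=rng)
largest = max(clusters)            # large clusters emerge from the dynamics
```

Because fusion events greatly outnumber complete fragmentations at small ν_b, the population self-organizes into a broad, heavy-tailed distribution of cluster sizes rather than staying near the all-singleton initial condition.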

SOG Forecasting
In finance, one of the most widely used definitions for quantifying financial returns is the simple return (and the log return). However, the distribution of financial returns has been found to depend on the choice of time horizon, making such returns ambiguous [27]. To get around this ambiguity, we introduce continuous returns, defined as the continuous price movement in the same direction (refer to the subsection Continuous returns in the Data and Methods section). This definition of an event agrees with that used by da Cruz and Lind [28]. In financial markets, market participants are heterogeneous, so even though they read the same market signals they may arrive at different trading strategies. However, if the market signal is strong, these traders may arrive at the same trading strategies, thereby reinforcing the market signals and leading to the formation of strategy blocks. When a strategy block is executed, market participants within the strategy block will react in the same way, hence generating a continuous return (CR) with a magnitude that depends on the size of the strategy block.

Fig 1 caption: Here, the selected cluster is colored green and the resultant clusters after a fusion or fission process are colored blue. Each CR is due to a cluster fragmentation, with a magnitude proportional to the cluster size. In contrast, a fusion process generates no CR but makes the cluster bigger, possibly into a giant cluster. In (c) we show the curve generated by Eq (2), which illustrates how the Integrated Continuous Return (ICR) changes when a giant cluster is present and growing. Initially, in the absence of a giant cluster, the ICR grows linearly due to the fragmentation of clusters with an equilibrium distribution of sizes. However, when a giant cluster forms at the expense of other clusters, only smaller clusters fragment, and ICR growth slows down. Finally, at t_c, where the giant cluster has exhausted all the resources and reached the maximum allowed size, it must fragment completely in the next time step.
In this study, we focus on data from the Singapore Exchange (SGX) because we are familiar with the stocks listed on the SGX. The SGX is also strongly coupled to financial markets around the world, and readily reflects global market movements. We therefore test the SOG forecasting method on 20 component stocks of the Straits Times Index (STI) over the period January 2006 to December 2011, which covers approximately three years before and three years after the October 2008 crash (see the subsection Data in the Data and Methods section). Since the SOG dynamics is universal, we expect to be able to forecast with the model even when the stock price dynamics is only approximately SOG. To demonstrate this approximate SOG behavior, we check the distribution of the continuous returns against the ETPL distribution. For comparison, we also fit the empirical data to the power law (PL) and asymptotic exponential (EXP) distributions, using the method developed by Clauset et al. to estimate the parameters (x_min, â, and b̂) [29]. A detailed description of the fitting procedure can be found in the S4 File. The fitting results in the section Distribution of Continuous Returns in S3 File indicate that the continuous returns of all 20 stocks are best fitted by the ETPL, as the PL tends to decay slower than the data, while the EXP tends to decay faster than the data. Overall, the average â_ETPL and b̂_ETPL across all stocks are 1.82 ± 0.23 and 0.37 ± 0.08, respectively. The reported â_ETPL is lower than the SOG exponent of α = 2.5. Even though we can assume that the continuous return is directly proportional to the size of the cluster, the result shows that this relationship does not hold. However, if we assume that a cluster of size s produces a continuous return of s^z, where z = 2.5/α, then the SGX will indeed be a SOG system. We will use this modified scaling to formulate the forecasting equation (Eq (2)).
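The ETPL fit itself can be sketched as a maximum-likelihood estimation. The code below is a minimal continuous-variable illustration on synthetic data, not the discrete Clauset et al. estimator used for the results above; the values a_true, b_true, and x_min are assumptions for the example.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize

def etpl_negloglik(params, data, x_min):
    """Negative log-likelihood of f(s) ∝ s**(-a) * exp(-b*s) on [x_min, ∞),
    with the normalization constant computed numerically."""
    a, b = params
    if b <= 1e-8:
        return np.inf
    norm, _ = quad(lambda s: s ** (-a) * np.exp(-b * s), x_min, np.inf)
    return -np.sum(-a * np.log(data) - b * data - np.log(norm))

# draw synthetic ETPL data by rejection sampling: propose from a pure
# power law (inverse CDF), accept with probability exp(-b*(s - x_min))
a_true, b_true, x_min = 1.8, 0.4, 1.0
rng = np.random.default_rng(0)
samples = []
while len(samples) < 2000:
    u = rng.random(10000)
    s = x_min * (1.0 - u) ** (-1.0 / (a_true - 1.0))
    keep = rng.random(10000) < np.exp(-b_true * (s - x_min))
    samples.extend(s[keep])
data = np.array(samples[:2000])

res = minimize(etpl_negloglik, x0=[1.5, 0.2], args=(data, x_min),
               method="Nelder-Mead")
a_hat, b_hat = res.x
```

With 2000 synthetic observations, the recovered exponents fall close to the generating values, mirroring how â_ETPL and b̂_ETPL are obtained per stock.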
In the SOG model, each continuous return is caused by the fragmentation of a strategy block; a market crash is therefore associated with the fragmentation of a giant cluster. Fig 1(b) illustrates the relation between the continuous return and the SOG fusion and fission processes. During a fusion process, two clusters merge to form a larger cluster, but generate no continuous return. On the other hand, each fission process generates a continuous return. In this description, a market crash is due to the fragmentation of a giant cluster (a giant strategy block) that rapidly drives the price downwards. Thus, if we can track the size and growth of the giant cluster, we should be able to deduce the time when the market will crash. This can be done by determining the time when the giant cluster reaches its maximum size. This time acts as an upper bound on the fragmentation time of the giant cluster, and is thus the forecasted crash time. Unfortunately, we cannot directly observe the giant cluster size in the financial market, since we cannot simultaneously track the opinions and strategies of all market participants. Instead, we infer the giant cluster size from the continuous returns, which become smaller as the giant cluster grows bigger and exhausts the larger clusters around it. This manifests itself as a slowing down in the growth of the integrated continuous returns.
In order to build the forecasting model, the inputs are: the size of the giant cluster s_G(t), the size distribution f(s) of the remaining clusters, and the relation CR_t = CR_t(s_t) between the magnitude of a continuous return and the cluster size. We make several assumptions regarding these inputs. First, we assume that the giant cluster grows linearly with time, as s_G(t) = γt. Second, instead of using the ETPL distribution, which is difficult to work with, we use the simplified PL distribution f(s) = A s^(−2.5). Lastly, we assume that the continuous return scales with the cluster size as CR_t ∝ s_t^z, in order for the dynamics of the continuous returns to be approximately SOG. Based on these assumptions, we can write down the expected integrated continuous return ÎCR(0, t) from time 0 to time t as

ÎCR(0, t) = ∫_0^t ∫_1^(S_max − s_G(t′)) s^z A s^(−2.5) ds dt′. (1)

Assuming the giant cluster started growing at t_o, so that s_G(t′) = γ · max(0, t′ − t_o), Eq (1) becomes

ÎCR(0, t) = ∫_0^t ∫_1^(S_max − γ·max(0, t′ − t_o)) A s^(z − 2.5) ds dt′. (2)

Eq (2) is the mean-field forecasting equation that we show in Fig 1(c). From Fig 1(c), we observe that before the giant cluster starts growing, ÎCR(t) grows linearly with time. As the giant cluster grows, the growth rate of ÎCR(t) decreases. More importantly, a singularity occurs when the giant cluster reaches the maximum size S_G(t_c) = S_max = γ(t_c − t_o). We call this time the forecasted crash time t_c, which is

t_c = t_o + S_max / γ. (3)

From the data, we define the empirical integrated continuous returns from t_Start to t_End as

ICR(t_Start, t_End) = Σ_(t = t_Start)^(t_End) CR_t, (4)

which is the cumulative sum of the continuous returns. We fit the empirical integrated continuous returns ICR(t_Start, t_End) to our forecasting model ÎCR(t_Start, t_End) (Eq (2)) to obtain the forecasted crash time t_c. For the fitting, we use the non-linear least squares method in MATLAB to estimate the parameter set that minimizes the overall residuals. In practice, we must only use data available up to the present moment.
Therefore, to mimic real-time forecasting using historical data, we work with dynamic fitting windows starting at t_Start and ending at t_End, where t_End must be before the market crash we want to 'forecast'. To do this, we fix a t_Start, start with t_End one month after it, and then increase t_End two weeks at a time. We thus obtain t_c as a function of t_End for a given t_Start, t_c = t_c(t_End|t_Start).
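The fitting step can be sketched as follows. Since the closed form of the mean-field curve is not reproduced here, the function icr_model below is only an illustrative stand-in with the same qualitative shape as Fig 1(c) (linear growth before t_o, then growth that slows and stalls at t_c); its functional form, the parameter values, and the use of SciPy in place of MATLAB are all assumptions for the example.

```python
import numpy as np
from scipy.optimize import curve_fit

def icr_model(t, a, t0, tc, beta):
    """Illustrative ICR curve: linear growth with slope a before the
    giant cluster starts growing at t0, then growth whose rate decays
    to zero at the forecasted crash time tc (shape parameter beta)."""
    t = np.asarray(t, dtype=float)
    tau = np.clip((tc - t) / (tc - t0), 0.0, None)
    slowed = a * t0 + a * (tc - t0) / (1 + beta) * (1 - tau ** (1 + beta))
    return np.where(t <= t0, a * t, slowed)

# synthetic 'integrated continuous returns', observed only up to
# t_End = 180 (mimicking real-time forecasting), generated with a
# hidden crash time tc = 200
rng = np.random.default_rng(1)
t = np.linspace(0.0, 180.0, 120)
icr_obs = icr_model(t, 1.0, 50.0, 200.0, 0.8) + rng.normal(0.0, 0.5, t.size)

popt, _ = curve_fit(icr_model, t, icr_obs, p0=[0.9, 40.0, 220.0, 1.0],
                    bounds=([0.1, 1.0, 100.0, 0.1],
                            [10.0, 150.0, 400.0, 5.0]))
a_hat, t0_hat, tc_hat, beta_hat = popt
```

The fit recovers a crash time beyond the last observed t_End, which is exactly the point of the method: t_c is extrapolated from the slowdown of the ICR, before the crash happens.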

Results
Our objective is to use the SOG model to detect early warnings of market crashes by forecasting the crash times of the STI component stocks. This is a different approach from traditional finance and econometric models, wherein sudden changes are assumed to be brought about by exogenous shocks. In the SOG model, a market crash is endogenous, driven by the formation of a giant cluster and its fragmentation. Within the period that we study, from Jan 2006 to Dec 2011, two global financial crashes left their marks on the SGX, as shown in S Fig 1 in S1 File. In Fig 2 we plot the forecasted crash time t_c against t_End. From the graph, we can see that far before the October 2008 crash, t_c increases linearly with t_End. This seems to suggest that the SGX is always critical, and ready to crash. This agrees with suggestions that stock markets are self-organized critical (SOC) systems [30,31], as well as with findings from calibrating agent-based models of stock markets [32].
As we approach the October 2008 crash, we see t c growing at a slower rate and finally stagnating around the actual crash time (t Act = 27 Oct 2008). This stagnation occurs a few months before t End reaches t Act . This is an interesting early warning signature that we must look out for in a system that is already bordering on criticality. This result gives a positive outlook for applying the SOG model to forecast market crashes, but rigorous tests are needed to establish the forecasting power of the SOG model.

Discussion
We perform a sensitivity test on the forecasting results by comparing the forecasted crash time t_c with the actual time of the October 2008 crash, t_Act = 27 Oct 2008. Given a t_Start, t_c is a function of t_End, t_c = t_c(t_End|t_Start). We test the sensitivity of t_c(t_End|t_Start) to t_End using 14 t_Start's at the beginning and middle of the months from 1 Nov 2006 to 16 May 2007. We compare the weighted mean forecasted crash time t̄_c(t_End) with the actual crash time t_Act = 27 Oct 2008 at the 95% confidence level, using the null hypothesis H_o: t̄_c = t_Act (see the subsection Sensitivity analysis in the Data and Methods section). Robust signatures for the market crash forecast can be observed in Fig 3: t̄_c increases linearly with t_End when we are far from the market crash, and thereafter stagnates around t_Act as t_End approaches the October 2008 crash. For earlier t_End's the predicted t̄_c is statistically different from t_Act, whereas for t_End close to the actual crash there is no longer a statistical difference between t̄_c and t_Act. The t_End at which t̄_c first becomes statistically indistinguishable from t_Act is the warning time t_w, which acts as the early warning signal for market crashes. In addition, to determine the accuracy of the SOG forecasting results, we calculate ⟨t̄_c(t_End)⟩, the average of t̄_c(t_End) over the t_End's between the warning time t_w and the actual crash time t_Act (27 Oct 2008). If ⟨t̄_c(t_End)⟩ is close to t_Act, the forecasting result is accurate. The results for all 20 component stocks are listed in S Table 1 in S5 File; there is a range of four to six months of early warning prior to the actual crash. Apart from the component stock DBSM.SI, whose ⟨t̄_c(t_End)⟩ falls on 29 Oct 2008, the other predicted crash dates are after t_Act, as late as 12 Feb 2009.
This is to be expected, as the SOG model predicts the latest possible crash time. However, in reality it is possible for a giant cluster to disintegrate before it reaches its maximum size.
In addition to the latest possible crash date for the individual stocks that we are tracking, we can also combine the forecasting results from all stocks into an index to estimate the market-level risk. For example, when the majority of the STI component stocks predict a crash simultaneously on a particular date, the market risk heightens on that date. To do this cross-sectional study, we first count the number of STI component stocks forecasted to crash on a particular day, for various t_Start's and t_End's. Going back to Fig 2, we observe that when we are far from the actual crash, the forecasted crash date increases linearly with t_End, and thus we get a uniform background count. In contrast, when t_End is close to t_Act, t_c stagnates, leading to a high concentration of forecasted crashes in a narrow range of dates and hence a large number of counts on those dates. We show these counts as a heat map to visualize the market risk, where a date with a large count, and hence high market risk, is shown as a red pixel. In contrast, a blue pixel means a background level of counts and low market risk.
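The counting underlying the heat map can be sketched in a few lines. The stock symbols and day indices below are hypothetical; the real analysis uses the forecasted calendar dates t_c(t_End|t_Start) of the 20 STI component stocks.

```python
from collections import Counter

def crash_date_counts(forecasts):
    """Count, over all stocks and all fitting windows, how often each
    date appears as a forecasted crash date t_c(t_End | t_Start). Dates
    with large counts are the 'red pixels' of the market risk heat map."""
    return Counter(d for dates in forecasts.values() for d in dates)

# hypothetical forecasts (day indices): three stocks stagnating near
# day 300, one stock still on the linear background trend
forecasts = {
    "A": [290, 298, 300, 300, 301],
    "B": [295, 299, 300, 300],
    "C": [300, 300, 302],
    "D": [250, 260, 270, 280],
}
counts = crash_date_counts(forecasts)
riskiest_day = max(counts, key=counts.get)   # day with the highest count
```

Stocks whose forecasts stagnate pile their counts onto a narrow range of dates, while stocks on the linear background spread their counts thinly, so the maximum of the count profile picks out the high-risk date.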
In the market risk heat map, the counts concentrate in a single extended high-risk region following the failure of Lehman Brothers (15 Sep 2008). We believe this is due to the October 2008 crash and the Asian Correction appearing as a single extended market crash to the forecasting model. More interestingly, early warning signals did appear a few months before the October 2008 crash, thus providing us the opportunity to prepare for the financial market crashes.
All in all, these results show that the SOG forecasting model provides early warning in the form of a slowing down of the integrated continuous returns. Furthermore, the SOG model is also able to indicate the tension level for potential financial crashes via the market risk heat map. Together with other standard critical slowing down indicators, like increasing volatility, increasing autocorrelation, a red shift in the power spectrum, and increasing cross-correlation [33][34][35], the combination of indicators can act as a financial crisis early warning alarm for the authorities, such that measures can be taken to prevent or at least soften the impact of financial crashes. Still, there is room for improvement, for example the false negative for the October 2008 crash relative to the Asian Correction. We do not expect the SOG model to be the best model for describing market fusion-fission dynamics, particularly the complete fragmentation required by the SOG model. We believe partial fragmentation of clusters would be more realistic, and we are working to test forecasting methods based on SOG-like models. In summary, SOG model forecasting is promising enough to merit further exploration.

Data
In this study, we focus on data from the Singapore Exchange (SGX) because we are familiar with the stocks listed on the SGX, in contrast to the model markets (e.g. the London Stock Exchange (LSE) and the New York Stock Exchange (NYSE)) that are often studied by econophysicists. Although the SGX is an emerging market, it is strongly coupled to financial markets around the world, and readily reflects global market movements. For instance, the Straits Times Index (STI) is highly correlated with the Dow Jones Index (ρ ≈ 0.91), and slid more than 45% between August 2008 and October 2008 as a result of the October 2008 crash (see S Fig 1 in S1 File). The STI is made up of the top 30 stocks in the SGX, but these components do change from time to time. We therefore only consider 20 component stocks that remained highly traded across January 2006 to December 2011. See S Table 1 in S1 File for the full list of these 20 stocks.
The tick-by-tick data was downloaded from the Thomson Reuters Tick History database (http://thomsonreuters.com/tick-history). On average, the number of transactions per stock is on the order of 10^6 over the 1506 trading days. Unlike the LSE and NYSE, the SGX is relatively illiquid: the median time interval between transactions ranges from 6 s for the most liquid stock to 47 s for the least liquid stock (refer to S Table 1 in S1 File).

Continuous returns
We define a continuous return as a continuous price movement in the same direction. Mathematically, the continuous return CR_t(m) is given as

CR_t(m) = |P_t(m+n) − P_t(m)| / P_t(m), (5)

where all of the following conditions are satisfied simultaneously:

(P_t(m−1) − P_t(m)) × (P_t(m+n) − P_t(m+n+1)) > 0,
(P_t(m−1) − P_t(m)) × (P_t(m) − P_t(m+k)) < 0,
(P_t(m+n) − P_t(m+n+1)) × (P_t(m+n−k) − P_t(m+n)) < 0,

where P_t(i) is the i-th transaction price, occurring at time t(i), k ∈ {1, 2, 3, ..., n}, and at time t(m+n) the direction of the price movement changes. S Fig 3 in S3 File illustrates a segment of a stock price time series, where the continuous return is the fractional price change between two red vertical lines denoting the start and end of a microtrend [36]. Over the study period, Jan 2006 to Dec 2011, the number of continuous returns per stock is on average on the order of 10^4, and the median microtrend duration for the different stocks ranges from 10 min to 50 min (see S Table 1 in S3 File).
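A minimal sketch of extracting continuous returns from a transaction price series is shown below. For simplicity it treats flat ticks as continuations of the current microtrend, a slight simplification of the strict conditions accompanying Eq (5).

```python
def continuous_returns(prices):
    """Segment a transaction price series into microtrends (maximal runs
    of price movement in one direction) and return the fractional price
    change |P_end - P_start| / P_start of each run, as in Eq (5)."""
    crs = []
    start = 0
    for i in range(1, len(prices) - 1):
        before = prices[i] - prices[i - 1]
        after = prices[i + 1] - prices[i]
        if before * after < 0:            # price direction reverses at i
            crs.append(abs(prices[i] - prices[start]) / prices[start])
            start = i
    # close the final, possibly unfinished, run
    crs.append(abs(prices[-1] - prices[start]) / prices[start])
    return crs

# up-up, down-down, up: three microtrends, hence three continuous returns
crs = continuous_returns([100.0, 101.0, 103.0, 102.0, 100.0, 104.0])
```

Applied tick by tick, this yields the event series whose distribution is compared against the ETPL in the section SOG Forecasting.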

Sensitivity analysis
We perform a sensitivity test of the forecasting results by comparing the weighted mean forecasted crash time t̄_c with the actual crash time t_Act. Given a t_Start, t_c is a function of t_End, t_c = t_c(t_End|t_Start). We test the sensitivity of t_c(t_End|t_Start) to t_End by calculating the weighted mean of t_c at a particular t_End over the various t_Start's. The weighted mean and weighted standard deviation of t_c(t_End) are calculated for each respective t_End as

t̄_c(t_End) = Σ_i w(t^i_Start|t_End) t_c(t_End|t^i_Start) / Σ_i w(t^i_Start|t_End),

σ_c(t_End) = [ Σ_i w(t^i_Start|t_End) (t_c(t_End|t^i_Start) − t̄_c(t_End))² / Σ_i w(t^i_Start|t_End) ]^(1/2),

with w(t^i_Start|t_End) = NumDays(t^i_Start, t_End). (6)

In Eq (6), NumDays(t_1, t_2) is the number of trading days between t_1 and t_2, and the weight w(t^i_Start|t_End) is introduced to take into account the different amounts of data being used to do the forecast. We compare t̄_c(t_End) with t_Act at the 95% confidence level, using the null hypothesis H_o: t̄_c = t_Act.
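The weighted statistics of Eq (6) can be sketched as follows, assuming the weight w(t^i_Start|t_End) = NumDays(t^i_Start, t_End) and, for simplicity, plain day indices in place of trading-day counts:

```python
import math

def weighted_crash_stats(t_end, forecasts):
    """Weighted mean and standard deviation of forecasted crash times
    t_c(t_End | t_Start^i) for a fixed t_End, as in Eq (6). Each window
    is weighted by the amount of data it uses, taken here to be the day
    count w = t_End - t_Start^i (the paper counts trading days).

    `forecasts` is a list of (t_start, t_c) pairs in day indices."""
    weights = [t_end - t_start for t_start, _ in forecasts]
    tcs = [tc for _, tc in forecasts]
    wsum = sum(weights)
    mean = sum(w * tc for w, tc in zip(weights, tcs)) / wsum
    var = sum(w * (tc - mean) ** 2 for w, tc in zip(weights, tcs)) / wsum
    return mean, math.sqrt(var)

# three hypothetical fitting windows ending at day 600
mean_tc, std_tc = weighted_crash_stats(600, [(0, 700), (100, 710), (200, 720)])
```

Longer windows, which use more data, pull the weighted mean toward their forecasts; the weighted standard deviation then feeds the 95% confidence test of H_o: t̄_c = t_Act.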