Dynamic Evolution of Cross-Correlations in the Chinese Stock Market

The analysis of cross-correlations is widely applied to understand interconnections in stock markets and to estimate portfolio risk. Current studies of correlations in the Chinese market mainly focus on the static correlations between return series, which calls for an investigation of their dynamic correlations. Our study aims to reveal the dynamic evolution of cross-correlations in the Chinese stock market, and to offer an exact interpretation of the evolution behavior. The correlation matrices constructed from the return series of 367 A-share stocks traded on the Shanghai Stock Exchange from January 4, 1999 to December 30, 2011 are calculated over a moving window with a size of 400 days. The evolutions of the statistical properties of the correlation coefficients, eigenvalues, and eigenvectors of the correlation matrices are carefully analyzed. We find that the stock correlations are significantly increased in the periods of the two market crashes in 2001 and 2008, during which only five eigenvalues significantly deviate from those of the random correlation matrix, and that the systemic risk is higher in these volatile periods than in calm periods. By investigating the significant contributors of the deviating eigenvectors in different time periods, we observe a dynamic evolution behavior in business sectors such as IT, electronics, and real estate, which lead the rise (drop) before (after) the crashes. Our results provide new perspectives for the understanding of the dynamic evolution of cross-correlations in the Chinese stock market, and the result of risk estimation is valuable for the application of risk management.


Introduction
The stock market is a typical complex system with interactions between individuals, groups, and institutions at different levels. In financial crises, risk can quickly propagate among these interconnected institutions, which have mutually beneficial business. Therefore, the analysis of the correlations between shares issued by different institutions is of crucial importance for understanding the interactive mechanism of the stock market and for portfolio risk estimation [1][2][3]. A variety of works have been done to reveal the information contained in the internal correlations among stocks, and the methods generally used in the research of stock cross-correlations include the random matrix theory (RMT) [4,5], the principal component analysis (PCA) [6][7][8], and the hierarchical structure [9][10][11][12][13][14][15][16][17].
The random matrix theory (RMT), originally developed for complex quantum systems, was applied to analyze the cross-correlations between stocks in the U.S. stock market by Plerou et al. [4]. The statistics of most of the eigenvalues of the correlation matrix calculated from stock return series agree with the predictions of random matrix theory, but with deviations for a few of the largest eigenvalues. Extended work has been conducted to explain the information contained in the deviating eigenvalues [18], which reveals that the largest eigenvalue corresponds to a market-wide influence on all stocks and the remaining deviating eigenvalues correspond to conventionally identified business sectors. Additional work has shown that even the eigenvalues within the spectrum of RMT carry some sort of correlations [19,20]. Using the same RMT method, extensive works have been performed in the correlation analysis of various stock markets [21][22][23][24][25][26][27][28][29][30].
In recent years, an increasing number of works have concentrated on the variation of the cross-correlations between market equities over time [31][32][33][34][35][36][37][38][39][40]. Aste et al. have investigated the evolution of the correlation structure among 395 stocks quoted on the U.S. equity market from 1996 to 2009, in which the connected links among stocks are built by a topologically constrained graph approach [34]. They find that the stocks have increased correlations in the periods of larger market instabilities. By using a similar filtered graph approach, the correlation structure among 57 different market indices all over the world has been studied [37]. Fenn et al. have used the RMT method to analyze the time evolution of the correlations between the market equity indices of 28 geographical regions from 1999 to 2010 [38], and they also observe an increase of the correlations between several different markets after the credit crisis of 2007-2008. Similar results have also been observed in Refs. [31,32,35,41,42].
The RMT method has been applied to the analysis of the static correlations between the return series in the Chinese stock market [26]. No clear interactions between stocks in the same business sectors are observed, while unusual sectors containing the ST (specially treated) and blue-chip stocks are identified by a few of the largest eigenvalues. Further work has been done to analyze the anticorrelated sub-sectors that compose the unusual sectors [43]. To the best of our knowledge, little work has so far been conducted on the dynamics of stock correlations in the Chinese market. Using the daily records of 259 stocks on the Chinese stock market from 1997 to 2007, the dynamic evolution of the Chinese stock network was first analyzed in [36]. In that work the links are constructed between stocks whose correlations are larger than a threshold, and a stable topological structure is revealed by using a dynamic threshold instead of a static threshold. Although additional efforts are made to identify the economic sectors based on the RMT method, the dynamic effects of conventional business sectors are extremely weak.
The principal component analysis (PCA) is another method commonly used to detect the correlations between stock returns. It is closely related to the RMT method, since it is also done through eigenvalue decomposition of the correlation (or covariance) matrix of the return series. This method uses an orthogonal transformation to convert a set of possibly correlated returns into several uncorrelated components, which are ranked by their explanatory power for the total variance of the system. The studies of correlations among stock returns based on the PCA method are primarily concerned with systemic risk measures [6][7][8].
Figure 5. [...] (3). The inset shows the largest eigenvalue $\lambda_1$ of the empirical return series, which is much larger than the upper bound $\lambda_{\max}$ of RMT. doi:10.1371/journal.pone.0097711.g005
Figure 6. Comparison between the eigenvalues of the empirical correlation matrix and the surrogate random correlation matrix. The black circled line is the 99th percentile of the eigenvalues of the shuffled return series. The return series in each moving window is randomized by shuffling 10 times. The red squared line is the number of empirical eigenvalues significantly larger than those of the shuffled data, i.e., the eigenvalues larger than the 99th percentile of the eigenvalues of the shuffled data.
In this paper, mainly using the RMT method, we study the dynamic evolution of the correlations between the 367 A-share stocks traded on the Shanghai Stock Exchange from 1999 to 2011. The internal correlations between the stocks are investigated based on the correlation matrix of the return series of individual stocks in a moving window with a fixed length. We are mainly concerned with the statistical properties of the correlation coefficients, eigenvalues, and eigenvectors of the correlation matrix, and their variations in different time periods. Our results confirm the strong collective behavior of the stock returns in the periods of market crashes, which is verified by the observations of the distribution of the correlation coefficients and the mean correlation coefficient. Further, based on the PCA method we calculate the proportion of total variance explained by the first n components, through which the systemic risk of the Chinese stock market is estimated for different time periods. Another important purpose of our study is to extract the information contained in the eigenvectors deviating from RMT. We find that the largest eigenvector quantifies a market-wide influence on all stocks, and that this market mode remains stable over time. For the interpretation of the other deviating eigenvectors, dynamic evolutions of several conventional industries, including IT, electronics, machinery, petrochemicals, and real estate, are clearly observed.

Materials and Methods
The database analyzed in our study contains the daily data of all A-share stocks traded on the Shanghai Stock Exchange (SHSE), one of the two stock exchanges in mainland China. The A-share stocks are issued by mainland Chinese companies, and traded in Chinese Yuan. The data are provided by Beijing Gildata RESSET Data Technology Co., Ltd, see http://www.resset.cn/. To better understand the correlation structures under different market conditions, we select the A-share stocks traded on the Shanghai Stock Exchange from January 4, 1999 to December 30, 2011, a period covering the two big crashes in 2001 and 2008. To make sure that the stocks have enough trading days for our results to be statistically significant, we select the stocks traded on the stock exchange for at least 2600 days, i.e., we exclude those stocks suspended from the market for more than about two years. This filter yields sample data including 367 A-share stocks and 1,114,364 daily records in total.
Before we quantify the cross-correlations between stocks, we first calculate the return series for a given stock $i$ as
$$G_i(t) = \ln p_i(t) - \ln p_i(t-1), \qquad (1)$$
where $p_i(t)$ is the price for stock $i$ at time $t$, and $t$ is in units of one day. The Pearson's correlation coefficient between two stock return series $G_i(t)$ and $G_j(t)$ is defined as
$$c_{ij} = \frac{\langle [G_i(t) - \langle G_i(t) \rangle][G_j(t) - \langle G_j(t) \rangle] \rangle}{\sigma_i \sigma_j}, \qquad (2)$$
where $\sigma_i$ and $\sigma_j$ are the standard deviations of the two stock return series. It is a common measure of the dependence between the return series of the two stocks. There are $N = 367$ sample stocks, therefore we have a correlation matrix $C$ with $367 \times 367$ correlation coefficients as elements. The elements of the correlation matrix are restricted to the domain $-1 \le c_{ij} \le 1$: for $0 < c_{ij} \le 1$ the stocks are correlated, for $-1 \le c_{ij} < 0$ the stocks are anticorrelated, and for $c_{ij} = 0$ the stocks are uncorrelated.
The cross-correlation defined above measures the dependence between the return series over the whole period of the sample data. We are more interested in the dynamic variation of the stock correlations with time $t$, so we look at the correlations calculated over a moving window. The size $T$ of the moving window is fixed to be 400 trading days, i.e., about two years, which is slightly larger than the number of sample stocks. Equation (2) is applied to calculate the correlation coefficients over the subset of the return series within the moving window $[t-T+1, t]$. For instance, the correlations in the first moving window are computed from the return series within $[1, T]$, and those in the following moving window from the return series within $[2, T+1]$. Since our sample data run from 04/01/1999 to 30/12/2011, the starting date of the moving window covers the period from 04/01/1999 to 12/05/2010, and the ending date covers the period from 06/09/2000 to 30/12/2011.
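As an illustration of the procedure described above, the following minimal sketch computes return series and moving-window correlation matrices with NumPy. It assumes logarithmic returns and a complete price matrix without suspended trading days; the function names `log_returns` and `rolling_correlations` and the synthetic prices are hypothetical and are not taken from the study.

```python
import numpy as np

def log_returns(prices):
    """Daily log returns, assuming prices has shape (days, stocks)."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices), axis=0)

def rolling_correlations(returns, window):
    """Yield (end_index, C) for each moving window [t - window + 1, t]."""
    returns = np.asarray(returns, dtype=float)
    for end in range(window, returns.shape[0] + 1):
        block = returns[end - window:end]
        # np.corrcoef treats rows as variables, so pass stocks as rows.
        yield end - 1, np.corrcoef(block.T)

# Synthetic stand-in for the price data (hypothetical, not the RESSET data).
rng = np.random.default_rng(0)
n_days, n_stocks, window = 60, 5, 20
market = rng.normal(0.0, 0.01, size=(n_days, 1))        # common factor
noise = rng.normal(0.0, 0.01, size=(n_days, n_stocks))  # stock-specific part
prices = 10.0 * np.exp(np.cumsum(market + noise, axis=0))

G = log_returns(prices)
windows = list(rolling_correlations(G, window))
print(len(windows), windows[0][1].shape)
```

With the parameters of the study, the same loop would use `window=400` and 367 columns; handling of missing trading days is left out of this sketch.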

Dynamics of correlation coefficients
We first analyze the distribution of the elements $c_{ij}$ of the correlation matrix to capture the statistical properties of the correlation coefficients. In Figure 1, the probability density function (PDF) $P(c_{ij})$ of the correlation coefficients evolved with time $t$ is shown. We observe that the center of the distribution clearly deviates from zero for the whole range of $t$. The values of the coefficient $c_{ij}$ at which the peaks of $P(c_{ij})$ are located are significantly positive and vary over $t$. The peak positions show two local maxima of $c_{ij}$ as $t$ approaches 2003 and 2009, and appear at relatively small $c_{ij}$ for other $t$.
The Chinese stock market suffered a big crash after the release of the policy on the sale of state-held shares in listed companies in 2001, and the collapse of the internet bubble also took place in 2000-2001. In 2008, the global financial crisis broke out and hit stock markets around the world, including the Chinese stock market. Considering that the length of the moving window is about two years, the correlations between the stock returns are significantly increased in the time windows 2001-2003 and 2008-2009. This indicates that stock price variations are more likely to be correlated around the market crashes.
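The statistics discussed above can be sketched as follows: the distinct off-diagonal coefficients of a correlation matrix are collected, their histogram estimates the PDF, and their mean summarizes the overall level of correlation. The helper names and the two synthetic regimes ("calm" and "crash", differing only in the strength of a common factor) are assumptions made for illustration and do not reproduce the empirical data.

```python
import numpy as np

def offdiag(C):
    """Return the N(N-1)/2 distinct coefficients c_ij with i < j."""
    C = np.asarray(C, dtype=float)
    i, j = np.triu_indices(C.shape[0], k=1)
    return C[i, j]

def coefficient_pdf(C, bins=50):
    """Histogram estimate of P(c_ij) on [-1, 1], normalised to unit area."""
    density, edges = np.histogram(offdiag(C), bins=bins,
                                  range=(-1.0, 1.0), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, density

# Two synthetic regimes: a weak and a strong common factor (hypothetical).
rng = np.random.default_rng(1)
T, N = 400, 30
common = rng.normal(size=(T, 1))
calm = 0.3 * common + rng.normal(size=(T, N))
crash = 1.0 * common + rng.normal(size=(T, N))

C_calm = np.corrcoef(calm.T)
C_crash = np.corrcoef(crash.T)
mean_calm = offdiag(C_calm).mean()
mean_crash = offdiag(C_crash).mean()
centers, density = coefficient_pdf(C_crash)
print(mean_calm, mean_crash, centers[np.argmax(density)])
```

In this toy setting a stronger common factor shifts both the peak of the distribution and the mean coefficient toward larger positive values, which is the qualitative signature described in the text.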
To further verify the dependence of the stock correlations on the time $t$, we compute the mean correlation coefficient $\langle c_{ij} \rangle$ in the

Dynamics of eigenvalues and their explanations of system variance
We compute the eigenvalues of the correlation matrix $C$ with $N \times N$ elements, and denote them as $\lambda_k$, $k = 1, \cdots, N$, with $\lambda_1 > \lambda_2 > \cdots > \lambda_N$. We investigate the probability density function (PDF) of the eigenvalues and its variation over time $t$. In Figure 3, the PDF $P(\lambda)$ for $\lambda \le 20$ evolved with $t$ is plotted. The peaks of $P(\lambda)$ show larger values for $t$ around 2003 and 2009 than for other $t$. The $P(\lambda)$ for large eigenvalues $\lambda > 20$ is plotted in [...]
In the observation of $P(\lambda)$, we note that there exist large eigenvalues that are clearly larger than the eigenvalues of the random correlation matrix. To compare the eigenvalues of the stock correlation matrix with those of the random correlation matrix, we show the analytical result for random matrices following Ref. [47]. For the correlation matrix of $N$ random time series of length $L$, the PDF $P(\lambda)$ of the eigenvalues $\lambda$ in the limit $N \to \infty$ and $L \to \infty$ is given by
$$P(\lambda) = \frac{Q}{2\pi} \frac{\sqrt{(\lambda_{\max} - \lambda)(\lambda - \lambda_{\min})}}{\lambda}, \qquad (3)$$
where $Q \equiv L/N > 1$, and $\lambda$ is within the bounds $\lambda_{\min} \le \lambda \le \lambda_{\max}$. $\lambda_{\min}$ and $\lambda_{\max}$ are the minimum and maximum eigenvalues of the random correlation matrix, which are given by
$$\lambda_{\max,\min} = 1 + \frac{1}{Q} \pm 2\sqrt{\frac{1}{Q}}.$$
In Figure 5, we plot $P(\lambda)$ of the random correlation matrix with finite $L = 400$ and $N = 367$, the same as those of the stock return series. Within the bounds $[\lambda_{\min}, \lambda_{\max}]$, $P(\lambda)$ of the correlation matrix constructed from the empirical return series in the first moving window (black solid line) is consistent with the analytical result of equation (3) (red dashed line). There also exist some deviations at large eigenvalues. In particular, the largest eigenvalue $\lambda_1 \approx 120$, shown in the inset of Figure 5, is about 31 times larger than $\lambda_{\max} = 3.83$.
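The analytical bounds quoted above can be checked numerically. The sketch below evaluates the random-matrix eigenvalue density and its bounds for $L = 400$ and $N = 367$; the function names `mp_bounds` and `mp_pdf` are hypothetical, and the normalization check uses a simple trapezoidal sum.

```python
import numpy as np

def mp_bounds(L, N):
    """Bounds (lam_min, lam_max) of the random-matrix spectrum, Q = L / N."""
    Q = L / N
    root = np.sqrt(1.0 / Q)
    return 1.0 + 1.0 / Q - 2.0 * root, 1.0 + 1.0 / Q + 2.0 * root

def mp_pdf(lam, L, N):
    """Eigenvalue density of equation (3); zero outside the bounds."""
    Q = L / N
    lam_min, lam_max = mp_bounds(L, N)
    lam = np.atleast_1d(np.asarray(lam, dtype=float))
    pdf = np.zeros_like(lam)
    inside = (lam > lam_min) & (lam < lam_max)
    x = lam[inside]
    pdf[inside] = Q / (2.0 * np.pi) * np.sqrt((lam_max - x) * (x - lam_min)) / x
    return pdf

lam_min, lam_max = mp_bounds(L=400, N=367)

# Trapezoidal check that the density integrates to one between the bounds.
grid = np.linspace(lam_min, lam_max, 200001)
values = mp_pdf(grid, L=400, N=367)
area = np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid))
print(round(lam_max, 2), area)
```

For these parameters the upper bound evaluates to about 3.83, in agreement with the value of $\lambda_{\max}$ stated in the text.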
We next identify the eigenvalues of the stock correlation matrix which deviate from those of the random correlation matrix, and investigate their variations over time $t$. The analytical result of RMT is strictly valid only for $N \to \infty$ and $L \to \infty$. Instead, we compare $\lambda$ of the stock correlation matrix with $\lambda$ of the correlation matrix constructed from $N = 367$ uncorrelated time series with length $L = 400$. The uncorrelated time series are generated by shuffling the empirical return series, so that the equal-time correlations between the original return series are destroyed. We compute the cross-correlations between these shuffled return series, and use this surrogate correlation matrix as a random correlation matrix. In Figure 6, the black circled line denotes the 99th percentile of the eigenvalues calculated from the random correlation matrix. It stays relatively constant at about 3 as the time $t$ evolves. This means that 99 percent of the eigenvalues of the random correlation matrix are less than this value.
If an eigenvalue of the empirical correlation matrix is larger than the 99th percentile of the eigenvalues generated from the shuffled return series, it is considered to be significantly larger than the eigenvalues of the random correlation matrix. In Figure 6, the number of eigenvalues significantly larger than those of the random correlation matrix is plotted as the red squared line. The number of empirical $\lambda$ significantly larger than $\lambda$ of the random correlation matrix fluctuates over time $t$. We give a cursory explanation for the above phenomenon. It can be easily proved that the sum of the eigenvalues of the stock correlation matrix is fixed to be the number of sample stocks, i.e., $\sum_{k=1}^{N} \lambda_k = N$. As shown in the distribution of the eigenvalues, the major portion of the eigenvalues are distributed in the region $\lambda < 3$, and the large eigenvalues $\lambda > 20$ close to the market crashes of 2001-2003 and 2008-2009 are prominently larger than those during the calm period. Therefore, the number of eigenvalues in the range $3 < \lambda < 20$ during crashes is smaller than in calm periods. This may indicate that only a few of the eigenvalues contain the information about the stock correlations when the market strongly fluctuates.
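A minimal sketch of the shuffling test described above is given below. It assumes that each stock's return series is permuted in time independently, and that the eigenvalues of all shuffled realizations are pooled before the percentile is taken; the pooling, the function names, and the one-factor synthetic returns are assumptions for illustration only.

```python
import numpy as np

def eigenvalues_desc(returns):
    """Eigenvalues of the correlation matrix, sorted from largest to smallest."""
    C = np.corrcoef(np.asarray(returns, dtype=float).T)
    return np.linalg.eigvalsh(C)[::-1]

def shuffled_threshold(returns, n_shuffles=10, q=99.0, seed=0):
    """q-th percentile of eigenvalues pooled over shuffled surrogates."""
    rng = np.random.default_rng(seed)
    returns = np.asarray(returns, dtype=float)
    pooled = []
    for _ in range(n_shuffles):
        # Permute every column (stock) independently in time: marginal
        # distributions are kept, equal-time correlations are destroyed.
        surrogate = np.column_stack([rng.permutation(col) for col in returns.T])
        pooled.append(eigenvalues_desc(surrogate))
    return np.percentile(np.concatenate(pooled), q)

# Synthetic one-factor returns (hypothetical stand-in for one moving window).
rng = np.random.default_rng(42)
T, N = 400, 40
market = rng.normal(size=(T, 1))
returns = market + rng.normal(size=(T, N))

lam = eigenvalues_desc(returns)
threshold = shuffled_threshold(returns)
n_deviating = int(np.sum(lam > threshold))
print(threshold, n_deviating, lam.sum())
```

The printed sum of eigenvalues equals the number of series, which is the trace identity used in the explanation above.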
The commonality among the stock returns can also be detected by the PCA method, which has a close link to the RMT method. In fact, the systemic risk measured by the collective behavior of the stock price movements based on PCA has been analyzed in many studies [6][7][8]. The risk estimation is also valuable for portfolio optimization, and some work has been done to analyze the risk-return relationship [48,49]. The PCA method decomposes the returns of a sample of stocks into several orthogonal principal components. The principal components $f_k$ are uncorrelated, and satisfy the condition $\langle f_k f_l \rangle = \lambda_k$ if $k = l$, where $\lambda_k$ is the $k$-th eigenvalue of the correlation matrix $C$ of stock returns. The standardized return of stock $i$, defined as $z_i = [G_i(t) - \langle G_i(t) \rangle]/\sigma_i$, can be expressed as a linear combination of the principal components,
$$z_i = \sum_{k=1}^{N} L_{ik} f_k,$$
where $N = 367$ is the total number of stocks analyzed, and $L_{ik}$ is the component of the $k$-th eigenvector corresponding to stock $i$, which is also known as the factor loading of $f_k$ for stock $i$. The total variance of the return series is
$$\sigma^2 = \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{N} L_{ik} L_{jk} \lambda_k,$$
in which the total variance is decomposed into the orthogonal factor loadings $L$ and the eigenvalues $\lambda$. For the periods in which stocks are highly correlated and collectively volatile, a small number $n < N$ of eigenvalues can explain most of the volatility in the system. The cumulative risk fraction (CRF) is generally used to quantify the proportion of total variance explained by the first $n$ principal components [7], also known as the absorption ratio in [8]. It is defined as
$$\mathrm{CRF}(n) = \frac{\sum_{k=1}^{n} \lambda_k}{\sum_{k=1}^{N} \lambda_k},$$
where $\lambda_k$ is the $k$-th eigenvalue, $\lambda_1 > \lambda_2 > \cdots > \lambda_N$. Since the PCA is done through the decomposition of the correlation (covariance) matrix of the return (standardized return) series, the total variance of the system explained by all $N$ principal components is quantified as $\sum_{k=1}^{N} \lambda_k$. The variance associated with the first $n$ principal components is quantified as $\sum_{k=1}^{n} \lambda_k$. The CRF is the ratio of these two quantities.
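The ratio defined above is straightforward to evaluate from the eigenvalues of a correlation matrix. The following sketch uses a hypothetical helper `cumulative_risk_fraction` and a small equicorrelation matrix, for which the eigenvalues are known in closed form: with $N = 4$ and all off-diagonal coefficients equal to 0.5, the largest eigenvalue is $1 + 3 \times 0.5 = 2.5$ and the other three are 0.5.

```python
import numpy as np

def cumulative_risk_fraction(C, n):
    """Fraction of total variance carried by the n largest eigenvalues of C."""
    lam = np.linalg.eigvalsh(np.asarray(C, dtype=float))[::-1]
    return lam[:n].sum() / lam.sum()

# Equicorrelation matrix with N = 4 and rho = 0.5 (illustrative values).
N, rho = 4, 0.5
C = np.full((N, N), rho)
np.fill_diagonal(C, 1.0)

crf_1 = cumulative_risk_fraction(C, 1)   # 2.5 / 4
crf_2 = cumulative_risk_fraction(C, 2)   # (2.5 + 0.5) / 4
crf_all = cumulative_risk_fraction(C, N)
print(crf_1, crf_2, crf_all)
```

The example shows how a single strong common mode already absorbs more than half of the total variance in this toy case, while the ratio reaches one when all components are included.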
In Figure 7, the CRFs for $n = 1$, 10, 50, and 367 are shown as a function of the evolving time $t$. The CRF displays two local maxima at $t$ near 02/04/2003 and 04/09/2009, at which the first $n$ components explain more than 50%, 60%, and 80% of the total variance for $n = 1$, 10, and 50, respectively. This indicates that the stocks are highly correlated in the moving windows from 30/07/2001 to 02/04/2003 and from 17/01/2008 to 04/09/2009, in which the majority of the stock returns tend to move together. Thus the stock market is at a high level of systemic risk in these windows. We also observe that the CRF displays a relatively small value in the moving window from 10/05/2005 to 25/12/2006, in which the stocks are less correlated. These results are consistent with those observed for the mean correlation coefficient.

Evolution of eigenvectors and their interpretations
To analyze the information contained in the deviating eigenvectors, we first investigate the contributions of the eigenvector components grouped in conventional industries. According to the China Securities Regulatory Commission (CSRC) industry code, the stocks traded on the Shanghai Stock Exchange are grouped into A-M conventional industries. Table 1 presents summary statistics of the 22 industries, including the industry codes, industry names, and the number of chosen stocks belonging to each industry. For each deviating eigenvector $u^k$, with element $u^k_i$ as the component of the $k$-th eigenvector corresponding to stock $i$, we calculate the contribution of each industry group
$$X^k_l = \frac{1}{N_l} \sum_{i \in l} [u^k_i]^2,$$
where $N_l$ is the number of stocks belonging to industry group $l$, $l = 1, \cdots, 22$. The measure $X^k_l$ is analogous to the analysis of wave functions in disordered systems, and was first introduced to financial data analysis in Ref. [50]. Figure 8 shows $X^k_l$ for the deviating eigenvector $u^1$ evolved with time $t$. The participants of the eigenvectors listed on the horizontal axis are the 367 stocks. The stocks belonging to industry group $l$ are endowed with the same value of $X^k_l$, and ranked by their capitalizations on the ending date of the sample data. We find that $X^k_l$ for the largest eigenvector $u^1$ universally shows large values among different industries, which means that almost all the industries have significant contributions to $u^1$. It is quite robust for different $t$. In Figure 9, [...] eigenvectors, since there are only small numbers of chosen stocks belonging to these two industries.
Table 2. Largest ten components of $u^2$, $u^3$, $u^4$, and $u^5$ by the average ranks of the eigenvector components taken over the moving windows with ending dates from 06/09/2000 to 02/04/2003. The eigenvectors are obtained from the correlation matrices of the return series in these moving windows. The stock codes corresponding to the largest ten components, the industries they belong to, and the industry codes are listed. doi:10.1371/journal.pone.0097711.t002
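The industry contribution can be sketched as follows, assuming that it is the mean squared eigenvector component within each industry group, in the spirit of Ref. [50]. The function name `industry_contributions` and the industry labels in the example are hypothetical.

```python
import numpy as np

def industry_contributions(u, labels):
    """Mean squared eigenvector component within each industry group."""
    u = np.asarray(u, dtype=float)
    labels = np.asarray(labels)
    return {str(g): float(np.mean(u[labels == g] ** 2))
            for g in np.unique(labels)}

# A localized eigenvector: all weight sits on the two "IT" stocks.
labels = ["IT", "IT", "RealEstate", "RealEstate"]
localized = np.array([0.6, 0.8, 0.0, 0.0])
X_localized = industry_contributions(localized, labels)

# A uniform (market-like) eigenvector: every group contributes 1 / N.
uniform = np.full(4, 0.5)
X_uniform = industry_contributions(uniform, labels)
print(X_localized, X_uniform)
```

A normalized eigenvector spread evenly over all stocks gives the same value $1/N$ for every group, whereas an eigenvector concentrated on one industry gives a large value for that group only.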
To further confirm the wide influence of the largest eigenvector observed in the contributions of industries, we also calculate the projection of the stock returns $G_j(t)$ on the largest eigenvector $u^1$,
$$G^1(t) = \sum_{j=1}^{N} u^1_j G_j(t),$$
where $u^1_j$ is the component of $u^1$ corresponding to stock $j$, and $N$ is the number of sample stocks. In Figure 10, we plot $G^1(t)$ against the return of the A-share Index of Shanghai Stock Exchange [...] figure. The slope is about $0.93 \pm 0.06$, with a slight quantitative difference for different moving windows. This value is slightly larger than the 0.85 observed in [18]. The significant linear correlation between $G^1(t)$ and $G_{A\text{-share}}(t)$ indicates that the largest eigenvalue can be interpreted as quantifying a market-wide influence on all stocks, and that it remains quite robust to the variation of $t$. In fact, all the components of $u^1$ are positive in our study, and similar results are revealed in [26]. The A-share Index is a capitalization-weighted average of the prices of all A-share stocks, and large components of $u^1$ are universally distributed among all stocks. Thus it is no surprise to observe the significant correlation between $G^1(t)$ and $G_{A\text{-share}}(t)$.
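The projection and the linear fit can be illustrated with the sketch below. It assumes that both series are standardized to zero mean and unit variance before the fit, so that the slope equals their correlation coefficient, and it uses an equal-weighted average of synthetic returns as a stand-in for the capitalization-weighted A-share Index; all names and data here are hypothetical.

```python
import numpy as np

def market_mode_projection(returns):
    """Project returns on the eigenvector of the largest eigenvalue."""
    returns = np.asarray(returns, dtype=float)
    C = np.corrcoef(returns.T)
    _, vecs = np.linalg.eigh(C)   # eigenvalues in ascending order
    u1 = vecs[:, -1]
    if u1.sum() < 0:              # eigenvector sign is arbitrary; fix it
        u1 = -u1
    return returns @ u1, u1

def standardize(x):
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

# Synthetic one-factor returns (hypothetical stand-in for one moving window).
rng = np.random.default_rng(7)
T, N = 400, 50
market = rng.normal(0.0, 0.02, size=T)
returns = market[:, None] + rng.normal(0.0, 0.02, size=(T, N))
index_return = returns.mean(axis=1)   # equal-weighted stand-in for the index

G1, u1 = market_mode_projection(returns)
slope = np.polyfit(standardize(index_return), standardize(G1), 1)[0]
print(slope, bool(np.all(u1 > 0)))
```

In this toy case all components of the leading eigenvector share the same sign and the slope is close to one, mirroring the market-mode interpretation given in the text.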
We [...] 30/12/2011. The components with the smallest ten average ranks are picked as the largest ten components. The largest ten components correspond to ten stocks which significantly contribute to the relevant eigenvectors. If one looks carefully at the stock codes of the largest ten components, dynamic evolutions of conventional stock industries are clearly observed. The stocks belonging to the industries which have significant contributions to distinct eigenvectors also appear in their largest ten components. For the moving windows with ending dates in the period from 06/09/2000 to 02/04/2003, as shown in Table 2, among the largest ten components of $u^2$ five stocks belong to the IT industry and one stock belongs to the electronics industry, and for $u^3$ four stocks belong to the machinery industry and two stocks belong to the petrochemicals industry. In the following period from 02/04/2003 to 25/12/2006, as shown in Table 3, four IT stocks and two electronics stocks are in the list of the largest ten components of $u^3$, and five machinery stocks and two petrochemicals stocks are in the list of $u^4$. More interestingly, stocks 600198, 600100, and 600770, which are among the largest ten components of $u^2$ in the first time period, appear in the largest ten components of $u^3$ in the following period. The starting dates of the moving windows are from 04/01/1999 to 30/07/2001 for the first period, and from 30/07/2001 to 10/05/2005 for the second period. The evolutions of the IT and electronics industries recall the history of the Chinese stock market in the period of 1999-2001. During that period of time, the Chinese stock market was in a bull market, and high-tech stocks issued by companies dealing in IT and electronics were leading the rise. After 2001, the Chinese stock market started to decline, thus the IT and electronics stocks are contained in $u^3$. A similar phenomenon is observed for the stocks in the machinery and petrochemicals industries: stocks 600843, 600818, 600618, and 600841, which are among the largest ten components of $u^3$ in the first time period, become members of the largest ten components of $u^4$ in the following period.
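The selection by average rank can be sketched as follows. The sketch assumes that components are ranked by absolute value within each moving window (since the overall sign of an eigenvector is arbitrary) and that ranks are then averaged over the windows of a period; whether the study ranks signed or absolute components is not stated here, so this choice, like the function names and the toy vectors, is an assumption.

```python
import numpy as np

def component_ranks(u):
    """Rank the components of u by absolute value; rank 1 is the largest."""
    u = np.asarray(u, dtype=float)
    order = np.argsort(-np.abs(u))
    ranks = np.empty(len(u), dtype=int)
    ranks[order] = np.arange(1, len(u) + 1)
    return ranks

def top_components(eigvecs_per_window, top=10):
    """Indices of the stocks with the smallest average rank over windows."""
    ranks = np.array([component_ranks(u) for u in eigvecs_per_window])
    avg = ranks.mean(axis=0)
    return np.argsort(avg, kind="stable")[:top], avg

# The same eigenvector observed in three moving windows, five stocks (toy data).
windows = [
    np.array([0.7, -0.6, 0.3, 0.2, 0.1]),
    np.array([-0.6, 0.7, 0.1, 0.3, 0.2]),
    np.array([0.8, 0.1, 0.5, 0.2, 0.3]),
]
top_idx, avg_rank = top_components(windows, top=2)
print(top_idx, avg_rank)
```

In the setting of the study, each entry of `windows` would be one deviating eigenvector from one moving window of a period, and `top=10` would return the ten stocks listed in the corresponding table.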
The dynamic evolution behavior is also observed in the real estate industry. In the period from 02/04/2003 to 25/12/2006, five stocks belonging to the real estate industry appear in the largest ten components of $u^5$. The number of real estate stocks in the largest ten components of $u^4$ increases to seven in the period from 25/12/2006 to 13/01/2009, as shown in Table 4. In the following period from 13/01/2009 to 11/05/2010, five (seven) real estate stocks are in the largest ten components of $u^3$ ($u^4$), as shown in Table 5. After September 2008, the Chinese stock market tended to be affected by the global financial crisis, and the stocks belonging to the real estate industry were leading the drop. In Table 6, we observe that seven real estate stocks appear in the largest ten components of $u^2$ for the period from 11/05/2010 to 30/12/2011, in which the moving windows have starting dates from 16/09/2008 to 12/05/2010. In general, the real estate stocks contained in the largest five eigenvectors slowly move to be contained in the second largest eigenvector as the time approaches the global financial crisis. This conclusion is based upon the fact that many real estate stocks appear repeatedly in the largest ten components of the largest five eigenvectors in different periods. For instance, stock 600663 first appears in the largest ten components of $u^5$ in the period from 02/04/2003 to 25/12/2006, then it moves to be in those of $u^4$ in the

Discussion and Conclusion
In summary, we have conducted a thorough study of the evolution of the cross-correlations between the return series of 367 A-share stocks on the Shanghai Stock Exchange from 1999 to 2011. We find that the stock returns behave more collectively in volatile periods, showing a biased distribution of correlation coefficients centered around larger positive coefficients and larger values of the mean correlation coefficient as the time approaches the two big crashes in 2001 and 2008. In the same volatile periods, we find that the largest eigenvalue shows larger values, while the number of eigenvalues that significantly deviate from those of the random correlation matrix is smaller. In addition, only a small number of eigenvalues can explain the major portion of the total system variance when the market is volatile, which indicates a high level of systemic risk.
For the interpretation of the deviating eigenvectors, we have further analyzed the eigenvector components and their contributions. By computing the contributions of the components grouped in conventional industries, we find significant contributors, such as mining, electronics, IT, and real estate, for distinct eigenvectors over different time $t$. We also analyze the projection of the stock returns on the largest eigenvector, and confirm the market-wide influence of the largest eigenvector and its stability in time. In the analysis of the component stocks which significantly contribute to each eigenvector, a dynamic evolution of conventional industries is observed, basically consistent with the results of industry contributions. The stocks in the IT and electronics industries that significantly contribute to the second largest eigenvector before the crash in 2001 become the significant contributors of the third largest eigenvector after the crash. Similarly, the stocks in the real estate industry that significantly contribute to other deviating eigenvectors before the crisis of 2008-2009 become the significant contributors of the second largest eigenvector during the crisis period.
We offer a new interpretation of the deviating eigenvectors of the correlation matrices in the Chinese stock market. It is revealed that the information contained in a particular eigenvector varies over time, which differs from the fixed sectors and sub-sectors observed in the static correlation analysis. The dynamic evolution of significant eigenvector contributors is reminiscent of the sector rotation commonly observed in financial markets. This work is valuable for the understanding of risk propagation among interconnected stocks and the classification of stock sectors, and can be further applied to portfolio risk estimation and systemic risk management.