Index tracking strategy based on mixed-frequency financial data

To obtain the market average return, investment managers need to construct an index tracking portfolio that replicates a target index. Currently, most of the literature uses financial data with a homogeneous frequency when constructing the index tracking portfolio. To address this limitation, we propose a methodology based on mixed-frequency financial data, called the FACTOR-MIDAS-POET model. The proposed model can simultaneously utilize intraday return data, daily risk factor data, and monthly or quarterly macro-economic data. The out-of-sample analysis demonstrates that our model can improve tracking accuracy.


Introduction
The index tracking strategy, which aims at tracking the return of a given stock index when constructing the portfolio, is a major strategy adopted by fund managers. [1][2][3][4][5][6][7] study the index tracking strategy theoretically and experimentally under different practical constraints, e.g., a limit on the number of stocks in the portfolio.
The simplest and most widely used index tracking strategy is the global minimum variance strategy. Let $R_t$ be the vector of daily excess returns of stocks over the target index and $\omega_t$ be the global minimum variance portfolio weights. The difference between the return of the index tracking strategy and the return of the target index is $\omega_t^T R_t$ on day $t$. Mathematically, investors aim to minimize the tracking error, which is measured by the variance of this difference, i.e.,
$$\min_{\omega_t} \; \omega_t^T S_t \omega_t \quad \text{subject to} \quad \omega_t^T \mathbf{1} = 1,$$
where $S_t$ is the $N \times N$ conditional covariance matrix of $R_t$ and $\mathbf{1}$ is the $N$-vector of ones. The optimal index tracking strategy is
$$\omega_t = \frac{S_t^{-1} \mathbf{1}}{\mathbf{1}^T S_t^{-1} \mathbf{1}}.$$
The key part of the global minimum variance strategy is therefore to estimate the covariance matrix or the inverse covariance matrix.
In the literature, methods for estimating the covariance matrix or inverse covariance matrix mainly focus on financial data with a homogeneous frequency. [8][9][10] estimate the covariance matrix based on quarterly, monthly and daily returns, respectively. [5][11][12][13][14][15][16][17][18] aim to improve covariance matrix estimation using intraday data. In contrast, [19][20][21][22][23] focus on the estimation of the inverse covariance matrix. With improvements in high-speed computation and large-scale storage, financial data streams have become increasingly real-time and complex, as with high-frequency and ultra-high-frequency data. Besides historical data from financial markets, monthly or quarterly macro-economic factors are also valuable sources of information about stocks' volatilities (see [24][25][26][27][28][29]). These studies further reveal that macro-economic factors, such as GDP growth, the exchange rate and the short-term interest rate, are important explanatory variables of the slow-moving component of volatilities.
However, because macro-economic factors and historical market data are observed at different frequencies, constructing a unified econometric model is a great challenge. Among the proposed models, the mixed data sampling (MIDAS) method in [30] has attracted great attention and has led to several important extensions, such as the GARCH mixed-data sampling (GARCH-MIDAS) model (see [31]) and the factor-based mixed-data sampling (Factor-MIDAS) model (see [32]). Unlike the MIDAS method, other econometric models that handle mixed-frequency data, such as the high-frequency-based volatility (HEAVY) model (see [33]) and the Factor GARCH-Itô model (see [34]), mainly focus on integrating intraday and daily financial data. Compared with homogeneous-frequency models, a mixed-frequency model retains more of the information in the original data, so it can better capture market conditions and predict more accurately. It also allows the portfolio to be updated in a timely manner and helps fund managers achieve the targeted index tracking performance.
In this paper, we propose a general framework for mixed-frequency financial data, called the FACTOR-MIDAS-POET model, to estimate the covariance matrix. The proposed model combines monthly macro-economic factors, daily observable factors (the market return and the innovation of the VIX) and intraday returns to improve covariance matrix estimation. In the empirical analysis, we compare our model with existing models in the literature and find that the tracking accuracy of the minimum variance tracking strategy is greatly improved by using the proposed model. The reason is that, unlike the other models, our model incorporates macro-economic information and option market information. We also find that when the amount of historical data decreases, the performance of the index tracking portfolios deteriorates for all models, but our model is less affected. Moreover, when the estimation window is short (e.g., 3 months), integrating intraday return data into our model may yield better tracking performance than not using the intraday return data.
The remainder of the paper is organized as follows. The section "The estimations of the covariance and its inverse" introduces existing methods for estimating the covariance matrix and the inverse covariance matrix based on homogeneous-frequency data. The section "FACTOR-MIDAS-POET method and estimation" introduces the FACTOR-MIDAS-POET model and its estimation method. The sections "Data" and "Descriptive analysis" explain the data used in this paper. The section "Empirical study" presents the empirical study and compares the performance of different models. The section "Conclusions and discussion" concludes the paper.

The estimations of the covariance and its inverse
There are two main ways to obtain the minimum variance index tracking strategy. The first is to estimate the covariance matrix and then invert it; the second is to estimate the inverse covariance matrix directly. The rest of this section summarizes the important existing methods for estimating the covariance matrix and the inverse covariance matrix.
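The two routes can be illustrated with a minimal Python sketch. It assumes the standard closed-form global minimum variance solution, in which the weights are proportional to the inverse covariance matrix times a vector of ones and are normalized to sum to one. The function names and the toy covariance matrix are hypothetical and serve only as an illustration.

```python
import numpy as np


def gmv_weights_from_cov(cov: np.ndarray) -> np.ndarray:
    """Route 1: start from a covariance estimate S and solve S x = 1."""
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)  # x = S^{-1} 1 without forming the inverse
    return x / (ones @ x)           # normalize so the weights sum to one


def gmv_weights_from_precision(precision: np.ndarray) -> np.ndarray:
    """Route 2: start from a direct estimate of the inverse covariance."""
    x = precision @ np.ones(precision.shape[0])
    return x / x.sum()


# Toy 3-asset covariance matrix (hypothetical numbers, not from the paper).
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])

w_cov = gmv_weights_from_cov(cov)
w_prec = gmv_weights_from_precision(np.linalg.inv(cov))
print(w_cov, w_prec)
```

When the inverse is exact, both routes give the same weights. They differ in practice only because an estimator of the inverse covariance matrix is generally not the inverse of a covariance estimator.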

Estimators of covariance matrix
Based on daily financial data, we summarize three estimators: $S_{t,1}$ is the sample covariance matrix; $S_{t,2}$ is the weighted lead and lag covariance matrix proposed by [35], which is designed to eliminate the non-synchronous trading effect; and $S_{t,3}$ is the backward-looking rolling estimator proposed by [10]. In this paper, the number of leads and lags $L$ is set to 3 for daily returns following [36]. When intraday financial data are available, these estimators are built from $R_{i,t-k}$, the vector of excess returns at time $i$ on day $t-k$. The intraday version of $S_{t,1}$ is proposed by [12], and the intraday version of $S_{t,2}$ is proposed by [5]. If overnight returns are included in the estimators, $R_{0,t-k}$, the overnight return on day $t-k$, enters as well.
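As a rough illustration of these building blocks, the following Python sketch computes a sample covariance matrix, a lead and lag weighted covariance matrix, and a one-day covariance built from intraday returns with an optional overnight term. The exact formulas of [35], [12] and [5] are not reproduced here. The Bartlett-type weights `1 - l / (L + 1)`, the sum-of-outer-products form of the intraday estimator, and all function names are assumptions made for this sketch.

```python
import numpy as np


def sample_cov(R: np.ndarray) -> np.ndarray:
    """Sample covariance of a (T, N) array of daily excess returns."""
    X = R - R.mean(axis=0)
    return X.T @ X / (R.shape[0] - 1)


def lead_lag_cov(R: np.ndarray, L: int = 3) -> np.ndarray:
    """Covariance plus weighted lead and lag cross-covariances.

    ASSUMPTION: Bartlett-type weights are used; the weights in [35]
    may differ.
    """
    X = R - R.mean(axis=0)
    T = X.shape[0]
    S = X.T @ X / T
    for lag in range(1, L + 1):
        gamma = X[lag:].T @ X[:-lag] / T   # lag-`lag` cross-covariance
        weight = 1.0 - lag / (L + 1.0)     # assumed weighting scheme
        S += weight * (gamma + gamma.T)    # add both lead and lag terms
    return S


def intraday_cov(intraday: np.ndarray, overnight=None) -> np.ndarray:
    """One-day covariance from an (M, N) array of intraday excess returns.

    ASSUMPTION: sum of outer products of intraday returns, with the
    overnight return (length N) added as one extra observation if given.
    """
    rc = intraday.T @ intraday
    if overnight is not None:
        rc = rc + np.outer(overnight, overnight)
    return rc


rng = np.random.default_rng(0)
daily = rng.normal(scale=0.01, size=(252, 4))      # synthetic daily returns
five_min = rng.normal(scale=0.001, size=(78, 4))   # synthetic 5-min returns
print(sample_cov(daily).shape, lead_lag_cov(daily, L=3).shape,
      intraday_cov(five_min).shape)
```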

Estimators of inverse covariance matrix
The estimators of inverse covariance matrix are often built upon the estimators of covariance matrix. We summarize these estimators in this subsection.
In these estimators, $\mathrm{tr}(\cdot)$ is the trace of a matrix, $u = N \det(S_{t,1})^{1/N} \, \mathrm{tr}(S_{t,1})^{-1}$, $\det(\cdot)$ is the determinant of a matrix, $H$ is an orthogonal matrix and $L$ is a diagonal matrix such that $S_{t,1} = H L H^T$, and $C$ is a diagonal matrix with elements $c_i = T + N - 2i - 1$ for $i = 1, \ldots, N$. $S^{inv}_{t,4}$ is proposed by [19]. $S^{inv}_{t,5}$ is the shrinkage estimator proposed by [19]. $S^{inv}_{t,6}$ is proposed by [20]. $S^{inv}_{t,7}$ is proposed by [21]. $S^{inv}_{t,8}$ is proposed by [22]. $S^{inv}_{t,9}$ is proposed by [23].

FACTOR-MIDAS-POET method and estimation
The methods summarized in the previous section have two restrictions. First, these classical methods require a homogeneous sampling frequency, so they make little use of the information contained in mixed-frequency financial data. Second, they do not take monthly or quarterly macro-economic factors into consideration. To remedy these limitations, we propose a model involving multi-frequency financial data to better reflect the financial market. In our model, excess returns are driven by both observable and unobservable factors, and they are also influenced by the state of the macro economy. In the model, $r_{i,t,m}$ is the vector of excess returns of $N$ stocks at time $i$ on day $t$ of month $m$, $F_{obs,t,m}$ is the vector of daily observable factors on day $t$ of month $m$, $F_{unobs,t,m}$ is the vector of daily unobservable factors on day $t$ of month $m$, $\epsilon_{i,t,m}$ is the $N$-dimensional residual vector, $\tau_m$ is the long-run component associated with monthly observable macro-economic factors, $X_k$ is the vector of macro-economic factors in month $k$, and $B(k, \theta)$ is the widely used Beta weighted lag structure of the MIDAS model. Note that $\tau_m$ is known for any day $t$ in month $m$ and is updated at the beginning of the next month $m + 1$. Based on the principal orthogonal complement thresholding (POET) method in [37], the daily covariance matrix of excess returns, $\hat{\Sigma}_{t,m}$, is estimated from an eigen-decomposition in which $\hat{\lambda}_1, \ldots, \hat{\lambda}_n$ are the eigenvalues of the covariance matrix of $r_{i,t,m} - a^T \tau_m F_{obs,t,m}$ and $\hat{u}_j$ is the eigenvector associated with $\hat{\lambda}_j$. It is unrealistic to assume that the matrix $\hat{R}_J$ is sparse because of the existence of common factors.
But it is reasonable to assume that the matrix $\hat{R}_{N-J}$ is sparse. To guarantee sparsity, it is natural to set a threshold that shrinks the off-diagonal elements of $\hat{R}_{N-J}$, where $s(\cdot)$ is a generalized shrinkage function, $\tau$ is the threshold and $I(\cdot)$ is the indicator function. This yields the FACTOR-MIDAS-POET estimator of the covariance matrix, $\hat{\Sigma}_{t,m,FMP}$. Based on [38], we then apply linear shrinkage to obtain the shrinkage inverse covariance matrix estimator, where $c_1$, $c_2$ and $c_3$ are combination coefficients and $\hat{\Sigma}_{t,m,FMP}$ is the target matrix. The optimized solution and its expression can be found in [38].
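Two ingredients of this section can be sketched in Python: normalized Beta lag weights of the kind used in MIDAS models, and the POET-style step that keeps the leading $J$ principal components and thresholds the off-diagonal elements of the remainder. This is a sketch under stated assumptions, not the estimation procedure of the paper. The parameterization of the Beta weights, the choice of soft thresholding as the shrinkage function, and the names `beta_weights` and `poet_covariance` are all assumptions. The observable-factor part of the model and the linear shrinkage of [38] are not covered.

```python
import numpy as np


def beta_weights(K: int, theta1: float, theta2: float) -> np.ndarray:
    """Normalized Beta lag weights for lags k = 1, ..., K.

    ASSUMPTION: weight_k is proportional to
    (k/K)^(theta1 - 1) * (1 - k/K)^(theta2 - 1).
    """
    x = np.arange(1, K + 1) / K
    raw = x ** (theta1 - 1.0) * (1.0 - x) ** (theta2 - 1.0)
    return raw / raw.sum()


def poet_covariance(S: np.ndarray, J: int, tau: float) -> np.ndarray:
    """Keep the top-J eigen-components of S and soft-threshold the rest."""
    eigval, eigvec = np.linalg.eigh(S)          # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1]            # reorder to descending
    eigval, eigvec = eigval[order], eigvec[:, order]
    U = eigvec[:, :J]
    low_rank = (U * eigval[:J]) @ U.T           # common-factor part (not sparse)
    resid = S - low_rank                        # principal orthogonal complement
    # Soft thresholding is one possible shrinkage function (assumption).
    shrunk = np.sign(resid) * np.maximum(np.abs(resid) - tau, 0.0)
    np.fill_diagonal(shrunk, np.diag(resid))    # diagonal is left unshrunk
    return low_rank + shrunk


rng = np.random.default_rng(0)
returns = rng.normal(size=(250, 6))             # synthetic data
S = np.cov(returns, rowvar=False)
print(beta_weights(12, 1.0, 4.0).round(3))
print(poet_covariance(S, J=2, tau=0.05).round(3))
```

With `theta1 = 1` the weights decay monotonically in the lag, so recent months receive the most weight. With `tau = 0` the function returns `S` unchanged, which is a convenient sanity check.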

Data
In the empirical analysis, we use the stocks listed in the Dow Jones Industrial Average (DJIA) to track the S&P 500 index. The stock tickers and full company names of the 30 stocks listed in the DJIA are available in Table 1.
3. Inflation (Inflation), which is measured by the Consumer Price Index (CPI) and obtained from CRSP database.
4. Slope of the yield curve (Slope), which is measured by the spread between 10-year treasury rate and 3-month treasury rate and obtained from CRSP database.
5. Default rate (Default), which is measured by the difference between Moody's Baa and Aaa corporate bond yields of the same maturity and obtained from Federal Reserve Board data files in WRDS database.
6. Consumer confidence (CC), which is measured by the Michigan Consumer Sentiment Index and obtained from Trading Economics.
Following the method in [28], we apply principal component analysis (PCA) to these eight variables and select the first two principal components as the macro-economic factors used in our model.
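The PCA step can be sketched as follows in Python. The sketch assumes that the eight monthly series are standardized before the decomposition, which the text does not state, and it uses synthetic data in place of the actual macro-economic series. The function name `first_principal_components` is hypothetical.

```python
import numpy as np


def first_principal_components(X: np.ndarray, n_components: int = 2):
    """Return PC scores, loadings and explained-variance ratios.

    X is a (months, variables) array. ASSUMPTION: each column is
    standardized to zero mean and unit variance before the PCA.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized data; rows of Vt are the loading vectors.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:n_components]
    scores = Z @ loadings.T                      # monthly factor values
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, loadings, explained


rng = np.random.default_rng(0)
macro = rng.normal(size=(120, 8))                # synthetic: 120 months, 8 variables
scores, loadings, explained = first_principal_components(macro)
print(scores.shape, explained)
```

The two columns of `scores` play the role of the monthly macro-economic factors that enter the long-run component of the model.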

Descriptive analysis
Tables 2 and 3 present the descriptive statistics for daily returns and 5-min returns of 29 DJIA stocks and the S&P 500 index, respectively.
Although the means of the daily returns and 5-min returns of the S&P 500 index are both approximately zero, their standard deviations are significantly different. This indicates that volatility changes dramatically as the sampling frequency increases. Moreover, the skewness switches from -0.1909 for daily returns to 0.0440 for 5-min returns, which are more right-skewed. The kurtosis increases from 7.4016 for daily returns to 22.5620 for 5-min returns. The standard deviations of individual stocks obtained from daily returns are significantly larger than those obtained from 5-min returns. Among the 29 individual stocks, 13 stocks have left-skewed distributions for daily returns and 18 stocks have left-skewed distributions for 5-min returns. Furthermore, all 29 stocks have larger kurtosis for 5-min returns. The distributions of individual stock returns are asymmetric, with long tails and sharp peaks. Table 4 further displays the correlation coefficients of the eight macro-economic factors. It reveals strong correlations between the short-term interest rate and the slope of the yield curve, between the exchange rate and the default rate, and between the growth rate of the Industrial Production Index and the unemployment rate. Introducing all macro-economic variables into the covariance estimation model at once would lead to a lack of statistical significance due to multicollinearity. Thus, we use principal component analysis (PCA) to create new uncorrelated variables and estimate the inverse covariance matrix based on the principal components in the empirical study. Table 5 shows the principal component matrix obtained from PCA. The first and second principal components, which we use, explain 96.4945% of the total variance.

Empirical study
In our model, excess returns, which can be daily returns or 5-min returns, are driven by two daily observable factors: (i) the value-weighted stock market index; (ii) the innovation of the VIX index. The VIX index is constructed from the implied volatilities of S&P 500 index options. It indicates investors' expectation of the volatility of the S&P 500 index over the next 30 days. We also include two monthly observable principal components of the eight macro-economic factors.
We fit the proposed model, estimate the inverse covariance matrix, $S^{inv}_{t,m,FMP}$, and compute the minimum variance index tracking strategy with a rolling window scheme. We then apply the derived strategy on the next day.
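The rolling window scheme can be sketched in Python as follows. For simplicity the sketch uses the plain sample covariance matrix as a stand-in for $S^{inv}_{t,m,FMP}$, so it illustrates the backtest loop only and not the proposed estimator. The tracking error is taken to be the variance of the daily return differences, and the turnover is taken to be the average absolute change in weights between consecutive days. Both definitions, and the function name `rolling_gmv_backtest`, are assumptions of this sketch.

```python
import numpy as np


def rolling_gmv_backtest(R: np.ndarray, window: int):
    """Out-of-sample minimum variance tracking with a rolling window.

    R is a (T, N) array of daily excess returns over the target index.
    The window must be longer than N so that the sample covariance
    matrix is invertible.
    """
    T, N = R.shape
    ones = np.ones(N)
    diffs, weights = [], []
    for t in range(window, T):
        # Estimate on days t-window, ..., t-1 only (no look-ahead).
        S = np.cov(R[t - window:t], rowvar=False)
        x = np.linalg.solve(S, ones)
        w = x / x.sum()
        diffs.append(w @ R[t])       # return difference realized on day t
        weights.append(w)
    diffs = np.array(diffs)
    weights = np.array(weights)
    tracking_error = diffs.var(ddof=1)                               # assumed definition
    turnover = np.abs(np.diff(weights, axis=0)).sum(axis=1).mean()   # assumed definition
    return tracking_error, turnover, weights


rng = np.random.default_rng(0)
excess = rng.normal(scale=0.01, size=(400, 5))   # synthetic excess returns
te, to, W = rolling_gmv_backtest(excess, window=252)
print(te, to, W.shape)
```

Replacing the `np.cov` line with another covariance or inverse covariance estimator gives the corresponding comparison under the same loop.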

Comparison of current different estimations
Here, we choose a rolling window of one year (252 trading days). Table 6 compares the out-of-sample performance of minimum variance index tracking strategies derived from different covariance matrix or inverse covariance matrix estimators based on daily returns, intraday returns with overnight returns, and intraday returns without overnight returns, respectively.
Several conclusions can be drawn from Table 6. First, the tracking error achieved by $S_{t,1}$ with daily returns is 0.043973, which is smaller than the tracking errors obtained with intraday returns. Second, the shrinkage inverse covariance matrix estimator, $S^{inv}_{t,5}$, has the best out-of-sample performance (i.e., the smallest tracking error). Third, when intraday returns are included, adding overnight returns often yields better tracking performance, which implies that overnight returns are useful. Fourth, most inverse covariance matrix estimators do not perform significantly better than the covariance matrix estimators. Furthermore, Table 6 provides the turnover rates of the different strategies. The turnover rates of strategies based on intraday returns are smaller, which indicates the value of intraday returns. The tracking strategy with the estimator $S^{inv}_{t,7}$ has the lowest turnover rate. Surprisingly, for these estimators, using intraday returns may worsen the tracking performance but lower the turnover rate.

Performance of FACTOR-MIDAS-POET method
We consider two models. The first one is based on daily returns of stocks, two daily observable factors, and two monthly macro-economic principal components. The second one is based on 5-min returns of stocks, two daily observable factors, and two monthly macro-economic principal components. Similar to [30], we estimate the models with slow weights ($\theta_1 = 1$ and $\theta_2 = 4$). We also try other forms of weights and obtain similar conclusions.
Table 7 presents the out-of-sample tracking performance of strategies based on the FACTOR-MIDAS-POET model. The lag period column reports how many monthly macro-economic principal components are included in the models. Panel A presents the results based on daily returns, while Panel B and Panel C present the results based on 5-min returns.
We report one-tailed t tests of the tracking errors of different portfolios based on Newey-West standard deviations with six lags in Tables 6 and 7. Following [39], we compare the average squared excess returns over the target index of the various estimators with those of the FACTOR-MIDAS-POET method ($K = 3$). To test whether the average of $S^2_{1,\text{F-M-P}}, S^2_{2,\text{F-M-P}}, \ldots, S^2_{N,\text{F-M-P}}$ is significantly smaller than the average of $S^2_{1,\text{others}}, S^2_{2,\text{others}}, \ldots, S^2_{N,\text{others}}$, we test whether the mean of the sequence $\log(S^2_{1,\text{F-M-P}} / S^2_{1,\text{others}}), \log(S^2_{2,\text{F-M-P}} / S^2_{2,\text{others}}), \ldots, \log(S^2_{N,\text{F-M-P}} / S^2_{N,\text{others}})$ is significantly smaller than zero using a one-tailed test.
Some conclusions can be drawn from Table 7. First, the tracking performance and turnover ratio based on daily returns are both better than those based on 5-min returns. Second, the tracking performance of our model depends on how many monthly macro-economic principal components are included. Third, except for $S_{t,3}$, $S^{inv}_{t,5}$ and $S^{inv}_{t,7}$, the tracking performance and turnover ratio of the proposed model are both better than those of the other covariance estimation models based on daily returns. Meanwhile, the one-tailed t values for $S_{t,3}$, $S^{inv}_{t,5}$ and $S^{inv}_{t,7}$ are not large. Similarly, compared with most models reported in Table 6, the proposed FACTOR-MIDAS-POET model has better out-of-sample tracking performance using intraday returns, because our method utilizes financial data from different sources and with different frequencies. However, the turnover rate of our model is high, which indicates that the investment strategies constructed by our method are more active. A high turnover rate is not always a negative indicator: for investors whose goal is to minimize the index tracking error, the index investment strategy constructed by our model can reach the targeted performance.
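The log-ratio test described in this subsection can be sketched in Python as follows. The sketch assumes strictly positive squared excess returns, Bartlett weights in the Newey-West long-run variance, and a normal approximation for the one-tailed p-value. The function names and the synthetic data are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def newey_west_t_stat(d, lags: int = 6) -> float:
    """t statistic for the mean of d with a Newey-West standard error."""
    d = np.asarray(d, dtype=float)
    n = d.size
    e = d - d.mean()
    long_run_var = e @ e / n
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1.0)          # Bartlett kernel weight
        long_run_var += 2.0 * weight * (e[lag:] @ e[:-lag]) / n
    return d.mean() / np.sqrt(long_run_var / n)


def one_tailed_p_value(t_stat: float) -> float:
    """P-value for H1: mean < 0, using a normal approximation (assumption)."""
    return NormalDist().cdf(t_stat)


# Synthetic squared excess returns: the first series is smaller on average.
rng = np.random.default_rng(0)
sq_others = rng.chisquare(1, size=500) + 1e-8
sq_fmp = 0.5 * sq_others * np.exp(rng.normal(scale=0.1, size=500))

log_ratio = np.log(sq_fmp / sq_others)
t_stat = newey_west_t_stat(log_ratio, lags=6)
print(t_stat, one_tailed_p_value(t_stat))
```

A strongly negative statistic indicates that the first series of squared excess returns is smaller on average than the second one.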

Robustness analysis
In this robustness analysis, we change the rolling window to three months (3 m), six months (6 m) and nine months (9 m) in order to check whether the length of the rolling window affects our main results. Tables 8-10 summarize the out-of-sample performance of strategies based on different models under the three window lengths. When the length of the estimation window decreases, the tracking errors and turnover rates of all models increase, because less information is used. Meanwhile, as the length of the estimation window decreases, the 5-min return data become more and more valuable. Therefore, intraday information can help us estimate the covariance matrix or inverse covariance matrix and construct investment strategies when other information is lacking. Most importantly, the proposed FACTOR-MIDAS-POET model generally has better tracking performance than the other models, especially when the rolling window is three months. Thus, when the estimation window is relatively short, introducing macro-economic information and option market information can greatly improve the index tracking strategy.

Conclusions and discussion
In this paper, we propose the FACTOR-MIDAS-POET model, which simultaneously integrates intraday return data, daily risk factor data, and monthly or quarterly macro-economic data. In the empirical analysis, we show that the FACTOR-MIDAS-POET model with macro-economic factors has better out-of-sample tracking performance than most models currently used in the literature. The proposed model can fully utilize financial data from different sources and with different frequencies. However, the better tracking performance is often accompanied by higher turnover rates. Meanwhile, we find that the advantage of our model grows as the length of the estimation window decreases. Our work is a preliminary study of index tracking with mixed-frequency data, and many aspects remain uncovered. For instance, we do not consider the noise of intraday data in the model. Also, the transaction cost does not appear explicitly in the model when studying the index tracking problem. These problems are worth exploring in future studies.
Supporting information
S1 Dataset. (XLSX)