Abstract
To obtain the market average return, investment managers need to construct an index tracking portfolio that replicates the target index. Currently, most of the literature uses financial data of a single, homogeneous frequency when constructing index tracking portfolios. To address this limitation, we propose a methodology based on mixed-frequency financial data, called the FACTOR-MIDAS-POET model. The proposed model can simultaneously utilize intraday return data, daily risk factor data and monthly or quarterly macro-economic data. The out-of-sample analysis demonstrates that our model can improve tracking accuracy.
Citation: Cui X, Zhang X (2021) Index tracking strategy based on mixed-frequency financial data. PLoS ONE 16(4): e0249665. https://doi.org/10.1371/journal.pone.0249665
Editor: Stefan Cristian Gherghina, The Bucharest University of Economic Studies, ROMANIA
Received: September 27, 2020; Accepted: March 23, 2021; Published: April 6, 2021
Copyright: © 2021 Cui, Zhang. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting information files.
Funding: The first author (Xiangyu Cui) is supported in part by National Natural Science Foundation of China under grant 71671106. The second author (Xuan Zhang) is supported in part by Innovation Project Research Fund of SHUFE (No. CXJJ-2020-411). There was no additional external funding received for this study.
Competing interests: The authors have declared that no competing interests exist.
Introduction
The index tracking strategy, which aims to track the return of a given stock index, is a major strategy adopted by fund managers. [1–7] study index tracking theoretically and empirically under realistic constraints, e.g., a limit on the number of stocks in the portfolio.
The simplest and most widely used index tracking strategy is the global minimum variance strategy. Let Rt be the N-dimensional vector of daily excess returns of the stocks over the target index, and let ωt be the global minimum variance portfolio weights, which sum to one. The difference between the return of the index tracking strategy and the return of the target index on day t is then ωt⊤Rt. The investor aims to minimize the tracking error, measured by the variance of this difference, i.e.,
$$\min_{\omega_t}\ \omega_t^{\top}\Sigma_t\,\omega_t \quad \text{subject to} \quad \omega_t^{\top}\mathbf{1} = 1,$$
where Σt is the N × N conditional covariance matrix of Rt and 1 is the N-dimensional vector of ones. The optimal index tracking strategy is
$$\omega_t^{*} = \frac{\Sigma_t^{-1}\mathbf{1}}{\mathbf{1}^{\top}\Sigma_t^{-1}\mathbf{1}}.$$
The key step of the global minimum variance strategy is therefore to estimate the covariance matrix or the inverse covariance matrix.
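The following minimal NumPy sketch computes the minimum variance tracking weights from a covariance estimate. It also computes the same weights directly from an inverse covariance estimate. The 3-stock covariance matrix is made up for illustration and is not taken from the paper. Because the problem only imposes the budget constraint, the weights may in general be negative (short positions).

```python
import numpy as np


def gmv_tracking_weights(sigma: np.ndarray) -> np.ndarray:
    """Weights w = Sigma^{-1} 1 / (1' Sigma^{-1} 1) from an N x N covariance matrix."""
    ones = np.ones(sigma.shape[0])
    # Solving Sigma x = 1 is numerically preferable to forming Sigma^{-1} explicitly.
    x = np.linalg.solve(sigma, ones)
    return x / x.sum()


def gmv_from_precision(precision: np.ndarray) -> np.ndarray:
    """Same weights when an estimate of the inverse covariance matrix is available directly."""
    x = precision @ np.ones(precision.shape[0])
    return x / x.sum()


# Illustrative covariance of daily excess returns over the index (hypothetical numbers).
sigma = np.array([[0.040, 0.006, 0.004],
                  [0.006, 0.025, 0.005],
                  [0.004, 0.005, 0.030]])
w = gmv_tracking_weights(sigma)
# Minimized tracking-error variance w' Sigma w, which equals 1 / (1' Sigma^{-1} 1).
tracking_variance = float(w @ sigma @ w)
print(w, tracking_variance)
```

Either function can be fed by the estimators reviewed in the next section. Covariance estimators go through the first function, and inverse covariance estimators through the second.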
In the literature, methods for estimating the covariance matrix or the inverse covariance matrix mainly focus on financial data with homogeneous frequency. [8–10] estimate the covariance matrix based on quarterly, monthly and daily returns, respectively. [5, 11–18] aim to improve covariance matrix estimation using intraday data. In contrast, [19–23] focus on the estimation of the inverse covariance matrix. With advances in high-speed computation and large-scale storage, financial data streams have become increasingly real-time and complex, e.g., high-frequency and ultra-high-frequency data. Besides historical market data, monthly or quarterly macro-economic factors are also valuable sources of information about stock volatilities [24–29]. These studies further reveal that macro-economic factors, such as GDP growth, the exchange rate and the short-term interest rate, are important explanatory variables of the slow-moving component of volatility.
However, macro-economic factors and historical market data are sampled at different frequencies, which makes constructing a unified econometric model a great challenge. Among the proposed models, the mixed data sampling (MIDAS) method of [30] has attracted considerable attention. It has also led to several important extensions, such as the GARCH mixed-data sampling (GARCH-MIDAS) model (see [31]) and the factor-based mixed-data sampling (Factor-MIDAS) model (see [32]). Other econometric models that handle mixed-frequency data mainly focus on integrating intraday and daily financial data, e.g., the high-frequency-based volatility (HEAVY) model (see [33]) and the Factor GARCH-Itô model (see [34]). Mixed-frequency models retain more of the information in the original data than homogeneous-frequency models, which can help them capture market dynamics better and forecast more accurately. They allow portfolios to be updated in a timely manner and help fund managers achieve the targeted index tracking performance.
In this paper, we propose a general framework for mixed-frequency financial data, called the FACTOR-MIDAS-POET model, to estimate the covariance matrix. The proposed model combines monthly macro-economic factors, daily observable factors (the market return and the innovation of the VIX) and intraday returns to improve covariance matrix estimation. In the empirical analysis, we compare our model with existing models in the literature. We find that it substantially improves the tracking accuracy of the minimum variance tracking strategy. We attribute this improvement to the macro-economic and option market information that our model incorporates and the other models do not. We also find that when the amount of historical data decreases, the performance of index tracking portfolios based on all models deteriorates, but our model is less affected. Moreover, when the estimation window is short (e.g., 3 months), integrating intraday returns into our model may yield better tracking performance than omitting them.
The rest of the paper is organized as follows:
- The section on the estimations of the covariance and its inverse reviews existing methods for estimating the covariance matrix and the inverse covariance matrix from homogeneous-frequency data.
- The section on the FACTOR-MIDAS-POET method introduces the proposed model and its estimation.
- The section on data and descriptive analysis describes the data used in this paper.
- The empirical study section compares the performance of the different models.
- The final section concludes.
The estimations of the covariance and its inverse
There are two main ways to obtain the minimum variance index tracking strategy. The first is to estimate the covariance matrix and then invert it. The second is to estimate the inverse covariance matrix directly. The rest of this section summarizes the important existing estimators of both kinds.
Estimators of covariance matrix
Based on daily financial data, we summarize three estimators as follows:
(1)
(2)
(3)
St,1 is the sample covariance matrix. St,2 is the weighted lead and lag covariance matrix proposed by [35], which is designed to eliminate the non-synchronous trading effect. Following [36], we set the number of lags to L = 3 for daily returns. St,3 is the backward-looking rolling estimator proposed by [10].
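The sketch below shows textbook forms of the first two daily estimators. The normalization is our own assumption:
- St,1 is taken as a demeaned sample covariance.
- St,2 is taken as a Newey–West style lead-lag estimator with Bartlett weights 1 − l/(L + 1).

The exact scaling used in the paper may differ, and the rolling estimator St,3 of [10] is not sketched.

```python
import numpy as np


def sample_cov(returns: np.ndarray) -> np.ndarray:
    """St,1-style sample covariance of a T x N window of daily excess returns (divisor T - 1)."""
    return np.cov(returns, rowvar=False)


def lead_lag_cov(returns: np.ndarray, n_lags: int = 3) -> np.ndarray:
    """Newey-West style lead-lag covariance (assumed form of St,2) with Bartlett weights.

    Adding weighted lag-l cross-covariances and their transposes (the 'leads')
    captures co-movement that non-synchronous trading spreads over several days.
    """
    T = returns.shape[0]
    x = returns - returns.mean(axis=0)
    s = x.T @ x / T                                   # lag-0 term (divisor T)
    for lag in range(1, n_lags + 1):
        gamma = x[lag:].T @ x[:-lag] / T              # sum_t R_t R_{t-lag}' / T
        weight = 1.0 - lag / (n_lags + 1.0)           # Bartlett kernel weight
        s += weight * (gamma + gamma.T)
    return s


rng = np.random.default_rng(0)
window = rng.normal(scale=0.01, size=(252, 5))        # one year of simulated daily returns
s1 = sample_cov(window)
s2 = lead_lag_cov(window, n_lags=3)                    # L = 3 as in the text
```

Using the divisor T and Bartlett weights keeps the lead-lag estimate positive semi-definite, so it can be inverted safely for the tracking weights.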
When we have intraday financial data, these estimators are modified as follows,
(4)
(5)
(6)
where Ri,t−k is the vector of excess returns at intraday time i on day t − k. The intraday counterpart of St,1 is proposed by [12], and that of St,2 by [5]. If overnight returns are included in the estimators, we have
(7)
(8)
where R0,t−k is the overnight return in day t − k.
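A common building block of these intraday estimators is the daily realized covariance. It is the sum of outer products of the 72 intraday 5-min return vectors, optionally plus the outer product of the overnight return. The minimal sketch below assumes that a window-level estimate averages the daily realized covariances with equal weights. The helper names are hypothetical, and equations (4)–(8) may weight days differently.

```python
from typing import Optional, Sequence

import numpy as np


def realized_cov(intraday: np.ndarray, overnight: Optional[np.ndarray] = None) -> np.ndarray:
    """Daily realized covariance from an M x N array of intraday excess returns."""
    rc = intraday.T @ intraday                       # sum_i R_i R_i'
    if overnight is not None:
        rc = rc + np.outer(overnight, overnight)     # add R_0 R_0' for the overnight return
    return rc


def window_realized_cov(days: Sequence[np.ndarray],
                        overnights: Optional[Sequence[np.ndarray]] = None) -> np.ndarray:
    """Equally weighted average of daily realized covariances over an estimation window."""
    if overnights is None:
        mats = [realized_cov(d) for d in days]
    else:
        mats = [realized_cov(d, o) for d, o in zip(days, overnights)]
    return np.mean(mats, axis=0)


rng = np.random.default_rng(1)
days = [rng.normal(scale=0.001, size=(72, 4)) for _ in range(20)]   # 72 five-minute returns per day
overnights = [rng.normal(scale=0.005, size=4) for _ in range(20)]
rc_without = window_realized_cov(days)
rc_with = window_realized_cov(days, overnights)
```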
Estimators of inverse covariance matrix
The estimators of inverse covariance matrix are often built upon the estimators of covariance matrix. We summarize these estimators in this subsection.
(9)
(10)
(11)
(12)
(13)
(14)
where:
- tr(⋅) is the trace of a matrix and det(⋅) is the determinant of a matrix;
- H is an orthogonal matrix and L is a diagonal matrix such that St,1 = HLH⊤ (this diagonal matrix L is unrelated to the lag length L used for St,2);
- C is a diagonal matrix with elements ci = T + N − 2i − 1 for i = 1, …, N.

Estimator (9) is proposed by [19], and (10) is the shrinkage estimator proposed by [19]. Estimators (11), (12), (13) and (14) are proposed by [20], [21], [22] and [23], respectively.
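The sketch below implements one concrete inverse covariance estimator, the well-known Efron–Morris form associated with [19]. It assumes that A is a Wishart-distributed N × N scatter matrix with n degrees of freedom (e.g., A = X⊤X for n zero-mean observations) and that n > N + 1. Under that assumption, (n − N − 1)A⁻¹ is unbiased for Σ⁻¹, and the shrinkage version adds (N² + N − 2)/tr(A) times the identity. The normalization of equations (9) and (10) may differ from this sketch.

```python
import numpy as np


def efron_morris_precision(scatter: np.ndarray, dof: int):
    """Unbiased and Efron-Morris shrinkage estimates of Sigma^{-1}.

    scatter: N x N Wishart(Sigma, dof) matrix, e.g. X' X for zero-mean rows of X.
    Requires dof > N + 1.
    """
    n_assets = scatter.shape[0]
    if dof <= n_assets + 1:
        raise ValueError("need dof > N + 1")
    unbiased = (dof - n_assets - 1) * np.linalg.inv(scatter)
    # Shrink toward a multiple of the identity; the coefficient uses tr(A).
    shrink = (n_assets ** 2 + n_assets - 2) / np.trace(scatter)
    return unbiased, unbiased + shrink * np.eye(n_assets)


rng = np.random.default_rng(2)
x = rng.normal(size=(60, 10))                 # 60 zero-mean observations of 10 assets
unbiased, em = efron_morris_precision(x.T @ x, dof=60)
```

The identity term lifts every eigenvalue of the unbiased estimate by the same positive amount. This stabilizes the smallest eigenvalues, which are the ones most distorted when N is close to n.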
FACTOR-MIDAS-POET method and estimation
The methods summarized in the previous section have two limitations. First, they require a homogeneous sampling frequency, so they make little use of the information contained in mixed-frequency financial data. Second, they do not take monthly or quarterly macro-economic factors into account. To remedy these limitations, we propose a model that uses multi-frequency financial data to better reflect the financial market.
In our model, excess returns are driven by both observable and unobservable factors, and they are also influenced by the state of the macro economy. Specifically, the model is as follows:
(15)
(16)
(17)
where:
- ri,t,m is the vector of excess returns of N stocks at intraday time i on day t of month m;
- Fobs,t,m is the vector of daily observable factors on day t of month m;
- Funobs,t,m is the vector of daily unobservable factors on day t of month m;
- ϵi,t,m is the N-dimensional residual vector;
- τm is the long-run component associated with the monthly observable macro-economic factors;
- Xk is the vector of macro-economic factors in month k;
- B(k, θ) is the Beta lag weighting scheme widely used in MIDAS models.

Note that τm is known on every day t of month m and is updated at the beginning of the next month, m + 1.
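The sketch below makes the Beta weighted lag structure concrete. It uses one common two-parameter parameterization from the MIDAS literature, φk ∝ (k/K)^(θ1−1) (1 − k/K)^(θ2−1) for k = 1, …, K, normalized to sum to one, and applies it to the lagged macro-economic factors. This exact form and the function names are our assumptions; equations (15)–(17) may scale or shift the lag grid differently. Take θ1 = 1 and θ2 = 4 (the values used in the empirical study) with K = 3. The weights are then 8/9, 1/9 and 0, so under this parameterization the K-th lag receives zero weight whenever θ2 > 1.

```python
import numpy as np


def beta_lag_weights(n_lags: int, theta1: float = 1.0, theta2: float = 4.0) -> np.ndarray:
    """Normalized Beta lag weights for lags k = 1, ..., K (assumed parameterization, K >= 2)."""
    x = np.arange(1, n_lags + 1) / n_lags
    raw = x ** (theta1 - 1.0) * (1.0 - x) ** (theta2 - 1.0)
    return raw / raw.sum()


def weighted_macro_lags(macro_history: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Sum_k w_k X_{m-k}; rows of macro_history are months, the last row is month m - 1."""
    n_lags = weights.size
    lagged = macro_history[::-1][:n_lags]        # rows X_{m-1}, X_{m-2}, ..., X_{m-K}
    return weights @ lagged


w = beta_lag_weights(3, theta1=1.0, theta2=4.0)
# Two macro principal components observed over five past months (illustrative numbers).
history = np.array([[0.1, -0.2], [0.3, 0.0], [0.2, 0.1], [-0.1, 0.4], [0.5, -0.3]])
aggregate = weighted_macro_lags(history, w)
```

Because the aggregate only uses months before m, it is fixed throughout month m. This matches the remark that τm is known on every day of the month.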
Based on the principal orthogonal complement thresholding (POET) method of [37], the daily covariance matrix of excess returns is estimated as follows,
(18)
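where λ̂1 ≥ λ̂2 ≥ ⋯ ≥ λ̂N are the eigenvalues of the covariance matrix of ri,t,m − a⊤ τm Fobs,t,m, and ξ̂k is the eigenvector associated with λ̂k. Because of the common factors, it is unrealistic to assume that this covariance matrix itself is sparse. It is reasonable, however, to assume that its principal orthogonal complement R̂ is sparse. R̂ is the part that remains after removing the contribution of the leading principal components associated with the unobservable factors. To impose sparsity, it is natural to threshold the off-diagonal elements r̂jl of R̂, giving the matrix R̂τ with elements
$$\hat r^{\tau}_{jl} = \begin{cases} \hat r_{jj}, & j = l,\\ s(\hat r_{jl})\, I\left(|\hat r_{jl}| \ge \tau\right), & j \ne l, \end{cases} \tag{19}$$
where s(⋅) is a generalized shrinkage function, τ is the threshold and I(⋅) is the indicator function. Then we obtain the FACTOR-MIDAS-POET estimator of the covariance matrix as follows,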
(20)
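A minimal NumPy sketch of the POET step follows. It assumes that:
- the input is a T × N matrix of returns from which the observable-factor component has already been removed;
- the number of principal components (unobservable factors) is given;
- s(⋅) is soft thresholding with a single threshold τ.

[37] also allows adaptive, entry-specific thresholds, which this sketch does not implement. The function name is hypothetical.

```python
import numpy as np


def poet_cov(resid: np.ndarray, n_factors: int, threshold: float) -> np.ndarray:
    """POET-style covariance: leading principal components plus a thresholded remainder."""
    s = np.cov(resid, rowvar=False)
    eigval, eigvec = np.linalg.eigh(s)              # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1]
    lam = eigval[order][:n_factors]
    xi = eigvec[:, order][:, :n_factors]
    low_rank = (xi * lam) @ xi.T                    # sum_k lam_k xi_k xi_k'
    remainder = s - low_rank                        # principal orthogonal complement
    # Soft thresholding s(z) = sign(z) * max(|z| - tau, 0) is zero whenever |z| < tau.
    thresholded = np.sign(remainder) * np.maximum(np.abs(remainder) - threshold, 0.0)
    np.fill_diagonal(thresholded, np.diag(remainder))   # diagonal entries are not shrunk
    return low_rank + thresholded


rng = np.random.default_rng(3)
factors = rng.normal(size=(252, 2))
loadings = rng.normal(size=(2, 8))
resid = factors @ loadings + 0.5 * rng.normal(size=(252, 8))
sigma_poet = poet_cov(resid, n_factors=2, threshold=0.05)
```

With τ = 0 the sketch returns the sample covariance unchanged. Larger thresholds push more off-diagonal entries of the remainder to zero while leaving all variances intact.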
Following [38], we apply linear shrinkage to obtain the shrinkage inverse covariance matrix estimator as follows:
(21)
where c1, c2 and c3 are combination coefficients,
is the target matrix. The optimal coefficients and their expressions are given in [38].
Data and descriptive analysis
Data
In the empirical analysis, we use the stocks in the Dow Jones Industrial Average (DJIA) to track the S&P 500 index. The tickers and full company names of the 30 DJIA stocks are listed in Table 1. Because the TAQ data files for TRV contain too many missing values, we remove this company and use the remaining 29 DJIA stocks to track the S&P 500 index. The sample period runs from Jan. 1, 2006 to Dec. 31, 2011. The daily and 5-minute (5-min) data of the S&P 500 index are obtained from Tick Data. The 5-min data of the 30 DJIA stocks are collected from the NYSE Trade and Quotations (TAQ) database. Data in the first 30 minutes after the open are more likely to contain biases and reporting errors, so we discard them. Thus, each trading day has 72 intraday 5-min returns and one overnight return.
The daily value-weighted market return and the daily VIX serve as daily observable factors. They are obtained from the CRSP database and the Chicago Board Options Exchange (CBOE), respectively.
The monthly macro-economic factors include eight important variables:
- Short-term interest rate (Interest), which is measured by the 3-month US Treasury bill rate and obtained from the Federal Reserve Board’s H.15.
- Exchange rate (Exch.), which is measured by the major currencies index collected from the Federal Reserve Banks and obtained from the Federal Reserve Board’s H.10.
- Inflation (Inflation), which is measured by the Consumer Price Index (CPI) and obtained from CRSP database.
- Slope of the yield curve (Slope), which is measured by the spread between 10-year treasury rate and 3-month treasury rate and obtained from CRSP database.
- Default rate (Default), which is measured by the difference between Moody’s Baa and Aaa corporate bond yields of the same maturity and obtained from Federal Reserve Board data files in WRDS database.
- Consumer confidence (CC), which is measured by the Michigan Consumer Sentiment Index and obtained from Trading Economics.
- Growth rate in the Industrial Production Index (IPI), which is obtained from Federal Reserve Board data files in WRDS database.
- Unemployment rate (Unempl.), which is obtained from Trading Economics.
Following the method in [28], we conduct principal component analysis (PCA) on these eight variables and select the first two principal components as the macro-economic factors used in our model.
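The PCA step can be sketched as follows. The sketch assumes, although the text does not state it, that each of the eight monthly series is standardized first, so that PCA is applied to their correlation matrix. The function returns the first two component scores, their loadings and the share of total variance they explain. The data below are simulated placeholders, not the paper's series.

```python
import numpy as np


def macro_principal_components(macro: np.ndarray, n_components: int = 2):
    """PCA of a months x variables panel; returns scores, loadings and explained share."""
    z = (macro - macro.mean(axis=0)) / macro.std(axis=0, ddof=1)   # standardize each series
    corr = np.cov(z, rowvar=False)                                   # correlation matrix
    eigval, eigvec = np.linalg.eigh(corr)
    order = np.argsort(eigval)[::-1]                                 # largest variance first
    eigval, eigvec = eigval[order], eigvec[:, order]
    loadings = eigvec[:, :n_components]
    scores = z @ loadings
    explained = eigval[:n_components].sum() / eigval.sum()
    return scores, loadings, explained


rng = np.random.default_rng(4)
common = rng.normal(size=(72, 2))                       # 72 months, two latent drivers
macro = common @ rng.normal(size=(2, 8)) + 0.1 * rng.normal(size=(72, 8))
scores, loadings, explained = macro_principal_components(macro)
```

The component scores are mutually uncorrelated by construction. This is what removes the multicollinearity among the original macro variables.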
Descriptive analysis
Tables 2 and 3 present the descriptive statistics for daily returns and 5-min returns, respectively, of the 29 DJIA stocks and the S&P 500 index.
Although the means of daily and 5-min returns of the S&P 500 index are approximately zero, their standard deviations differ markedly. This indicates that volatility changes dramatically with the sampling frequency. Moreover, the skewness switches from −0.1909 for daily returns to 0.0440 for 5-min returns, i.e., from left-skewed to slightly right-skewed. The kurtosis increases from 7.4016 for daily returns to 22.5620 for 5-min returns.
The standard deviations of individual stocks computed from daily returns are much larger than those computed from 5-min returns. Among the 29 stocks, 13 have left-skewed daily return distributions and 18 have left-skewed 5-min return distributions. Furthermore, all 29 stocks have larger kurtosis for 5-min returns. The return distributions of individual stocks are asymmetric, with long tails and sharp peaks.
Table 4 further displays the correlation coefficients among the eight macro-economic factors. It reveals strong correlations between the following pairs:
- the short-term interest rate and the slope of the yield curve;
- the exchange rate and the default rate;
- the growth rate of the Industrial Production Index and the unemployment rate.

Including all macro-economic variables in the covariance estimation model at once would undermine statistical significance because of multicollinearity. We therefore use PCA to construct uncorrelated variables and estimate the inverse covariance matrix based on the principal components in the empirical study.
Table 5 shows the principal component loadings obtained from PCA. The first two principal components, which we use, together explain 96.4945% of the total variance. The first component is strongly correlated with the short-term interest rate and the slope of the yield curve. We call it the monetary factor and view it as an indicator of monetary policy and the bond market. The second component, named the economic factor, is positively correlated with the default rate and the exchange rate. It mainly captures cyclical fluctuations in economic activity.
Empirical study
In our model, excess returns, which can be daily or 5-min returns, are driven by two daily observable factors: (i) the value-weighted stock market return and (ii) the innovation of the VIX index. The VIX index is constructed from the implied volatilities of S&P 500 index options and reflects investors’ expectation of S&P 500 index volatility over the next 30 days. We also include two monthly observable principal components of the eight macro-economic factors. We fit the proposed model, estimate the inverse covariance matrix and compute the minimum variance index tracking strategy with a rolling window scheme. The derived strategy is then applied on the next day.
Comparison of current different estimations
Here, we use a rolling window of one year (252 trading days). Table 6 compares the out-of-sample performance of minimum variance index tracking strategies derived from different covariance or inverse covariance matrix estimators. The estimators are based on daily returns, on intraday returns with overnight returns, and on intraday returns without overnight returns.
Several conclusions can be drawn from Table 6:
- The tracking error achieved by St,1 with daily returns is 0.043973, which is smaller than the tracking errors obtained with intraday returns.
- The shrinkage inverse covariance matrix estimator has the best out-of-sample performance (i.e., the smallest tracking error).
- When intraday returns are used, adding overnight returns often improves tracking performance, which implies that overnight returns are useful.
- Most inverse covariance matrix estimators do not perform significantly better than the covariance matrix estimators.

Furthermore, Table 6 reports the turnover rates of the different strategies. Strategies based on intraday returns have lower turnover rates, which indicates the value of intraday returns. The tracking strategy with the estimator
has the lowest turnover rate.
Surprisingly, for these estimators, including intraday returns may worsen tracking performance but reduce turnover rates. Moreover, overnight returns help improve tracking performance.
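The sketch below follows the rolling-window evaluation behind Tables 6 and 7. On each day it:
1. estimates a covariance matrix on the most recent window;
2. forms minimum variance weights;
3. applies them on the next day and records the tracking difference.

The tracking error is taken as the standard deviation of these daily differences, and turnover as the average sum of absolute day-to-day weight changes. These definitions are our assumptions, not the paper's stated formulas, as is the simplification of ignoring weight drift between rebalancings.

```python
from typing import Callable, Tuple

import numpy as np


def rolling_backtest(excess: np.ndarray, window: int,
                     cov_estimator: Callable[[np.ndarray], np.ndarray]
                     ) -> Tuple[np.ndarray, float, float]:
    """excess: D x N daily excess returns of stocks over the target index."""
    n_days, n_assets = excess.shape
    ones = np.ones(n_assets)
    diffs, turnovers = [], []
    prev_w = None
    for t in range(window, n_days):
        sigma = cov_estimator(excess[t - window:t])     # uses data up to day t - 1 only
        x = np.linalg.solve(sigma, ones)
        w = x / x.sum()                                 # minimum variance tracking weights
        diffs.append(w @ excess[t])                     # portfolio minus index return on day t
        if prev_w is not None:
            turnovers.append(np.abs(w - prev_w).sum())
        prev_w = w
    diffs = np.asarray(diffs)
    tracking_error = float(diffs.std(ddof=1))
    turnover = float(np.mean(turnovers)) if turnovers else 0.0
    return diffs, tracking_error, turnover


rng = np.random.default_rng(5)
excess = rng.normal(scale=0.01, size=(300, 5))
diffs, te, turnover = rolling_backtest(excess, window=252,
                                       cov_estimator=lambda r: np.cov(r, rowvar=False))
```

Any covariance estimator that maps a window of returns to an N × N matrix can be passed as cov_estimator. An inverse covariance estimator would instead replace the solve step with a matrix-vector product.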
Performance of FACTOR-MIDAS-POET method
We consider two models. The first is based on daily stock returns, two daily observable factors and two monthly macro-economic principal components. The second is based on 5-min stock returns, two daily observable factors and two monthly macro-economic principal components. Similar to [30], we estimate the models with slow weights (θ1 = 1 and θ2 = 4). Other weighting schemes lead to similar conclusions.
Table 7 presents the out-of-sample tracking performance of strategies based on the FACTOR-MIDAS-POET model. The lag period column reports how many monthly lags of the macro-economic principal components are included in the models. Panel A presents the results based on daily returns, while Panels B and C present the results based on 5-min returns. Tables 6 and 7 report one-tailed t tests on the tracking errors of the different portfolios, based on Newey-West standard errors with six lags. Following [39], we compare the average squared excess returns over the target index of the various estimators with that of the FACTOR-MIDAS-POET method (K = 3). Let $e_t^{\mathrm{FMP}}$ and $e_t^{j}$ denote the day-t excess returns over the target index of the FACTOR-MIDAS-POET strategy and of a competing strategy j, for the out-of-sample days t = 1, …, T0. We want to test whether the average of $(e_1^{\mathrm{FMP}})^2, \ldots, (e_{T_0}^{\mathrm{FMP}})^2$ is significantly smaller than the average of $(e_1^{j})^2, \ldots, (e_{T_0}^{j})^2$. To do so, we use a one-tailed test of whether the mean of the sequence
$$d_t = \left(e_t^{\mathrm{FMP}}\right)^2 - \left(e_t^{j}\right)^2, \qquad t = 1, \ldots, T_0,$$
is significantly smaller than zero.
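The one-tailed test with Newey–West standard errors can be sketched as follows for a sequence of differences in squared tracking deviations between two strategies. The long-run variance of the sequence is estimated with Bartlett weights and six lags, and the t statistic is the sample mean divided by its standard error. The p-value uses a standard normal approximation, which is our assumption; the text does not state the reference distribution. The inputs below are simulated, not the paper's results.

```python
from math import erf, sqrt

import numpy as np


def newey_west_one_sided(d: np.ndarray, n_lags: int = 6):
    """t statistic and one-sided p-value for H1: mean(d) < 0, with Newey-West standard error."""
    d = np.asarray(d, dtype=float)
    n = d.size
    u = d - d.mean()
    long_run_var = u @ u / n
    for lag in range(1, n_lags + 1):
        gamma = u[lag:] @ u[:-lag] / n                    # autocovariance at this lag
        long_run_var += 2.0 * (1.0 - lag / (n_lags + 1.0)) * gamma
    t_stat = d.mean() / sqrt(long_run_var / n)
    p_value = 0.5 * (1.0 + erf(t_stat / sqrt(2.0)))       # Phi(t): small when t is very negative
    return t_stat, p_value


rng = np.random.default_rng(6)
e_model = rng.normal(scale=0.5, size=500)     # hypothetical daily tracking differences, strategy A
e_rival = rng.normal(scale=1.0, size=500)     # hypothetical daily tracking differences, strategy B
t_stat, p_value = newey_west_one_sided(e_model ** 2 - e_rival ** 2)
```

A large negative t statistic, and hence a small p-value, indicates that strategy A has a significantly smaller average squared tracking deviation than strategy B.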
Some conclusions can be drawn from Table 7. First, both the tracking performance and the turnover rate based on daily returns are better than those based on 5-min returns. Second, the tracking performance of our model depends on how many monthly lags of the macro-economic principal components are included. Third, except St,3, and
, both the tracking performance and the turnover rate of the proposed model are better than those of the other covariance estimation models based on daily returns. Meanwhile, the one-tailed t values of St,3,
and
are not large. Similarly, the proposed FACTOR-MIDAS-POET model achieves better out-of-sample tracking performance with intraday returns than most models reported in Table 6, because it uses financial data from different sources and at different frequencies. However, its turnover rate is high, which indicates that the strategies constructed by our method trade more actively. A high turnover rate is not necessarily a drawback: when the goal is to minimize tracking error, the index strategy constructed by our model reaches the targeted performance.
Robustness analysis
In this robustness analysis, we change the rolling window to three months (3 m), six months (6 m) and nine months (9 m) to check whether the length of the rolling window affects our main results.
Tables 8–10 summarize the out-of-sample performance of strategies based on different models under the three rolling window lengths. As the estimation window shortens, the tracking errors and turnover rates of all models increase, because less information is used. At the same time, 5-min return data become increasingly important. Intraday information can therefore help estimate the covariance matrix or inverse covariance matrix, and construct investment strategies, when other information is scarce. Most importantly, the proposed FACTOR-MIDAS-POET model generally tracks the index better than the other models, especially with a three-month rolling window. Thus, when the estimation window is relatively short, introducing macro-economic and option market information can greatly improve the index tracking strategy.
Conclusions and discussion
In this paper, we propose the FACTOR-MIDAS-POET model, which simultaneously integrates intraday return data, daily risk factor data and monthly or quarterly macro-economic data. In the empirical analysis, we show that the FACTOR-MIDAS-POET model with macro-economic factors has better out-of-sample tracking performance than most models currently used in the literature, as it fully utilizes financial data from different sources and at different frequencies. However, the better tracking performance is often accompanied by higher turnover rates. We also find that the advantage of our model grows as the length of the estimation window decreases.
Our work is a preliminary study of index tracking with mixed-frequency data, and many aspects remain to be covered. For instance, we do not model the noise in intraday data, and transaction costs do not appear explicitly in the index tracking model. These problems are worth exploring in future studies.
References
- 1. Rudolf M, Wolter H, and Zimmermann H. A linear model for tracking error minimization. Journal of Banking and Finance, 23(1):85–103, 1999.
- 2. Jansen R and Dijk RV. Optimal benchmark tracking with small portfolios. The journal of portfolio management, 28(2):33–39, 2002.
- 3. Coleman TF, Li Y, and Henniger J. Minimizing tracking error while restricting the number of assets. Journal of Risk, 8(4):33, 2006.
- 4. Corielli F and Marcellino M. Factor based index tracking. Journal of Banking & Finance, 30(8):2215–2233, 2006.
- 5. Liu Q. On portfolio optimization: How and when do we benefit from high-frequency data? Journal of Applied Econometrics, 24(4):560–582, 2009.
- 6. Canakgoz NA and Beasley JE. Mixed-integer programming approaches for index tracking and enhanced indexation. European Journal of Operational Research, 196(1):384–399, 2009.
- 7. Guastaroba G and Speranza MG. Kernel search: An application to the index tracking problem. European Journal of Operational Research, 217(1):54–68, 2012.
- 8. Bollerslev T, Engle RF, and Wooldridge JM. A capital asset pricing model with time-varying covariances. Journal of Political Economy, 96(1):116–131, 1988.
- 9. Ledoit O and Wolf M. Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. Journal of empirical finance, 10(5):603–621, 2003.
- 10. Fleming J, Kirby C, and Ostdiek B. The economic value of volatility timing using realized volatility. Journal of Financial Economics, 67(3):473–509, 2003.
- 11. Merton RC. On estimating the expected return on the market: An exploratory investigation. Journal of financial economics, 8(4):323–361, 1980.
- 12. Andersen TG and Bollerslev T. Answering the skeptics: Yes, standard volatility models do provide accurate forecasts. International Economic Review, 39(4):885–905, 1998.
- 13. Andersen TG, Bollerslev T, Diebold FX, and Ebens H. The distribution of realized stock return volatility. Journal of financial economics, 61(1):43–76, 2001.
- 14. Andersen TG, Bollerslev T, Diebold FX, and Labys P. Modeling and forecasting realized volatility. Econometrica, 71(2):579–625, 2003.
- 15. Bandi FM, Russell JR, and Zhu Y. Using high-frequency data in dynamic portfolio choice. Econometric Reviews, 27(1-3):163–198, 2008.
- 16. Pooter MD, Martens M, and Dijk DV. Predicting the daily covariance matrix for s&p 100 stocks using intraday data but which frequency to use? Econometric Reviews, 27(1-3):199–229, 2008.
- 17. Chiriac R and Voev V. Modelling and forecasting multivariate realized volatility. Journal of Applied Econometrics, 26(6):922–947, 2011.
- 18. Hautsch N, Kyj LM, and Malec P. Do high-frequency data improve high-dimensional portfolio allocations? Journal of Applied Econometrics, 30(2):263–290, 2015.
- 19. Efron B, Morris C. Multivariate empirical bayes and estimation of covariance matrices. The Annals of Statistics, 4(1):22–32, 1976.
- 20. Haff LR. Minimax estimators for a multinormal precision matrix. Journal of Multivariate Analysis, 7(3):374–385, 1977.
- 21. Haff LR. Estimation of the inverse covariance matrix: Random mixtures of the inverse wishart matrix and the identity. Annals of Statistics, 7(6):1264–1276, 1979.
- 22. Dey DK, Ghosh M, and Srinivasan C. A new class of improved estimators of a multinormal precision matrix. Statistics & Risk Modeling, 8(2):141–152, 1990.
- 23. Kubokawa T. A revisit to estimation of the precision matrix of the wishart distribution. Journal of Statistical Research, 39:91–114, 2005.
- 24. Flannery MJ and Protopapadakis AA. Macroeconomic factors do influence aggregate stock returns. The review of financial studies, 15(3):751–782, 2002.
- 25. Chang KL. Do macroeconomic variables have regime-dependent effects on stock return dynamics? evidence from the markov regime switching model. Economic Modelling, 26(6):1283–1299, 2009.
- 26. Baele L, Bekaert G, and Inghelbrecht K. The determinants of stock and bond return comovements. The Review of Financial Studies, 23(6):2374–2428, 2010.
- 27. Engle RF, Ghysels E, and Sohn B. Stock market volatility and macroeconomic fundamentals. Review of Economics and Statistics, 95(3):776–797, 2013.
- 28. Asgharian H, Hou AJ, and Javed F. The importance of the macroeconomic variables in forecasting stock return variance: A garch-midas approach. Journal of Forecasting, 32(7):600–612, 2013.
- 29. Asgharian H, Christiansen C, and Hou AJ. Macro-finance determinants of the long-run stock–bond correlation: The dcc-midas specification. Journal of Financial Econometrics, 14(3):617–642, 2015.
- 30. Ghysels E, Sinko A, and Valkanov R. Midas regressions: Further results and new directions. Econometric Reviews, 26(1):53–90, 2007.
- 31. Engle RF, Ghysels E, and Sohn B. On the economic sources of stock market volatility. AFA 2008 New Orleans Meetings Paper, 2008.
- 32. Marcellino M and Schumacher C. Factor midas for nowcasting and forecasting with ragged-edge data: A model comparison for german gdp. Oxford Bulletin of Economics and Statistics, 72(4):518–550, 2010.
- 33. Shephard N and Sheppard K. Realising the future: Forecasting with high-frequency-based volatility (heavy) models. Journal of Applied Econometrics, 25(2):197–231, 2010.
- 34. Kim D and Fan J. Factor garch-ito models for high-frequency data with application to large volatility matrix prediction. Journal of Econometrics, 208(2):395–417, 2019.
- 35. Newey WK and West KD. A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55:703–708, 1987.
- 36. Cohen KJ, Hawawini GA, Maier SF, Schwartz RA, and Whitcomb DK. Friction in the trading process and the estimation of systematic risk. Journal of Financial Economics, 12(2):263–278, 1983.
- 37. Fan J, Liao Y, and Mincheva M. Large covariance estimation by thresholding principal orthogonal complements. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(4):603–680, 2013. pmid:24348088
- 38. Kourtis A, Dotsis G, and Markellos RN. Parameter uncertainty in portfolio selection: Shrinking the inverse covariance matrix. Journal of Banking & Finance, 36(9):2522–2531, 2012.
- 39. Basak GK, Ma T, and Jagannathan R. Assessing the risk in sample minimum risk portfolios. Working paper, Northwestern University, 2004.