Reader Comments

Time series data and correlation.

Posted by nirajp6 on 03 Mar 2016 at 07:19 GMT

We have to remember that time is not a stochastic variable but a deterministic index. So we cannot talk about correlation between time (t) and another stochastic variable Y(t) that moves across time. Time (t) is included in the model to capture time heterogeneity rather than to explain anything about Y. If two time series variables X(t) and Y(t) are substantially correlated, they should remain correlated even after we control for time (t). Otherwise, we have no way to distinguish whether the correlation is a true correlation or merely spurious.
For example, if X(t) = a + b*t and Y(t) = c + d*t (each with its own independent noise), then running the regression Y = m + n*t + h*X should show that h is insignificant. If we ignore t and just run the regression Y = m + h*X, h will almost always appear significant (which is what the original paper is doing).
But if Y(t) = c + d*t + e*X(t), and we run the regression Y = m + n*t + h*X, then h should come out significant despite the presence of t. If h is significant with t in the model, that strengthens the evidence for an effect of X on Y.
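The two cases above can be checked with a small simulation. What follows is only a rough sketch, not the original paper's analysis: the coefficient values, the sample size, the unit-variance Gaussian noise, and the helper names (invert, ols_t_stats, simulate) are all assumptions made for illustration. It uses only the Python standard library and computes ordinary least squares t-statistics by hand.

```python
import random

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(m)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(m)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def ols_t_stats(columns, y):
    """Regress y on an intercept plus the given columns; return (coefficients, t-statistics)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
           for i in range(k)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y[r] - sum(X[r][j] * beta[j] for j in range(k)) for r in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance estimate
    t_stats = [beta[i] / (sigma2 * inv[i][i]) ** 0.5 for i in range(k)]
    return beta, t_stats

def simulate(seed=0, n=200):
    """Return the t-statistic of h (the coefficient on X) in three regressions."""
    rng = random.Random(seed)
    t = [float(i) for i in range(n)]
    # Assumed data-generating values, chosen only for illustration.
    x = [1.0 + 0.5 * ti + rng.gauss(0, 1) for ti in t]
    y_trend_only = [2.0 + 0.3 * ti + rng.gauss(0, 1) for ti in t]       # no effect of X
    y_with_x = [2.0 + 0.3 * ti + 0.8 * xi + rng.gauss(0, 1)
                for ti, xi in zip(t, x)]                                # real effect of X
    return {
        # Y = m + h*X, t omitted: h picks up the shared trend.
        "trend_only_without_t": ols_t_stats([x], y_trend_only)[1][1],
        # Y = m + n*t + h*X: h is the third coefficient (index 2).
        "trend_only_with_t": ols_t_stats([t, x], y_trend_only)[1][2],
        "real_effect_with_t": ols_t_stats([t, x], y_with_x)[1][2],
    }

results = simulate()
for name, value in results.items():
    print(f"{name}: t-statistic of h = {value:.2f}")
```

Under these assumptions, the t-statistic of h should be very large when t is omitted, close to zero when t is included and X has no real effect, and clearly large again when t is included and X does enter the equation for Y.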
The substantive plausibility of the relationship cannot be used as evidence for or against a theory (or the relationship itself). The theory has to be validated independently using observed data. Even if one does not know what X and Y represent substantively (theoretically), one should be able to show the presence/absence of correlation.

No competing interests declared.