Reader Comments
Time series data and correlation.
Posted by nirajp6 on 03 Mar 2016 at 07:19 GMT
We have to remember that time is not a stochastic variable but a deterministic index. So we cannot talk about correlation between time (t) and a stochastic variable Y(t) that moves across time. Time (t) is included in the model to capture time heterogeneity rather than to explain anything about Y. If two time series variables X(t) and Y(t) are substantially correlated, they should remain correlated even after we control for time (t). Otherwise, we have no way to tell whether the correlation is genuine or merely spurious.
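As a rough illustration of "controlling for time", the sketch below compares the raw correlation of two trending series with the correlation of their residuals after a linear time trend is removed from each. Everything in it is an assumption made for illustration: the helper names `detrend` and `corr`, the coefficients, the sample size, and the independent Gaussian noise added to each series.

```python
import random

def detrend(series):
    """Residuals from an OLS fit of the series on the time index 0..n-1."""
    n = len(series)
    t_mean = (n - 1) / 2
    s_mean = sum(series) / n
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    s_ts = sum((t - t_mean) * (s - s_mean) for t, s in enumerate(series))
    slope = s_ts / s_tt
    return [s - s_mean - slope * (t - t_mean) for t, s in enumerate(series)]

def corr(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    s_uv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    s_uu = sum((a - mu) ** 2 for a in u)
    s_vv = sum((b - mv) ** 2 for b in v)
    return s_uv / (s_uu * s_vv) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 200

# Two series that share nothing except a linear dependence on t (assumed values).
x = [1.0 + 0.5 * t + rng.gauss(0, 1) for t in range(n)]
y = [2.0 + 0.3 * t + rng.gauss(0, 1) for t in range(n)]

raw_corr = corr(x, y)                          # driven by the common trend
partial_corr = corr(detrend(x), detrend(y))    # correlation after controlling for t

print(f"raw correlation:          {raw_corr:.3f}")
print(f"correlation controlling t: {partial_corr:.3f}")
```

Under these assumptions the raw correlation should sit close to 1, while the detrended correlation should be close to 0, since the two noise terms are independent.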
For example, if X(t) = a + b*t and Y(t) = c + d*t (each observed with its own independent noise), then running the regression Y = m + n*t + h*X should show that h is insignificant. If we ignore t and just run the regression Y = m + h*X, which is what the original paper is doing, h will always appear significant, simply because both variables trend with t.
But if Y(t) = c + d*t + e*X(t), and we run the regression Y = m + n*t + h*X, then h should come out significant despite the presence of t. If h remains significant with t in the model, that strengthens the evidence for an effect of X on Y.
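The two cases can be checked with a small simulation. This is only a sketch under stated assumptions: the function name `t_stat_h` is hypothetical, the coefficient values and sample size are arbitrary, and Gaussian noise is added to every series (without noise, X would be perfectly collinear with t and h could not be estimated). The time-controlled fit uses the standard result that the coefficient on X in Y = m + n*t + h*X equals the slope from regressing detrended Y on detrended X.

```python
import random

def residualize_on_time(series, control_time):
    """Remove the mean, and also a linear time trend if control_time is True."""
    n = len(series)
    s_mean = sum(series) / n
    if not control_time:
        return [s - s_mean for s in series]
    t_mean = (n - 1) / 2
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    s_ts = sum((t - t_mean) * (s - s_mean) for t, s in enumerate(series))
    slope = s_ts / s_tt
    return [s - s_mean - slope * (t - t_mean) for t, s in enumerate(series)]

def t_stat_h(y, x, control_time):
    """t-statistic of h in Y = m + h*X, or in Y = m + n*t + h*X if control_time."""
    n = len(y)
    rx = residualize_on_time(x, control_time)
    ry = residualize_on_time(y, control_time)
    s_xx = sum(a * a for a in rx)
    h = sum(a * b for a, b in zip(rx, ry)) / s_xx
    rss = sum((b - h * a) ** 2 for a, b in zip(rx, ry))
    dof = n - (3 if control_time else 2)   # parameters: m, h and optionally n
    std_err = (rss / dof / s_xx) ** 0.5
    return h / std_err

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 200
x = [1.0 + 0.5 * t + rng.gauss(0, 1) for t in range(n)]

# Case 1: Y depends on t only (no effect of X).
y_trend_only = [2.0 + 0.3 * t + rng.gauss(0, 1) for t in range(n)]
# Case 2: Y depends on t and on X (assumed effect e = 0.8).
y_with_x = [2.0 + 0.3 * t + 0.8 * xi + rng.gauss(0, 1) for xi, t in zip(x, range(n))]

t_naive = t_stat_h(y_trend_only, x, control_time=False)
t_no_effect = t_stat_h(y_trend_only, x, control_time=True)
t_effect = t_stat_h(y_with_x, x, control_time=True)

print(f"case 1, t omitted:  t-stat of h = {t_naive:.2f}")
print(f"case 1, t included: t-stat of h = {t_no_effect:.2f}")
print(f"case 2, t included: t-stat of h = {t_effect:.2f}")
```

With these assumed settings, the first t-statistic should be very large even though X has no effect on Y, the second should be small, and the third should stay large because X enters Y directly.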
The substantive plausibility of a relationship cannot be used as evidence for or against a theory (or the relationship itself). The theory has to be validated independently using observed data. Even if one does not know what X and Y represent substantively (theoretically), one should be able to show the presence or absence of correlation.