Figure 1.
Analytic strategy of the study.
First, a causality algorithm is applied to infer whether the variable Y is a weighted sum of the variable X and a residual term e (X causes Y), or vice versa. Second, the assumptions of the applied causal model are evaluated. Third, a simulation study probes the model’s sensitivity to assumption violations that are difficult to evaluate directly; most importantly, it evaluates the impact of partial confounding on the algorithm’s ability to recognize a causal association.
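In symbols, the two candidate models compared in the first step can be written as below. The coefficient names b_{YX} and b_{XY} and the residual subscripts are notation assumed here for illustration; they are not taken from the article.

```latex
\begin{align}
X \rightarrow Y:&\quad Y = b_{YX}\,X + e_Y, \qquad e_Y \text{ independent of } X,\\
Y \rightarrow X:&\quad X = b_{XY}\,Y + e_X, \qquad e_X \text{ independent of } Y.
\end{align}
```

Under the LiNGAM assumptions (linear effects, non-Gaussian disturbances, no latent confounding), the residual can be independent of the predictor in only one of the two directions, which is what allows a direction to be chosen.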
Table 1.
Sample Characteristics and Attrition in the Young Finns Study.
Table 2.
Sample Characteristics and Attrition in the Wisconsin Longitudinal Study.
Table 3.
Pairwise Causality Comparisons for 2000 Bootstrap Re-samples in Young Finns Data.
Table 4.
Pairwise Causality Comparisons for 2000 Bootstrap Re-samples in the Data from Wisconsin Longitudinal Study.
Figure 2.
Three linear (Ordinary Least Squares) regression models corresponding to the causal directions estimated by the DirectLiNGAM algorithm.
Each row shows data for a model estimated in one data set. The first panel of each row (A, D, or G) shows the linear (thick line) and quadratic (thin line) fits superimposed on the data points. Jitter (a uniform random variable ranging from −0.1 to 0.1) was added to the variables to enhance the visibility of the data points. The second panel is a scatterplot of the linear model residual against the independent variable. The last panel of each row shows a Gaussian probability density with mean and standard deviation equal to those of the observed residual distribution, together with a kernel density estimate of the observed linear model residual.
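The ingredients of these panels can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the study's code: the function names, the simulated data, and the rule-of-thumb bandwidth are all assumptions made here, and the quadratic fit is omitted for brevity.

```python
import math
import random


def jitter(values, half_width=0.1, rng=random):
    """Add uniform noise in [-half_width, half_width] to each value (for plotting only)."""
    return [v + rng.uniform(-half_width, half_width) for v in values]


def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x, y, intercept, slope):
    """Observed minus fitted values of the linear model."""
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def gaussian_pdf(t, mean, sd):
    """Density of a normal distribution with the given mean and sd at t."""
    z = (t - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def kernel_density(t, sample, bandwidth):
    """Gaussian kernel density estimate of `sample`, evaluated at t."""
    return sum(gaussian_pdf(t, s, bandwidth) for s in sample) / len(sample)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical skewed data standing in for the observed variables.
    x = [rng.expovariate(1.0) for _ in range(500)]
    y = [0.5 * a + rng.expovariate(1.0) for a in x]

    intercept, slope = ols_line(x, y)
    res = residuals(x, y, intercept, slope)
    mean_r = sum(res) / len(res)
    sd_r = math.sqrt(sum((r - mean_r) ** 2 for r in res) / len(res))

    # Assumed rule-of-thumb bandwidth; the article does not state its choice.
    bandwidth = 1.06 * sd_r * len(res) ** (-0.2)
    for t in (-1.0, 0.0, 1.0, 2.0):
        print(t, gaussian_pdf(t, mean_r, sd_r), kernel_density(t, res, bandwidth))
```

A visible gap between the two printed densities is the kind of departure from normality that the last panel of each row is meant to reveal.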
Table 5.
P-values for Statistical Tests Evaluating LiNGAM Assumptions.
Figure 3.
Simulation study approximating the observed data.
The simulation was modeled on the case in which mBDI was regressed linearly on Sleep problems (the independent, or predicting, variable) in the Young Finns data. Histograms of Sleep problems (A), the Ordinary Least Squares residual of mBDI (B), and the dependent variable mBDI (C) are shown, together with probability distributions fitted to these data (thick lines, y-axis re-scaled for the number of observations) and Gaussian kernel density estimates of the data (thin lines). Panel A suggests that a mixture of Gaussians is not a good model for Sleep problems; a shifted Exponential distribution was chosen instead.
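As an illustration of what fitting a shifted Exponential distribution involves, the sketch below uses the standard maximum-likelihood estimates (shift equal to the sample minimum, rate equal to the reciprocal of the mean excess over that minimum). The caption does not say which fitting method the study used, so both the method and the function names are assumptions.

```python
import random


def fit_shifted_exponential(sample):
    """Maximum-likelihood estimates (shift, rate) for a shifted Exponential."""
    shift = min(sample)
    mean_excess = sum(sample) / len(sample) - shift
    if mean_excess <= 0:
        raise ValueError("sample has no spread above its minimum")
    return shift, 1.0 / mean_excess


def draw_shifted_exponential(shift, rate, n, rng):
    """Draw n values from shift + Exponential(rate)."""
    return [shift + rng.expovariate(rate) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical parameters, chosen only to demonstrate the round trip.
    data = draw_shifted_exponential(shift=1.0, rate=2.0, n=5000, rng=rng)
    print(fit_shifted_exponential(data))
```

Once fitted, the same parameters can be passed back to `draw_shifted_exponential` to generate simulated predictor values that approximate the observed distribution.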
Figure 4.
Simulation results by gradually perturbing the model of Figure 3.
The rows correspond to the applied causality statistic: DirectLiNGAM-based (panels A, B, C), Skew-based (D, E, F), and Tanh-based (G, H, I). The two leftmost panels of each row show estimation success (proportion of correct estimates) as a function of the degree of latent confounding. Different types of confounding (linear or proportional) and different distributional conditions were tested: Gaussian mixture (GM), Exponential (Exp), and GM and Exp (different) residual, and with all GM distributions; see Methods. The last panel shows estimation success when Gaussian ‘measurement error’, in the amount indicated on the horizontal axis, was added to the independent variable.
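The measurement-error condition in the last panel of each row can be mimicked with a small simulation. The sketch below uses a third-moment (skewness-based) pairwise statistic of the general kind named in the caption: after standardizing both variables and flipping signs so that both skewnesses are positive, a positive value of rho * E[x²y − xy²] favours x → y. The data-generating model, sample sizes, and function names are assumptions for illustration and are not the study's simulation design.

```python
import math
import random


def standardize(values):
    """Center to mean zero and scale to unit variance."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def skew_direction_stat(x, y):
    """Skewness-based pairwise statistic; a positive value favours x -> y."""
    xs, ys = standardize(x), standardize(y)
    # Flip signs so that both variables have positive sample skewness.
    if sum(a ** 3 for a in xs) < 0:
        xs = [-a for a in xs]
    if sum(b ** 3 for b in ys) < 0:
        ys = [-b for b in ys]
    n = len(xs)
    rho = sum(a * b for a, b in zip(xs, ys)) / n
    return rho * sum(a * a * b - a * b * b for a, b in zip(xs, ys)) / n


def success_rate(noise_sd, n=1000, reps=100, seed=0):
    """Proportion of replications in which x -> y is correctly recovered
    when Gaussian noise with sd `noise_sd` is added to the observed x."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(reps):
        x = [rng.expovariate(1.0) for _ in range(n)]     # skewed cause
        y = [a + rng.expovariate(1.0) for a in x]        # true model: x -> y
        x_obs = [a + rng.gauss(0.0, noise_sd) for a in x]
        if skew_direction_stat(x_obs, y) > 0:
            correct += 1
    return correct / reps


if __name__ == "__main__":
    for noise_sd in (0.0, 0.5, 1.0, 2.0):
        print(noise_sd, success_rate(noise_sd))
```

In this toy setting, Gaussian error on the predictor makes the observed predictor look more like an effect than a cause, so the proportion of correct decisions falls as the noise grows.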
Figure 5.
Total Test Information for the items of BDI-II (solid line) and for those of mBDI (dashed line).
Units of the horizontal axis represent standard deviations of latent (general) depression as estimated by a unidimensional Graded Response Model. Information per latent depression value holds no absolute meaning; it is estimated as the integral over an adjacent step in a 200-point discretization of the horizontal axis. In addition to the (Fisher) information content of the scales, the thin dotted line plots a Gaussian kernel density estimate of the factor scores from the estimated Graded Response Model, normalized to a maximum of one; this illustrates which severity levels were actually present in the population-based Young Finns data set.