The authors have declared that no competing interests exist.
Conceived and designed the experiments: TR LKJ. Analyzed the data: TR. Contributed reagents/materials/analysis tools: TR MJ SP MH LPR JSV OTR LKJ. Wrote the paper: TR MJ SP MH LPR JSV OTR LKJ.
Depressive mood is often preceded by sleep problems, suggesting that sleep problems increase the risk of depression. However, sleep problems can also reflect a prodromal symptom of depression, so temporal precedence alone is insufficient to confirm causality. The authors applied recently introduced statistical causal-discovery algorithms that can estimate causality from cross-sectional samples in order to infer the direction of causality between the two sets of symptoms from a novel perspective. Two common-population samples were used: one from the Young Finns study (690 men and 997 women, average age 37.7 years, range 30–45), and another from the Wisconsin Longitudinal study (3101 men and 3539 women, average age 53.1 years, range 52–55). These included three depression questionnaires (two in the Young Finns data) and two sleep problem questionnaires. Three different causality estimates were constructed for each data set, tested in benchmark data with a (practically) known causality, and tested for assumption violations using simulated data. The causality algorithms performed well in the benchmark data and simulations, and a prediction was drawn for future empirical studies to confirm: for minor depression/dysphoria, sleep problems cause significantly more dysphoria than dysphoria causes sleep problems. The situation may change as depression becomes more severe, or as more severe levels of symptoms are evaluated; also, artefacts due to severe depression being less well represented in the population data than minor depression may interfere with the estimation for depression scales that emphasize severe symptoms. The findings are consistent with other emerging epidemiological and biological evidence.
Statistical measures of causality have been introduced for cross-sectional data
Sleep problems have rapidly climbed among the leading health problems in western societies. Point prevalence estimates of insomnia vary between 6% and 48%, depending on the definition and sample/country
Complaints of poor sleep quality are estimated to occur in 50% to 90% of diagnosed cases of depression
Sleep problems often precede the onset of melancholic/depressed mood
These considerations make evident that an efficient cross-sectional measure of causality would be useful in determining whether sleep problems cause depression or depression causes sleep problems. Recent work in computational statistics has shown that the use of information in the higher-order (non-Gaussian
The important assumption of a non-Gaussian distribution should logically hold for population distributions of depression and sleep problem scores, as both variables should be skewed, with the majority of people having few problems and only a minority at the severe end of the continuum. Furthermore, recent studies suggest that depressive symptoms form a causal network of symptoms that directly influence each other, instead of reflecting a single latent causal antecedent
We apply several pairwise measures of causality in order to gain information regarding causality between depressed mood and sleep problems in the community-based samples of the Young Finns and Wisconsin Longitudinal studies. Such measures have provided reasonable information about causality in content domains relating to physical systems and sociological data; for example, within a system of variables including father’s education and occupation, number of siblings, son’s education and occupation, and son’s income, only one out of the five causal connections simultaneously estimated by the algorithm was illogical
Most testing of causality algorithms has been performed using simulated data where the ‘ground truths’ are known for certain and diverse conditions can be tested. The estimation efficiency depends on the specific algorithm and on several data parameters
First, a causality algorithm is applied to infer whether the variable Y is a weighted sum of the variable X and a residual term e (X causes Y), or vice versa. Second, the assumptions of the applied causal model are evaluated. Third, a simulation study probes the model’s sensitivity to assumption violations that are difficult to evaluate directly; most importantly, the impact of partial confounding on the algorithm’s ability to recognize a causal association is evaluated.
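The first step can be illustrated with a minimal sketch of a skewness-based pairwise statistic. The exact formulas used in this study are not reproduced in the text, so the form below is an assumption: on standardized, positively skewed variables it computes the correlation multiplied by the mean of x²y − xy², and a positive value favours "X causes Y". The function names and the toy data (an exponential cause with a Gaussian residual) are hypothetical and serve only as an illustration.

```python
import math
import random


def standardize(values):
    """Return values rescaled to zero mean and unit (population) variance."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / sd for v in values]


def skew_statistic(x, y):
    """Skewness-based pairwise statistic (assumed form); > 0 favours x -> y."""
    xs, ys = standardize(x), standardize(y)
    n = len(xs)
    # Flip signs where needed so that both variables are positively skewed.
    if sum(v ** 3 for v in xs) < 0:
        xs = [-v for v in xs]
    if sum(v ** 3 for v in ys) < 0:
        ys = [-v for v in ys]
    rho = sum(a * b for a, b in zip(xs, ys)) / n
    asym = sum(a * a * b - a * b * b for a, b in zip(xs, ys)) / n
    return rho * asym


# Toy data (hypothetical): skewed cause x, effect y = x + Gaussian residual e.
rng = random.Random(0)
x = [rng.expovariate(1.0) for _ in range(5000)]
y = [xi + rng.gauss(0.0, 1.0) for xi in x]

forward = skew_statistic(x, y)   # evidence for x -> y
backward = skew_statistic(y, x)  # evidence for y -> x (mirror image)
```

In this sketch the statistic is antisymmetric, so swapping the arguments only changes the sign; the sign of `forward` is what indicates the estimated direction.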
Data from two separate population studies were used. First, the Cardiovascular Risk in Young Finns study is an ongoing population-based cohort study
The original Young Finns sample consists of 3596 healthy Finnish children and adolescents derived from six birth cohorts, aged 3, 6, 9, 12, 15, and 18 years at baseline in 1980. In order to select a broadly sociodemographically representative sample, Finland was divided into five areas according to the locations of university cities with a medical school (Helsinki, Kuopio, Oulu, Tampere and Turku). In each area, urban and rural boys and girls were randomly selected on the basis of their unique personal social security number. The sample has subsequently been followed in 7 data collection waves in 1983, 1986, 1989, 1992, 1997, 2001, and 2007. A detailed description of the cohort can be found in an earlier publication
Data for comparisons between modified BDI (1. depression scale) and Sleep problems
Measure (unit/range)  Study sample  Attrition sample  p-value
Number of participants  1699  1897
Percentage of males  41.1%  56.2%  < .001
Age of participants (years), mean (range)  37.71 (30–45)  37.20 (30–45; n = 1897)  .002
Sleep problems score (1–6), mean (SD)  2.28 (1.05)  2.31 (1.06; n = 463)  .543
Depression score (1–5), mean (SD)  2.00 (0.66)  2.14 (0.65; n = 333)  < .001
Data for comparisons between modified BDI-II (2. depression scale) and Sleep problems
Measure (unit/range)  Study sample  Attrition sample  p-value
Number of participants  1687  1909
Percentage of males  40.9%  56.2%  < .001
Age of participants (years), mean (range)  37.67 (30–45)  37.24 (30–45; n = 1909)  .011
Sleep problems score (1–6), mean (SD)  2.27 (1.04)  2.34 (1.07; n = 475)  .186
Depression score (0–3), mean (SD)  0.23 (0.30)  0.54 (0.64; n = 328)  < .001
The Wisconsin Longitudinal Study
Data for comparisons between modified CES-D (3. depression scale) and Sleep problems
Measure (unit/range)  Study sample  Attrition sample  p-value
Number of participants  6640  3677
Percentage of males  46.7%  51.4%  < .001
Age of participants (years), mean (range)  53.14 (52–55)  53.19 (52–55; n = 3084)  < .001
Sleep problems score (1–6), mean (SD)  1.24 (1.75)  0.63 (1.48; n = 90)  .001
Depression score (0–140), mean (SD)  16.40 (15.44)  23.31 (19.81; n = 167)  < .001
For the empirical benchmark test, the causality between parents’ socioeconomic status (SES) and that of their offspring was estimated using pairwise measures. The SES variable was a z-score-standardized sum of the z-score-transformed variables measuring years of education, level of education, and gross income. More details about these variables are provided in the supplementary material (
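The composite described above can be sketched as follows. This is only an illustration under stated assumptions: the function names and the four-person toy data are hypothetical, and the use of the sample standard deviation (rather than the population one) is a choice the text does not specify.

```python
from statistics import mean, stdev


def zscore(values):
    """Standardize values to zero mean and unit sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def composite_ses(years_of_education, level_of_education, gross_income):
    """z-score-standardized sum of the three z-score-transformed indicators."""
    summed = [
        a + b + c
        for a, b, c in zip(
            zscore(years_of_education),
            zscore(level_of_education),
            zscore(gross_income),
        )
    ]
    return zscore(summed)


# Hypothetical toy data for four people, ordered from lowest to highest SES.
ses = composite_ses([9, 12, 16, 18], [1, 2, 3, 4], [20, 30, 50, 90])
```

Standardizing each indicator before summing gives the three indicators equal weight regardless of their original units; the final standardization puts the composite itself on a z-score scale.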
Depression was assessed with two different versions of Beck’s Depression Inventory (BDI). The first was a modified version in the Young Finns study, representing the second mildest symptom statement of each item of the original BDI
The second version was a slightly modified version of Beck’s Depression Inventory II (BDI-II)
Sleep problems were assessed with Jenkins’ scale, consisting of four items that assess: difficulties falling asleep, frequent awakenings, trouble staying asleep (including waking too early), and feelings of tiredness and exhaustion after a regular night of sleep
Depression was measured with the Center for Epidemiologic Studies Depression Scale
Sleep problems were coded as zero if the participant had answered that he or she did not have trouble sleeping in the past six months. Otherwise, the score was coded as the sum of two items with the following content: ‘How often have you had trouble sleeping?’ (1 = ‘monthly or less often’; 2 = ‘about once a week’; 3 = ‘daily or more often’) and ‘How much discomfort has trouble sleeping caused you in the past six months?’ (0 = ‘none’; 1 = ‘a little’; 2 = ‘some’; 3 = ‘a lot’).
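The coding rule above can be written out as a short sketch. The function and argument names are hypothetical; only the coding rule itself is taken from the text.

```python
def wls_sleep_score(has_trouble_sleeping, frequency=0, discomfort=0):
    """Sleep problems score following the coding rule described above.

    has_trouble_sleeping: whether trouble sleeping was reported for the
        past six months.
    frequency: 1 = monthly or less often, 2 = about once a week,
        3 = daily or more often.
    discomfort: 0 = none, 1 = a little, 2 = some, 3 = a lot.
    """
    if not has_trouble_sleeping:
        return 0
    if frequency not in (1, 2, 3):
        raise ValueError("frequency must be 1, 2, or 3")
    if discomfort not in (0, 1, 2, 3):
        raise ValueError("discomfort must be 0, 1, 2, or 3")
    return frequency + discomfort


no_problems = wls_sleep_score(False)       # coded as zero
mild = wls_sleep_score(True, 1, 0)         # lowest non-zero score
severe = wls_sleep_score(True, 3, 3)       # highest attainable score
```

Under this rule the attainable scores are the integers from 0 to 6.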
The pairwise causality estimation, as applied here, starts from the assumptions that 1) either sleep problems
With non-Gaussian variables and the LiNGAM model, all we need to do in order to determine the causality is to estimate which one is the exogenous variable,
Population sampling, estimation procedures, and partial incorrectness of assumptions can introduce variability to statistical estimates. The total variability can be assessed by bootstrapping
After estimating the direction of causality using the three pairwise measures, we evaluated the required LiNGAM assumptions (model fit) by assessing 1) the linearity hypothesis for the estimated causal direction, 2) whether the residual distribution was non-Gaussian, and 3) whether it was independent of the exogenous variable. The data were considered to exhibit a linear relationship if the quadratic term in Ordinary Least Squares regression was non-significant and a scatter plot visually supported the linearity interpretation. The error-term distribution was considered non-Gaussian if the hypothesis of a Gaussian/Normal distribution was rejected by the Lilliefors test
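A bootstrap summary of this kind can be sketched as below: cases are resampled with replacement, the pairwise statistic is recomputed on each resample, and the results are summarized as percentages of ‘wins’ and a percentile interval. The function name, the percentile method for the interval, and the toy statistic in the usage example are assumptions made for illustration; the toy statistic is not a causality measure.

```python
import random


def bootstrap_direction(pairs, statistic, n_resamples=2000, seed=0):
    """Resample (x, y) cases with replacement and summarize a pairwise statistic.

    `statistic` maps a list of (x, y) pairs to a number whose positive sign
    favours the first variable as the cause.
    """
    rng = random.Random(seed)
    n = len(pairs)
    values = []
    for _ in range(n_resamples):
        resample = [pairs[rng.randrange(n)] for _ in range(n)]
        values.append(statistic(resample))
    values.sort()
    first_pct = 100.0 * sum(v > 0 for v in values) / n_resamples
    # Percentile interval; integer arithmetic avoids float rounding of indices.
    lower = values[(25 * n_resamples) // 1000]
    upper = values[(975 * n_resamples) // 1000 - 1]
    return {
        "first_as_cause_pct": first_pct,
        # Non-positive values (including exact zeros) count for the second variable.
        "second_as_cause_pct": 100.0 - first_pct,
        "ci95": (lower, upper),
    }


# Usage with a toy statistic (mean of x - y), chosen only to exercise the code.
toy_pairs = [(i + 1.0, float(i)) for i in range(50)]
result = bootstrap_direction(
    toy_pairs, lambda s: sum(a - b for a, b in s) / len(s), n_resamples=200
)
```

Any pairwise causality statistic with the same sign convention could be passed in place of the toy statistic.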
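The linearity check can be sketched as a test of the quadratic coefficient in the regression of y on 1, x, and x². The sketch below is an illustration only: the function names are hypothetical, the p-value uses a large-sample normal approximation rather than the exact t distribution, and the Lilliefors and independence tests are not reimplemented here.

```python
import math


def _residualize(v, x):
    """Residuals of v after simple OLS regression on an intercept and x."""
    n = len(x)
    mx, mv = sum(x) / n, sum(v) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (vi - mv) for xi, vi in zip(x, v)) / sxx
    return [(vi - mv) - slope * (xi - mx) for xi, vi in zip(x, v)]


def quadratic_term_test(x, y):
    """Return (coefficient, t statistic, approximate two-sided p) for x**2.

    The coefficient is obtained by partialling the intercept and x out of
    both x**2 and y, which gives the same estimate as the full regression.
    """
    n = len(x)
    rq = _residualize([xi * xi for xi in x], x)
    ry = _residualize(y, x)
    sqq = sum(q * q for q in rq)
    b2 = sum(q * r for q, r in zip(rq, ry)) / sqq
    rss = sum((r - b2 * q) ** 2 for q, r in zip(rq, ry))
    se = math.sqrt(rss / (n - 3) / sqq)
    t = b2 / se
    p = math.erfc(abs(t) / math.sqrt(2.0))  # normal approximation
    return b2, t, p


# Toy data (hypothetical): a linear and a clearly curved relationship.
x = [-1.0 + 2.0 * i / 199 for i in range(200)]
noise = [0.1 * math.sin(7 * i) for i in range(200)]
y_lin = [1.0 + 2.0 * xi + e for xi, e in zip(x, noise)]
y_quad = [yi + xi * xi for xi, yi in zip(x, y_lin)]

b2_lin, t_lin, p_lin = quadratic_term_test(x, y_lin)
b2_quad, t_quad, p_quad = quadratic_term_test(x, y_quad)
```

With very large samples even a small curvature becomes statistically significant, which is why a visual check of the scatter plot is a useful complement to the test.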
In principle, the applied L^{1} test of independence assumes that there are no atoms (discreteness) in the data
Although pairwise measures of causality (or nonlinear correlations) are quite robust against measurement error
As a preliminary, a continuous probability model was estimated that closely approximated the observed data distributions by fitting a mixture distribution of four Gaussians
First, the effects of discretization (analogous to ordinal variables) were evaluated by simulating independent-variable values,
Second, the effect of confounding was assessed, where confounding was either
Third, we demonstrated robustness against Gaussian measurement error in observations by adding a Gaussian random variable to
Finally, we wanted to obtain a crude picture of the relative depression severity encoded by the different scales. This was possible for mBDI and BDI-II because altogether 1993 Young Finns Study participants had answered both scales. A Graded Response Model with the ‘logit’ response function
Parents' SES vs. offspring's SES
Method/Statistic  Parents' SES chosen as cause (%)  Offspring's SES chosen as cause (%)  Statistic  95% confidence int.
DirectLiNGAM (b)  100.00  0.00  0.1062  (0.0627, 0.1485)
Skew-based  100.00  0.00  0.0721  (0.0454, 0.1019)
Tanh-based  99.90  0.05  0.0077  (0.0033, 0.0124)
mBDI vs. Sleep problems
Method/Statistic  mBDI chosen as cause (%)  Sleep problems chosen as cause (%)  Statistic  95% confidence int.
DirectLiNGAM (a)  0.40  99.60  −0.0433  (−0.0747, −0.0090)
DirectLiNGAM (b)  1.40  98.60  −0.0354  (−0.0677, 0.0001)
Skew-based  2.80  97.20  −0.0276  (−0.0565, 0.0009)
Tanh-based  28.50  71.50  −0.0013  (−0.0054, 0.0027)
BDI-II vs. Sleep problems
Method/Statistic  BDI-II chosen as cause (%)  Sleep problems chosen as cause (%)  Statistic  95% confidence int.
DirectLiNGAM (a)  77.65  22.35  0.0213  (−0.0332, 0.0781)
DirectLiNGAM (b)  100.00  0.00  0.1633  (0.0927, 0.2572)
Skew-based  100.00  0.00  0.0913  (0.0457, 0.1507)
Tanh-based  65.95  34.05  0.0011  (−0.0038, 0.0058)
(a) Non-standardized original variables (not available for SES).
(b) Standardized variables; the Skew- and Tanh-based statistics always require standardization. The second and third columns report the percentages of ‘wins’ in the indicated pairwise comparison, whereas the last two columns summarize the statistic implying the result over the 2000 resamples. SES = socioeconomic status; mBDI = modified Beck’s Depression Inventory; BDI-II = Beck’s Depression Inventory II.
All three pairwise causality statistics easily recognized parents’ SES as a causal antecedent for their offspring’s SES; among the bootstrap resamples, each statistic is almost always positive (
According to
mCES-D vs. Sleep problems
Method/Statistic  mCES-D chosen as cause (%)  Sleep problems chosen as cause (%)  Statistic  95% confidence int.
DirectLiNGAM (a)  0.00  100.00  −0.8798  (−0.8940, −0.7940)
DirectLiNGAM (b)  0.00  100.00  −0.5655  (−0.6031, −0.5185)
Skew-based  100.00  0.00  0.0443  (0.0205, 0.0730)
Tanh-based  99.85  0.15  0.0042  (0.0013, 0.0071)
(a) Non-standardized original variables.
(b) Standardized variables; the Skew- and Tanh-based statistics always require the latter. The second and third columns report the percentages of ‘wins’ in the indicated pairwise comparison, whereas the last two columns summarize the statistic implying the result over the 2000 resamples. mCES-D = modified Center for Epidemiologic Studies Depression scale.
After deriving the DirectLiNGAM causality estimates, we assessed whether the assumed models are fitting descriptions of the data for the recognized directions of causality.
Each row shows data for a model estimated in one data set. First panel of a row (A, D, or G) shows the linear (thick line) and quadratic (thin line) fits, superimposed on the data points. Jitter (a uniform random variable ranging from −0.1 to 0.1) was added to variables to enhance visibility of data points. Second panel is a scatterplot of the linear model residual against the independent variable. Last panel of each row shows a Gaussian probability density with mean and standard deviation equaling those of the observed residual distribution, and a kernel density estimate of the observed linear model residual.
Estimated causal model  H_{0}: β_{quadratic} = 0  H_{0}: µ_{e} = Gaussian  H_{0}: µ_{X}×µ_{e}  H^{†}_{0}: µ_{X}×µ_{e} 
offspring’s SES = f(parents’ SES)+e  .013  <.001  1.05⋅10^{−8}  .002 
mBDI = f(Sleep problems)+e  .657  <.001  .079  .003 
Sleep problems = f(BDI-II)+e  6.06⋅10^{−6}  <.001  1.79⋅10^{−16}  <10^{−8} 
mCESD = f(Sleep problems)+e  7.37⋅10^{−21}  <.001  <<.001  <10^{−8} 
First, visually the linear model seemed to be a fitting description when the estimated direction of causality was from Sleep problems to depressive tendencies assessed with mBDI. In the large WLS data (n = 6640; depression assessed with mCESD) the quadratic term was statistically significant (
The situation modeled was the one in which mBDI was linearly modeled in the Young Finns data with Sleep problems as the independent (predicting) variable. Histograms of Sleep problems (A), the Ordinary Least Squares residual of mBDI (B), and the dependent mBDI (C) are shown, together with probability distributions fitted to these data (thick lines, y-axis rescaled for the number of observations), and (Gaussian) kernel density estimates of the data (thin lines). The first panel suggests that a mixture of Gaussians is not a good model for Sleep problems; a shifted Exponential distribution was chosen instead.
Latent confounding was a more difficult question, and algorithm performance depended strongly on the underlying distributions and the type of confounding (proportional/mixture vs. linear).
The rows signify the applied causality statistic: DirectLiNGAM-based (panels A, B, C), Skew-based (D, E, F), and Tanh-based (G, H, I). The two leftmost panels of each row show estimation success (proportion of correct estimates) as a function of the degree of latent confounding. Different types of confounding (linear or proportional) and different distributional conditions were tested: Gaussian mixture (GM), Exponential (Exp), and GM and Exp (different) residual, and with all GM distributions; see Methods. The last panel shows estimation success when the amount of Gaussian ‘measurement error’ indicated by the horizontal axis was added to the independent variable.
Although both scales were administered in the same Young Finns data, 22.8% of participants had answered ‘no symptom’ to all BDI-II items compared to only 1.5% in mBDI. For mCES-D, 5.7% of participants reported the lowest attainable score. This, and the different nonlinearities with respect to Sleep problems (
Units of the horizontal axis represent standard deviations of the latent/general depression as estimated by a unidimensional Graded Response Model. Information per latent depression value holds no absolute meaning; it is estimated by the integral over an adjacent step in a 200-point discretization of the horizontal axis. In addition to the (Fisher) information content of the scales, the thin dotted line plots a Gaussian kernel density estimate from the factor scores of the estimated Graded Response Model, normalized to a maximum of one; this serves to illustrate which severity levels were actually present in the population-based Young Finns data set.
This study tested recently introduced causality estimators that are able to estimate causality from cross-sectional data
The results are partly in line with the causal implication of many studies that have found sleep problems to precede depression in time
The degree of severity assessed by a scale can influence the results of LiNGAM estimates of causality in two important ways. First, a measure like BDI-II appears to concentrate its informative range on severe depression
It has been shown that pairwise measures of causal association are robust against measurement noise or error
The possibility of cyclic (reciprocal) causal relation between depression and sleep problems is intuitively sound and has been implicated in previously reported research
Although research toward cyclic estimation may be beneficial for the understanding of the depression–sleep connection, developing robust
Regarding study limitations, it is not surprising that participants who were excluded due to missing data had higher depression scores than the study samples (
In summary, this study provides one of the first applications of cross-sectional statistical estimation of pairwise causality to a challenging real-world epidemiological problem, as opposed to simulations and benchmark testing with ‘toy problems’. A prediction is drawn from these estimates for future empirical studies to confirm: for minor forms of depression and sensitive measures, sleep problems cause significantly more dysphoria/depression than dysphoria causes sleep problems; the situation changes as depression gets more severe, or as more severe levels of symptoms are evaluated. It remains unclear whether the dominant causality becomes reversed or is balanced for more severe depression, and study attrition appears to present an increasingly severe problem for causality estimation in increasingly severe depression. This study is another piece of evidence for the causal role of sleep problems in the population-level etiology of depression, in addition to their temporal precedence
The authors gratefully thank Patrick Hoyer and Shohei Shimizu for helpful discussions and advice, and Jennifer Rowland for language revision.