Estimating psychopathological networks: Be careful what you wish for

Network models, in which psychopathological disorders are conceptualized as a complex interplay of psychological and biological components, have become increasingly popular in the recent psychopathological literature (Borsboom et al., 2011). These network models often contain large numbers of unknown parameters, yet the sample sizes available in psychological research are limited. As such, general assumptions about the true network are introduced to reduce the number of free parameters. Incorporating these assumptions, however, means that the resulting network will reflect the particular structure assumed by the estimation method, a crucial and often ignored aspect of psychopathological networks. For example, observing a sparse structure while simultaneously assuming a sparse structure does not imply that the true model is, in fact, sparse. To illustrate this point, we discuss recent literature and show the effect of the assumption of sparsity in three simulation studies.

The largest application to date of a psychological network estimated using the LASSO is the work of Boschloo and colleagues (2015), in which 120 psychiatric symptoms were measured in 34,653 subjects and modeled with an Ising Model. We use their work in this paper to illustrate our concerns regarding the interpretation of network structures that are the result of applying a network methodology to data.

A sparse network model of psychopathology
The network of Boschloo and colleagues (2015) shows a network structure in which symptoms representative of a disorder cluster strongly together. While they agree that the found network structure closely represents the structure that is imposed by the Diagnostic and Statistical Manual of Mental Disorders (DSM; American Psychiatric Association, 2013), they conclude that the found structure indicates that symptoms are not interchangeable, as is presumed to be the case in the DSM.
Commonly, a DSM diagnosis requires one to have X out of Y symptoms, regardless of which specific symptoms. This means that two persons with very different symptoms can be assigned the same diagnosis. This interchangeability results from an underlying causal notion of unobserved diseases causing symptoms rather than symptoms having an active causal role on each other, a notion more formally known as the common cause model (Schmittmann et al., 2013). Boschloo and colleagues conclude that the network structure shows that symptoms are not interchangeable, mainly because of the differences found in the number and strength of connections between symptoms, a relatively small number of pathways between disorders, and the presence of some negative connections.
While we do not necessarily disagree with the notion that symptoms play an active, causal role in psychopathology, we wish to point out that the conclusion that symptoms are not interchangeable is difficult to ascertain from a sparse approximated network structure alone. This is because the LASSO relies on the assumption that the true network structure is sparse; the LASSO will always search for a model in which relatively few edges and paths explain the co-occurrence of all nodes. As a result, the LASSO can have a low sensitivity (not all true edges are detected) but always has a high specificity (few false positives; van Borkulo et al., 2014). It is for this reason that network analysts prefer to use the LASSO: the edges that it estimates can be interpreted as meaningful, and the LASSO returns a possible explanation of the data using only a few connections that can be interpreted as causal pathways (Lauritzen, 1996; Pearl, 2000). That the LASSO gives a possible explanation, however, does not mean that it gives the only explanation, nor does it indicate that other explanations are false. In the case of Boschloo and colleagues, the sparse explanation given by the LASSO can provide great insight into a possible way in which psychopathological symptoms interact with each other, but merely finding a sparse structure does not mean that other explanations, e.g., a latent variable model with interchangeable symptoms, are disproved. Using the LASSO will always return a sparse structure; that is what the LASSO does.
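The notions of sensitivity and specificity used above can be made concrete with a small sketch. The function name `edge_recovery` and the two four-node weight matrices are hypothetical and chosen only for illustration; they are not taken from Boschloo and colleagues or from any package.

```python
from itertools import combinations


def edge_recovery(true_w, est_w):
    """Return (sensitivity, specificity) of estimated edges against true edges.

    Both arguments are symmetric weight matrices (lists of lists); an edge is
    counted as present whenever its weight is non-zero.
    """
    tp = fp = tn = fn = 0
    for i, j in combinations(range(len(true_w)), 2):
        is_true = true_w[i][j] != 0
        is_est = est_w[i][j] != 0
        if is_true and is_est:
            tp += 1          # true edge that was detected
        elif is_true:
            fn += 1          # true edge that was missed
        elif is_est:
            fp += 1          # estimated edge that does not exist
        else:
            tn += 1          # absent edge correctly left out
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical truth: four edges among four nodes (two pairs unconnected).
true_w = [[0, .25, .25, 0],
          [.25, 0, .25, 0],
          [.25, .25, 0, .25],
          [0, 0, .25, 0]]
# Hypothetical sparse estimate: only two of the four true edges are found.
est_w = [[0, .4, 0, 0],
         [.4, 0, 0, 0],
         [0, 0, 0, .3],
         [0, 0, .3, 0]]

sens, spec = edge_recovery(true_w, est_w)
print(sens, spec)
```

In this toy case half of the true edges are missed while no false edge is introduced, which is the pattern of low sensitivity and high specificity described in the text. Note that specificity is undefined when the true network is fully connected, because there are then no absent edges to get right.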

The bet on sparsity
The LASSO is capable of retrieving the true underlying structure, but only if that true structure is sparse. Any regularization method makes the assumption that the true structure can be simplified in some way (e.g., is sparse), as otherwise too many observations are needed to estimate the network structure. This principle has been termed the 'bet on sparsity' (Hastie, Tibshirani, & Friedman, 2001). But what if the truth is not sparse, but dense? Such a case would arise precisely if the true model were a latent common cause model in which one or several latent variables cause scores on possibly completely interchangeable indicators. This is a feasible alternative, as the Ising Model can be shown to be mathematically equivalent to a certain type of latent variable model: the multidimensional item response model (MIRT; Reckase, 2009) with posterior normal distributions on the latent traits (Epskamp et al., in press; Marsman, Maris, Bechger, & Glas, 2015). The corresponding Ising Model is a low-rank network that will often be dense (all possible edges are present). Intuitively, this makes sense, as the Ising Model parameterizes conditional dependencies between items after conditioning on all other items, and no two items can be made conditionally independent if the common cause model is true. A low-rank weighted network will show indicators of a latent variable as clusters of nodes that are all connected strongly with each other. Therefore, if a common cause model is the true origin of the co-occurrences in the dataset, the corresponding Ising Model should show the indicators to cluster together, much like the results shown by Boschloo and colleagues. It is this relationship between the Ising Model and MIRT that has led to estimating the Ising Model using a different form of regularization: estimating a low-rank approximation of the network structure (Marsman et al., 2015).
Such a structure is strikingly different from the sparse structure returned by LASSO estimation. Whereas the LASSO will always estimate many edge parameters to be exactly zero, a low-rank approximation generally estimates no edge to be exactly zero, and hence will typically return a dense network. On the other hand, this dense network is highly constrained by the eigenvector structure, so that many edge parameters are roughly equal to each other, in contrast to the strongly varying edge parameters that LASSO estimation allows. For example, the data can always be recoded such that a rank-1 approximation only has positive connections. These are key points that cannot be ignored when estimating a network structure: regardless of the true network structure that underlies the data, the LASSO will always return a sparse network structure. Similarly, a low-rank approximation will always return a dense low-rank network structure. Both methods take on the bet on sparsity in their own way, betting on sparsity in the number of non-zero parameters or on sparsity in the number of non-zero eigenvalues, and both can lose the bet.
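The contrast between the two kinds of constraint can be sketched in a few lines. The soft-thresholding operator below is the shrinkage step that underlies common coordinate-descent LASSO solvers; it is shown here in isolation and is not the code of IsingFit. The rank-1 construction simply sets each edge to the product of two node loadings and is not the estimation procedure of Marsman et al. (2015). The raw edge values, the penalty of 0.10, and the loadings are hypothetical.

```python
def soft_threshold(value, penalty):
    """LASSO shrinkage: move a coefficient toward zero, clipping at exactly zero."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def rank_one_network(loadings):
    """Off-diagonal edge weights w_ij = a_i * a_j built from one loading vector."""
    n = len(loadings)
    return [[loadings[i] * loadings[j] if i != j else 0.0 for j in range(n)]
            for i in range(n)]


def density(weights):
    """Proportion of node pairs connected by a non-zero edge."""
    n = len(weights)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(weights[i][j] != 0.0 for i, j in pairs) / len(pairs)


# Hypothetical unpenalized edge estimates, then shrunk with penalty 0.10.
raw = [0.31, -0.04, 0.12, 0.02, -0.27, 0.08]
lasso_like = [soft_threshold(w, 0.10) for w in raw]
print(lasso_like.count(0.0), "of", len(raw), "edges set exactly to zero")

# Equal loadings of 0.5 give every edge the same weight of 0.25.
dense_w = rank_one_network([0.5, 0.5, 0.5, 0.5])
print(density(dense_w), dense_w[0][1])
```

Every small coefficient is set to exactly zero by the shrinkage step, whereas the product construction yields an edge for every pair of nodes with non-zero loadings. With equal loadings all edges are identical, which illustrates how strongly the eigenvector structure constrains the edge weights.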

Estimating an Ising model when the truth is dense
Here we illustrate the effect that the estimation procedure has on the resulting Ising model in two examples. First, we simulated 1,000 observations from the true models shown in Figure 1. The first model is called a Curie-Weiss model (Kac, 1968), which is fully connected and in which all edges have the same strength (here set to 0.25).
This network is a true rank-1 network, which has been shown to be equivalent to a unidimensional Rasch model (Marsman et al., 2015), in which all indicators are interchangeable. Figure 2 shows the results of three different estimation methods applied to the first n rows of the simulated dataset: sequential univariate logistic regressions for unregularized estimation (Epskamp et al., in press), the IsingFit R package (van Borkulo & Epskamp, 2014) for LASSO estimation, and a rank-1 approximation (Marsman et al., 2015). It can be seen that the unregularized estimation shows many spurious differences in edge strength, including even many negative edges. The LASSO performs better, but estimates a sparse model in which edge weights vary and many edges are estimated to be exactly zero. The rank-1 approximation works best in capturing the model, which is not surprising since the true model is a rank-1 network. The second model in Figure 1 corresponds to a sparse network in which 20% of the edge strengths, randomly chosen, are set to 0.25 and the remaining edge strengths are set to 0 (indicating no edge). As Figure 3 shows, the LASSO now performs very well in capturing the true underlying structure. Since both the unregularized estimation and the rank-1 approximation estimate a dense network, they have a very poor specificity (many false positive edges). In addition, the rank-1 approximation performs very poorly in capturing the underlying true model structure. Thus, this example serves to show that the LASSO and low-rank approximations only work well when their assumptions about the true underlying model are met. In particular, using a low-rank approximation when the truth is sparse will result in many false positives, whereas using the LASSO when the truth is dense will result in many false negatives. In addition, even when the true model is one in which every node is interchangeable, the LASSO would still return a model in which nodes could be interpreted as not being interchangeable.
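The two data-generating models of this first example can be sketched as follows. This is a minimal illustration rather than the code used for the reported simulations: it assumes 0/1 coding of the nodes, a hypothetical network size of 10 nodes, thresholds chosen to center each node (the text does not report them), and a plain Gibbs sampler in place of whatever sampler the authors used. All function names are hypothetical.

```python
import math
import random


def curie_weiss(p, strength=0.25):
    """Fully connected network in which every edge has the same strength."""
    return [[strength if i != j else 0.0 for j in range(p)] for i in range(p)]


def sparse_network(p, strength=0.25, proportion=0.2, seed=2):
    """Network in which a random 20% of node pairs get an edge and the rest get none."""
    rng = random.Random(seed)
    pairs = [(i, j) for i in range(p) for j in range(i + 1, p)]
    weights = [[0.0] * p for _ in range(p)]
    for i, j in rng.sample(pairs, round(proportion * len(pairs))):
        weights[i][j] = weights[j][i] = strength
    return weights


def centering_thresholds(weights):
    """Assumed thresholds: minus half the summed edge weights of each node."""
    return [-0.5 * sum(row) for row in weights]


def gibbs_ising(weights, thresholds, n_samples, burn_in=100, seed=1):
    """Draw 0/1 states from an Ising model, one full sweep per retained sample."""
    rng = random.Random(seed)
    p = len(weights)
    state = [rng.randint(0, 1) for _ in range(p)]
    samples = []
    for sweep in range(burn_in + n_samples):
        for i in range(p):
            # Conditional log-odds of node i given the current state of the others.
            field = thresholds[i] + sum(
                weights[i][j] * state[j] for j in range(p) if j != i)
            prob_one = 1.0 / (1.0 + math.exp(-field))
            state[i] = 1 if rng.random() < prob_one else 0
        if sweep >= burn_in:
            samples.append(list(state))
    return samples


P = 10  # hypothetical number of nodes
cw_w = curie_weiss(P)
sparse_w = sparse_network(P)
cw_data = gibbs_ising(cw_w, centering_thresholds(cw_w), 1000)
sparse_data = gibbs_ising(sparse_w, centering_thresholds(sparse_w), 1000)
print(len(cw_data), len(sparse_data))
```

Consecutive sweeps of a Gibbs sampler are autocorrelated, so a more careful simulation would thin the chain or use an exact sampler. Either dataset could then be passed to an unregularized, a LASSO, or a low-rank estimator to reproduce the kind of comparison described above.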
For the second example, we simulated data under the latent variable model shown in Figure 4, using an MIRT model (Reckase, 2009). In this model, the symptoms for Dysthymia and Generalized Anxiety Disorder (GAD) were taken from the supplementary materials of Boschloo and colleagues (2015), with the exception of the GAD symptom "sleep disturbance", which we split in two: insomnia and hypersomnia. The item discriminations of each symptom were set to 1 to indicate that symptoms are interchangeable, and item difficulties were set to 0. All latent variables were simulated to be normally distributed with a standard deviation of 1, and the correlation between dysthymia and GAD was set to 0.55, in line with the empirically estimated comorbidity (Kessler, Chiu, Demler, & Walters, 2005). Nodes 2 and 3 in dysthymia and nodes 6 and 7 in GAD are mutually exclusive, which we modeled by adding orthogonal factors with slightly higher item discriminations of 1.1 and -1.1. Furthermore, nodes 7, 8, 9 and 10 of dysthymia are identical to nodes 6, 7, 8 and 9 of GAD respectively, which we modeled by adding orthogonal factors with item discriminations of 0.75. While these nodes appear to be the same, they typically do not correlate perfectly, because a skip structure is imposed in datasets such as the one analyzed by Boschloo and colleagues. We did not impose a skip structure, to keep the simulation study simple. Such shared symptoms are termed bridge symptoms in network analysis, as they are assumed to connect the clusters of disorders and explain comorbidity (Borsboom et al., 2011; Cramer et al., 2010). In sum, the model shown in Figure 4 generates data that are plausible given the latent disease conceptualization of psychopathology. Figure 5 shows the simulated and recovered network structures.
First we simulated 10,000,000 observations from this model, and estimated the corresponding Ising model using non-regularized estimation by framing the Ising Model as a loglinear model (Agresti, 1990; Epskamp et al., in press; estimation done using the IsingSampler package, Epskamp, 2014). Panel (a) shows the results, which give a [...] shows the result from using LASSO (using the IsingFit package; van Borkulo et al., 2014). In this model, the clustering is generally retrieved, two of the bridging connections are retrieved and one negative connection is retrieved. However, the resulting structure is much sparser than the true model, and interpreting this structure could lead to the same conclusions as those drawn by Boschloo and colleagues (2015): the number of connections differs across symptoms, connection strengths vary considerably across symptoms, and relatively few connections connect the two disorders. Finally, panel (d) shows the result of a rank-2 approximation, which is equivalent to a two-factor model. Here, it can be seen that while a dense structure is retrieved that shows the correct clustering, violations of the clustering (the negative and bridging edges) are not retrieved.
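The data-generating model of this second example can be sketched on a reduced scale. The values taken from the text are the latent correlation of 0.55, unit latent standard deviations, difficulties of 0, and discriminations of 1, 1.1, -1.1 and 0.75. Everything else is an assumption made for illustration: the logistic link, the reduced set of six hypothetical items (three unique to dysthymia, one symptom shared between the disorders and represented by two items, one unique to GAD), the sample size of 5,000, and the function name `simulate_mirt`.

```python
import math
import random


def simulate_mirt(loadings, difficulties, n, rho=0.55, seed=3):
    """Simulate binary items from a logistic MIRT model.

    The first two latent traits are standard normal with correlation rho;
    any further traits are standard normal and orthogonal to all others.
    """
    rng = random.Random(seed)
    n_extra = len(loadings[0]) - 2
    data, traits = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        # Cholesky construction of two unit-variance traits correlated at rho.
        theta = [z1, rho * z1 + math.sqrt(1 - rho ** 2) * z2]
        theta += [rng.gauss(0, 1) for _ in range(n_extra)]
        row = []
        for a, b in zip(loadings, difficulties):
            logit = sum(a_k * t_k for a_k, t_k in zip(a, theta)) - b
            prob = 1.0 / (1.0 + math.exp(-logit))
            row.append(1 if rng.random() < prob else 0)
        data.append(row)
        traits.append(theta[:2])
    return data, traits


# Columns: dysthymia, GAD, exclusion factor, shared-symptom factor.
loadings = [
    [1, 0, 0.0, 0.00],   # dysthymia symptom
    [1, 0, 1.1, 0.00],   # mutually exclusive pair, first member
    [1, 0, -1.1, 0.00],  # mutually exclusive pair, second member
    [1, 0, 0.0, 0.75],   # shared symptom, dysthymia side
    [0, 1, 0.0, 0.75],   # shared symptom, GAD side
    [0, 1, 0.0, 0.00],   # GAD symptom
]
difficulties = [0.0] * len(loadings)

data, traits = simulate_mirt(loadings, difficulties, n=5000)
print(len(data), len(data[0]))
```

The opposite signs on the exclusion factor push the two mutually exclusive items apart, while the shared-symptom factor adds association between two items that belong to different disorders. These are the features that would appear as negative and bridging edges in a corresponding network.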

Conclusion
Network estimation has grown increasingly popular in psychopathological research. The estimation of network structures, such as the Ising model, is a complicated problem due to the fast-growing number of parameters to be estimated. It is important to realize that using a regularized estimation method makes an assumption about the underlying true model structure: the LASSO assumes a sparse structure, whereas a low-rank approximation assumes a dense but low-rank structure. These assumptions cannot be validated by investigating the results of the estimation methods; the LASSO always gives a sparse structure, which does not mean that the true underlying structure could not have been dense. On the other hand, low-rank approximations rarely produce sparse structures, but that does not mean that the true underlying structure could not have been sparse. Figure 5 illustrates this point again in a plausible scenario in psychopathology, and furthermore shows that when the true network structure is complicated and neither sparse nor low-rank, as is the case here, all regularization methods partly fail even when using a relatively large sample size. As such, interpreting the sparsity of such a structure is meaningless; the LASSO resulting in a sparse model gives as little evidence for the true model being sparse as a low-rank approximation returning a dense model gives evidence for the true model being dense.
Those characteristics of the networks we obtain are a consequence of the method used for estimating the network structure; specifically, the assumptions that the method makes about the data-generating network structure pollute the resulting estimated model (Kruis & Maris, 2015).
Recently it has been demonstrated that equivalent mathematical representations exist for statistical models that assume either a common cause (latent variable) or reciprocal affect (network) relation between an unobservable psychological phenomenon of interest and the process by which data is generated on the measures of its indicators (Epskamp et al., in press; Marsman et al., 2015; Kruis & Maris, in preparation). Consequently, when a model from one of these frameworks can sufficiently describe the associative structure of the measured variables, there exists an alternative representation from the other framework that can describe the structure of the data equally well. With respect to Boschloo and colleagues, for example, the sparse network structure resulting from the application of the LASSO to their data can likely also be described by a multidimensional latent variable model with a single latent variable for each clique in the network and residual correlations.
The realization that the network structure we obtain from our data both depends on the procedure that we use to estimate it and can subsequently be interpreted in theoretically very distinct ways has an important implication: it shows that the application of a model to data is in itself meaningless without the attachment of some fundamental theory about the psychological phenomena we are trying to measure (Kruis & Maris, in preparation).
Our aim here is therefore not to say that one method of estimating network structures should be preferred over another, or that estimating network structures is wrong. On the contrary, network models show great promise in mapping out and visualizing relationships present in the data and are a powerful tool for grasping high-dimensional multivariate relationships. In addition, network models can be powerful tools for estimating the backbones of potential causal relationships, if those relationships are assumed to exist. Using the LASSO to estimate such network structures is a powerful way of performing fast high-dimensional model selection that results in very few false positives, and interpreting network structures obtained from the LASSO can give great insight into strong relationships present in the dataset. Our aim is to make clear that the choice of estimation method is not trivial, and can greatly impact both the estimated structure and any conclusion drawn from that structure. In the particular cases described here, using LASSO estimation will result in a sparse structure, and using a low-rank approximation will result in a dense low-rank structure.
That is what these methods do, so be careful what you wish for.