Decision-based models of the implementation of interventions in systems of healthcare: Implementation outcomes and intervention effectiveness in complex service environments

Implementation is a crucial component for the success of interventions in health service systems, as failure to implement well can have detrimental impacts on the effectiveness of evidence-based practices. Therefore, evaluations conducted in real-world contexts should consider how interventions are implemented and sustained. However, the complexity of healthcare environments poses considerable challenges to the evaluation of interventions and of the impact of implementation efforts on the effectiveness of evidence-based practices. As a consequence, implementation and intervention effectiveness are often assessed separately in health services research, which prevents the direct investigation of the relationships between implementation components and the effectiveness of the intervention. This article describes multilevel decision juncture models, based on advances in implementation research and causal inference, to study implementation in health service systems. The multilevel decision juncture model is a theory-driven systems approach that integrates structural causal models with frameworks for implementation. This integration enables the investigation of interventions and their implementation within a single model that considers the causal links between levels of the system. Using a hypothetical youth mental health intervention inspired by published studies from the health services research and implementation literature, we demonstrate that such theory-based systems models enable investigations of the causal pathways between implementation outcomes as well as their links to patient outcomes. Results from Monte Carlo simulations also highlight the benefits of structural causal models for covariate selection, as consistent estimation requires only the inclusion of a minimal set of covariates. Such models are applicable to real-world contexts using different study designs, including longitudinal analyses, which facilitate the investigation of the sustainment of interventions.


A.1 Decisions and Structural Causal Models
In this section, we discuss the decision-based causal approach in greater detail. The structural approach sits at the center of the decision-based implementation system, so it is important to understand the underlying concepts of structural causal models and choice theory. The discussion here provides only a general overview, as a thorough treatment cannot be covered in a single manuscript; we refer the interested reader to the references cited throughout this online supplement.
Before we describe implementation decisions as choice processes, we begin by defining a structural equation:

$$ y = f(X, u) \qquad (1) $$

Such an equation is deemed structural if the relationship represented by the function f(X, u) is causal, i.e., if the outcome 'y' is caused by the set of explanatory variables 'X', with an error term 'u' representing all other unobserved causes of 'y' that are independent of the variables in 'X'. This follows Goldberger [1], who defined structural equations as "stochastic models in which each equation represents a causal link, rather than a mere association" (p. 979). In equation (1), stochasticity refers to the presence of unobserved causes represented by 'u'.
To integrate the structural approach within a decision framework, we must examine the behavioral process leading to decision makers' observed choices [2]. We explain this process in some detail because it is fundamental for understanding our approach.
We assume that a person acting within the implementation system can choose from a finite set of available options. For example, a manager in an organization may choose between several implementation strategies. Two crucial but not very restrictive assumptions are that the options are mutually exclusive (i.e. choosing one option means a decision maker cannot choose an alternative) and that the set of options is complete, meaning that there are no valid options that are unaccounted for [2].
Equation (2) states that the net benefit (NBi,k) of an option is the difference between its resulting benefits (Bi,k) and its incurred costs (Ci,k), that is, NBi,k = Bi,k − Ci,k. As mentioned above, benefits and costs are not necessarily measured in monetary terms but can also include individual or alternative-specific characteristics, allocated time, and non-tangible elements such as stigma. Equation (3) relates the concept of unobserved utilities to actually observed choices by defining a discrete outcome variable (L). For now, let us assume the outcome is a categorical variable with two classes represented by 0 and 1. Only the outcome variable of the option with the highest net utility takes the value 1, while those of all other options take the value 0. The decision rule in equation (3) thus states that an actor will choose the option with the highest net utility over all other alternatives.
If researchers had measurements available for all the factors influencing actors' decisions, the relationships between the observed choices and their inputs would be deterministic. However, this is unlikely, and therefore the relationship represented by equation (1) involves uncertainty represented by an error term. Accounting for this uncertainty, equation (2) is augmented with two error terms: one represents the unobserved benefits and the other denotes the unobserved costs of a particular alternative in a particular choice situation (k).
Based on their knowledge or other evidence, researchers can make probabilistic statements about actors' choice processes which express the likelihood that a decision maker will pick a particular option given the observed circumstances. Many different models have been suggested in the academic literature on discrete choices [2,3].
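The choice process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the benefit and cost values, the function name `choose_option`, and the independent normal errors are hypothetical and are not taken from the article. Each option's net benefit is its observed benefit minus its observed cost plus unobserved components, and the outcome indicator takes the value 1 only for the option with the highest net benefit.

```python
import random


def choose_option(observed_benefits, observed_costs, rng, noise_sd=1.0):
    """Return 0/1 indicators: 1 for the option with the highest net benefit."""
    net_benefits = []
    for benefit, cost in zip(observed_benefits, observed_costs):
        # Unobserved benefit and cost components (assumed independent normal).
        unobserved_benefit = rng.gauss(0.0, noise_sd)
        unobserved_cost = rng.gauss(0.0, noise_sd)
        net_benefits.append((benefit + unobserved_benefit) - (cost + unobserved_cost))
    best = max(range(len(net_benefits)), key=net_benefits.__getitem__)
    return [1 if j == best else 0 for j in range(len(net_benefits))]


# Hypothetical observed benefits and costs for three mutually exclusive options.
rng = random.Random(12345)
benefits = [2.0, 3.5, 1.0]
costs = [1.0, 3.0, 0.2]

n_draws = 20000
counts = [0] * len(benefits)
for _ in range(n_draws):
    choice = choose_option(benefits, costs, rng)
    counts = [c + l for c, l in zip(counts, choice)]

# Simulated choice probabilities given the observed circumstances.
choice_shares = [c / n_draws for c in counts]
print(choice_shares)
```

Because part of each net benefit is unobserved, the researcher can only state how likely each option is to be chosen; the option with the highest observed net benefit is chosen most often, but not always.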
In addition to actors' decisions, we are interested in the effects of interventions as part of our implementation system. Again, we can formulate the relationship between the outcome of interest, a treatment variable and other causes of the outcome in the form of equation (1) [4]. At the core of structural causal models, or structural systems, rests the assumption that the equations represent an autonomous set of mechanisms [5]. The assumption of autonomy implies that the relationships between an outcome variable 'y' and its parents 'X' in each of the equations are unaffected by how the values of 'X' are determined [6]. For example, the relationship between the blood sugar level of a diabetes patient ('y') and insulin injection (a variable in set 'X') will be the same whether insulin injection (the value of a particular 'X') was determined through external manipulation (e.g. a controlled experiment) or through another mechanism observed in the real world.

A.2 A Brief Introduction to Directed Acyclic Graphs
Directed acyclic graphs are a special form of causal graphs illustrating the relationships between elements of a system. As the concept of DAGs plays a central role in this study, we provide an easily accessible but informal description of the most essential concepts. More thorough treatments of causal graphs and their properties are provided by White and Lu [7] or Pearl [4].
A graph G consists of a set of nodes (or vertices) and edges that connect nodes with each other [7]. For the purpose of this study we assume that nodes in graphs represent random variables, as is standard in causal analysis. Variables included in the graph can be measured or unmeasured.
The edges in a graph represent the relationship between random variables. Directed relationships are represented by single headed arrows pointing from an initial node (direct cause) to a terminal node (outcome). As stated in section A.1, direct causes are also referred to as parents. Bi-directional relationships in a graph are represented by double headed arrows.
Relationships involving at least one unmeasured variable are denoted by dashed arrows while relationships between measured variables are usually symbolized by solid arrows [4].
Variables in the graph that have no parents (i.e., no edge leading into them) are exogenous variables in the system (or roots [7]). All other variables are endogenous variables determined within the system.
One or more edges form a path between an initial node and a terminal node. A directed graph is a graph that consists only of directed edges [7]. On a particular path, an edge connects with another edge through a common node to form a sequence. If all edges of a path are directed and the terminal node of each edge is also the initial node of the next edge, then the path is a directed path [7]. For example, in the case of a linear path, this would mean that all edges point in the same direction. Finally, if the initial node of a path is also the final node on the path, then the path is a cycle (e.g., a feedback loop). A directed acyclic graph is a directed graph without cycles.
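The graph concepts introduced so far (parents, roots, and the absence of cycles) can be made concrete with a short Python sketch. The edge list below is a hypothetical toy graph, not the DAG of the article, and the function names are illustrative. Acyclicity is checked with Kahn's topological-ordering algorithm, which is one standard way of doing so.

```python
from collections import deque

# Hypothetical toy graph: each edge points from a direct cause to its outcome.
edges = [("W", "D"), ("D", "X"), ("Z", "X"), ("X", "T"), ("Z", "T"), ("T", "Y")]


def parents_of(edges):
    """Map every node to the set of its parents (direct causes)."""
    nodes = {node for edge in edges for node in edge}
    parents = {node: set() for node in nodes}
    for cause, outcome in edges:
        parents[outcome].add(cause)
    return parents


def roots(edges):
    """Exogenous variables: nodes without any incoming edge."""
    return sorted(node for node, pa in parents_of(edges).items() if not pa)


def is_acyclic(edges):
    """Kahn's algorithm: the graph is acyclic iff every node can be ordered."""
    nodes = {node for edge in edges for node in edge}
    indegree = {node: 0 for node in nodes}
    children = {node: [] for node in nodes}
    for cause, outcome in edges:
        indegree[outcome] += 1
        children[cause].append(outcome)
    queue = deque(node for node in nodes if indegree[node] == 0)
    ordered = 0
    while queue:
        node = queue.popleft()
        ordered += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return ordered == len(nodes)


print(roots(edges))                        # exogenous nodes of the toy graph
print(is_acyclic(edges))                   # the toy graph has no cycle
print(is_acyclic(edges + [("Y", "D")]))    # adding a feedback edge creates one
```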
An important concept within the causal inference literature that is directly related to DAGs is the back-door path [4]. A back-door path between two variables is a path that begins with an arrow pointing into the presumed cause; such paths indicate the presence of confounding variables that can distort the identification of the effect of interest.
Related to the back-door path is the back-door criterion. A set of variables satisfies this criterion relative to a cause and an outcome if it contains no descendant of the cause and blocks every back-door path between the two variables [4].
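As a rough illustration of the back-door criterion, the following Python sketch enumerates the back-door paths in a small graph and checks whether a candidate set blocks them. The graph and all names are hypothetical and do not reproduce the article's DAG; the blocking rules follow the usual d-separation definition (a non-collider in the set blocks a path, whereas a collider blocks a path unless it or one of its descendants is in the set). Path enumeration is exhaustive and therefore only suitable for small graphs.

```python
def descendants(edges, node):
    """All nodes reachable from `node` along directed edges."""
    children = {}
    for cause, outcome in edges:
        children.setdefault(cause, set()).add(outcome)
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def simple_paths(edges, start, end):
    """All simple paths between start and end, ignoring edge direction."""
    neighbours = {}
    for cause, outcome in edges:
        neighbours.setdefault(cause, set()).add(outcome)
        neighbours.setdefault(outcome, set()).add(cause)

    def walk(path):
        if path[-1] == end:
            yield path
            return
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in path:
                yield from walk(path + [nxt])

    yield from walk([start])


def is_blocked(edges, path, z):
    """True if conditioning on the set z blocks the given path."""
    edge_set = set(edges)
    for a, m, b in zip(path, path[1:], path[2:]):
        collider = (a, m) in edge_set and (b, m) in edge_set
        if collider:
            if m not in z and not (descendants(edges, m) & z):
                return True
        elif m in z:
            return True
    return False


def satisfies_backdoor(edges, x, y, z):
    """Check the back-door criterion for the set z relative to (x, y)."""
    z = set(z)
    if z & descendants(edges, x):
        return False  # z must not contain descendants of the cause
    edge_set = set(edges)
    backdoor_paths = [p for p in simple_paths(edges, x, y) if (p[1], x) in edge_set]
    return all(is_blocked(edges, p, z) for p in backdoor_paths)


# Hypothetical graph: Z confounds X and Y, M mediates, C is a collider.
toy = [("Z", "X"), ("Z", "Y"), ("X", "M"), ("M", "Y"),
       ("A", "X"), ("A", "C"), ("B", "C"), ("B", "Y")]

print(satisfies_backdoor(toy, "X", "Y", set()))        # confounding path open
print(satisfies_backdoor(toy, "X", "Y", {"Z"}))        # minimal sufficient set
print(satisfies_backdoor(toy, "X", "Y", {"Z", "C"}))   # collider opened
```

In this toy graph, conditioning on Z alone is sufficient, while adding the collider C opens a new path and conditioning on the mediator M violates the no-descendant condition.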

Assumptions
In this section, we provide a more detailed treatment of the hypothetical case study described in the main article. We state several assumptions that we make in the example, mainly to aid in conveying the approach described in this study.
This section is based on the scenario illustrated in the causal graph of the case example. Causal graphs explicate all assumptions about causal relationships within a system [4], and causal interpretation can be justified under conditions such as available controls for direct causes or ignorability, as defined in the causal inference literature [4,7,8].
Before we proceed to the analysis of the model, we discuss the assumptions made in the hypothetical case example of this study in more detail:
1. We restrict our attention to the first time a patient is recorded in the system within the observation period and exclude patients' simultaneous engagement with multiple organizations. This assumption simplifies the example and is closely related to the second assumption below. Furthermore, this assumption is in line with most randomized controlled trials, making our approach comparable and more familiar to readers. Relaxation of this assumption leads to a dynamic model where treatment levels or intensity depend on previous treatment and outcomes [4,9].
2. We assume that there is no feedback of implementation or treatment outcomes to clinicians or organizations, i.e. that their information set is constant for the observation period. Such an assumption is reasonable for an early implementation setting, assuming that the intervention is delivered by outside facilitators or that the assignment period is short relative to the duration of the program. This assumption, in combination with assumption 1, allows us to specify this model as a recursive system (i.e., a system without feedback). Relaxing this assumption would require us to specify a state-dependent dynamic model where decision makers take previous outcomes into consideration, as would be the case for sustainment and continuous quality improvement (CQI) decision cycles. However, treatment of state-dependent dynamic models is beyond the scope of this study.
3. Furthermore, we assume that patients' outcomes are independent (e.g., no social interaction), that the implementation strategy does not alter the treatment regimen (i.e., there are no hidden treatment variations), and that treatment in this case study is defined as a binary variable taking values 0 or 1.
4. All observed variables are measured without error (e.g. X2 is a perfect proxy for the latent construct of perceived leadership). This assumption simplifies the example significantly but can be relaxed by the introduction of measurement models [9]. Such cases would require a different graphical representation that considers latent variables, such as multiple indicators, multiple causes (MIMIC) models [1]. Again, treating such examples is outside the scope of this article.
In relation to treatment effects, these assumptions justify the stable unit treatment value assumption (SUTVA) that is often implicitly assumed in the experimental literature [8,10]. It is important to note that in the presented example, measured variables are available to block back-door paths in order to identify causal effects [4]. In a nonparametric framework, this situation is equivalent to matching on covariates [6].

Identification of causal effects
To illustrate the structural approach, we will formally derive the effects described by the three hybrid designs [11], which are also described in the main article. However, the structural causal model described here investigates the causal links between these elements across different levels of the system, and these links are not formally captured by hybrid designs. The benefit of structural systems is that they enable researchers to answer more complex questions that go beyond what can be learned from randomized controlled studies, including the investigation of hypothetical interventions, causal paths, and mediation effects [12].
In the DAG, D1, D2 and T1 represent the overall decision to implement, the choice of implementation strategy and the decision to apply treatment to a patient, respectively. Dashed arrows emanating from an unobserved error (Ui) towards two observed variables represent unobserved correlations between those variables (i.e., correlated errors between two variables, usually referred to as confounding). Each node can be viewed as having an unobserved error as input.
Following conventions in the literature, unobserved errors affecting only one node are not shown in the graph [4]. Since the DAG in Fig A1 represents a structural causal model, it corresponds to a set of structural equations in which each endogenous variable is a function of its parents and an unobserved error. Below, we use graphical conditions to identify causal effects in Fig A1 before implementing the structural causal model within a Monte-Carlo simulation study described in section A.4. Alternatively, these effects can be derived using algebraic operations such as 'do-calculus' [4] or using hypothetical models and 'fixing' as described by Heckman and Pinto [6]. In the discussion that follows, we treat all conditioning variables as discrete to enable the use of summation rather than integrals. This should facilitate the theoretical discussion of the identification of causal effects and is in line with the usual treatment in the literature [4,6].
Effect of implementation strategy (D2) on perceived feasibility (X4) and appropriateness (X5) of the intervention:
The estimation of the effect of D2 on X4 and X5 requires us to control for confounding, as there exist unblocked back-door paths between D2 and the two dependent variables [4]. Conditioning on a set of variables that satisfies the back-door criterion (Theorem 3.3.2 of Pearl [4]) blocks these confounding paths, and the average treatment effect (ATE) can then be estimated as in equation (1), where $d'_2$ and $d''_2$ represent two distinct realizations of D2, while the same symbols with a hat, i.e. $\hat{d}'_2$ and $\hat{d}''_2$ on the right-hand side of equation (1), indicate the fixing of variable D2 to these particular values. It is important to note that the expected values in equation (1) refer to pre-intervention probabilities [4]. Despite being direct causes of D2, W1 and W2 should not be included as covariates, as this decreases efficiency [7].
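To illustrate how adjustment for a back-door set works with discrete conditioning variables, the following Python sketch applies the generic back-door adjustment formula, ATE = Σ_s {E[Y | d'', s] − E[Y | d', s]} P(s), to simulated data. Everything in it is a hypothetical toy example: a single binary confounder S, a binary exposure D and an outcome Y stand in for the article's variables, and the true effect of 1.0 is chosen arbitrarily. Only pre-intervention (observational) quantities enter the calculation.

```python
import random
from collections import defaultdict

rng = random.Random(2024)
TRUE_EFFECT = 1.0  # hypothetical causal effect of D on Y
n = 50000

# Observational data: S causes both D and Y (one back-door path D <- S -> Y).
data = []
for _ in range(n):
    s = 1 if rng.random() < 0.4 else 0
    d = 1 if rng.random() < (0.7 if s else 0.2) else 0
    y = TRUE_EFFECT * d + 2.0 * s + rng.gauss(0.0, 1.0)
    data.append((s, d, y))


def mean(values):
    return sum(values) / len(values)


# Unadjusted contrast: confounded by S.
naive = (mean([y for s, d, y in data if d == 1])
         - mean([y for s, d, y in data if d == 0]))

# Back-door adjustment: stratum-specific contrasts weighted by P(S = s).
cells = defaultdict(list)
for s, d, y in data:
    cells[(s, d)].append(y)

adjusted = 0.0
for s_value in (0, 1):
    p_s = sum(1 for s, _, _ in data if s == s_value) / n
    adjusted += (mean(cells[(s_value, 1)]) - mean(cells[(s_value, 0)])) * p_s

print(round(naive, 2), round(adjusted, 2))
```

In this toy setting the unadjusted contrast is roughly twice the true effect, whereas the adjusted estimate is close to the value used to generate the data.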
Identification of the treatment effect of setting variable X5 from value x to x+1 can be achieved in the same way as in equation (3).
Furthermore, by the back-door criterion [4], it is also possible to identify the effect of X4 on T1 by controlling only for Z1, X2 and X5. Given that Z1, X2 and X5 are all direct causes of T1, this model should be preferable from an efficiency perspective [7].
Assuming that all conditioning variables are discrete, the average causal effect of a discrete change in perceived feasibility on treatment assignment can be estimated by summing over the values of the conditioning variables, as in equation (4). It is important to note, however, that the outcome variable T1 in our model varies at the patient level, while D2 is an organizational characteristic. Furthermore, the omission of variable X3, a caseworker characteristic, from the model will cause level-1 error terms to be correlated within caseworkers. Hence, valid statistical inference will require controlling for clustering in the sample [15]. Again, the estimation of the effect of X5 on T1 follows the same approach as equation (4).
For increased efficiency, all variables that are direct causes of the dependent variable (X3, V1, V2 and Y1) should be considered as covariates [7]. This will also eliminate any unobserved heterogeneity at cluster level in our model and, therefore, usual statistical inference should be valid.
These examples show a particular advantage of structural causal systems. The identification of causal effects can be achieved using different models, depending on data availability. The underlying assumptions are made explicit in the structural system.
In the previous paragraphs, we have discussed the incremental effects of discrete changes in the implementation outcomes. Again, more efficient estimates of the average marginal effects (AMEs) for implementation outcomes can be achieved by controlling for the parent variables of the dependent variable (T1), a set that in our example satisfies the back-door criterion [4] and thus eliminates confounding.
In linear models, AMEs are equal to the estimated coefficients but the two can differ significantly in nonlinear models [16]. In section A.4, we use AMEs to estimate the causal effects of X4 and X5 and the approach is demonstrated in the do-file code. Having a broad audience in mind, we refrain from discussing marginal effects in more detail and refer interested readers to more advanced texts [16,17].
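The difference between coefficients and AMEs in a probit model can be illustrated with a short Python sketch. It uses the standard textbook definitions, which we assume correspond to the article's equations (5) and (6): for a continuous regressor, the AME is the sample average of φ(x'β)·β_j, and for a discrete change it is the sample average of Φ(x'β + β_j) − Φ(x'β). The coefficient values and the simulated regressor are hypothetical.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # standard normal distribution


def linear_index(row, beta):
    """Linear predictor x'beta for one observation."""
    return sum(b * x for b, x in zip(beta, row))


def probit_ame(rows, beta, j):
    """AME of regressor j: sample mean of phi(x'beta) * beta_j."""
    return fmean([STD_NORMAL.pdf(linear_index(row, beta)) * beta[j] for row in rows])


def probit_discrete_effect(rows, beta, j, step=1.0):
    """Average effect of raising regressor j by `step` on P(outcome = 1)."""
    effects = []
    for row in rows:
        xb = linear_index(row, beta)
        effects.append(STD_NORMAL.cdf(xb + beta[j] * step) - STD_NORMAL.cdf(xb))
    return fmean(effects)


# Hypothetical sample: a constant and one standard-normal regressor.
rng = random.Random(7)
sample = [[1.0, rng.gauss(0.0, 1.0)] for _ in range(5000)]
beta = [-0.2, 0.5]  # hypothetical "true" probit coefficients

ame = probit_ame(sample, beta, j=1)
discrete = probit_discrete_effect(sample, beta, j=1)
print(round(ame, 3), round(discrete, 3), beta[1])
```

Because the standard normal density never exceeds about 0.4, the AME of a probit regressor is always smaller in absolute value than its coefficient, which is why the two should not be compared directly.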

Effect of treatment (T1) on parenting outcome (Y2):
Again, assuming that the researcher has access to the measures of the observed variables in Fig A1, the causal effect of treatment on patients' parenting scores can be identified. By the back-door criterion [4], a nonparametric estimand of this average causal effect can be obtained by controlling for a minimal adjustment set of three measured variables. Similar to the previous section, we can restrict our analysis to organizations that have actually decided to implement the intervention (i.e., D1 = 1), but we do not make this condition explicit in the expressions below to simplify notation.

A.4 Monte-Carlo Simulation of the Hypothetical Example
The data generating process (DGP) described in this section is directly based on the DAG illustrated in Fig A1. To reduce the complexity of the simulation, we have not modelled the decision node D1 (the decision to implement), which introduces the implicit condition D1 = 1 as discussed in the previous section. Where available, values for regressors are guided by descriptive statistics published in meta-analyses of cognitive behavioral therapy in child and adolescent patient populations [18-20] as well as studies published in the implementation science literature [21]. The complex multilevel data structure and the realistic values for covariates in the model are intended to demonstrate the use of decision-based structural models in real-world contexts. Several authors have emphasized the need to conduct Monte-Carlo simulations using data sets that approximate reality as much as possible [22,23]. However, to decrease computation time, we did not introduce complex functional forms for the covariate relationships in the model (i.e., interaction terms, polynomials or fractions).
Overall, the DGP generates a hierarchical dataset with different organizations that employ multiple practitioners who work with several patients. This structure is typical for community mental health settings [24]. The sample size for the simulation was set to 6000 observations (i.e. patients), and we used cluster sizes that resemble real-world allied health settings [24].
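A three-level structure of this kind can be generated as in the following Python sketch. It is a deliberately simplified stand-in for the DGP in the Stata do-file: the cluster sizes (30 organizations, 10 practitioners each, 20 patients each, giving 6000 patients), the variable names and all coefficients are hypothetical and chosen only for illustration.

```python
import random


def simulate_hierarchy(n_orgs, clinicians_per_org, patients_per_clinician, rng):
    """Generate patients nested in clinicians nested in organizations."""
    rows = []
    for org in range(n_orgs):
        org_effect = rng.gauss(0.0, 0.5)           # shared by all patients of an organization
        for clin in range(clinicians_per_org):
            clin_effect = rng.gauss(0.0, 0.5)      # shared by all patients of a clinician
            for _ in range(patients_per_clinician):
                baseline = rng.gauss(0.0, 1.0)     # patient-level covariate
                # Probit-type treatment decision: treated if latent index is positive.
                latent = 0.3 * baseline + clin_effect + rng.gauss(0.0, 1.0)
                treated = 1 if latent > 0 else 0
                # Linear outcome equation with a hypothetical treatment effect of 0.5.
                outcome = 0.5 * treated + 0.6 * baseline + org_effect + rng.gauss(0.0, 1.0)
                rows.append({
                    "org": org,
                    "clinician": (org, clin),
                    "baseline": baseline,
                    "treated": treated,
                    "outcome": outcome,
                })
    return rows


patients = simulate_hierarchy(30, 10, 20, random.Random(42))
print(len(patients))
```

The shared organization and clinician effects induce within-cluster correlation, which is the reason cluster-robust inference is needed whenever cluster-level causes are omitted from a fitted model.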
Simulations were undertaken in Stata SE 14.2 with the number of Monte-Carlo replications set to R=10000. The initial seed for the simulation was based on the 6-digit sales machine authorization codes at the bottom of three receipts for purchases from different stores at different dates. A random draw conducted in Excel dictated which numbers were assigned to the seed in a random order.
Given the structure of the system, all models in the simulations were fitted using multiple linear regressions and probit regressions, and inference is based on the cluster-robust variance-covariance matrix where appropriate [15]. To measure the accuracy of the structural causal approach in our Monte-Carlo simulations, we assess the relative bias for each parameter, including the 95 per cent confidence interval of this statistic. For AME and ATE estimates, relative bias is measured relative to the sample AME and ATE based on the true parameters. For linear models, these coincide with the true parameters defined in the DGP, while for probit models we calculated the AME as demonstrated in equations (5) and (6) for each sample [16]. Relative bias for the coefficients was calculated relative to the true parameters defined in the DGP. Confidence intervals for these measures are based on the empirical standard error of the statistic [22,23].
In addition, we present the coverage rates for estimated coefficients in the sections below. Coverage reflects the proportion of replications in which the true value of the parameter was situated within the 95 per cent confidence interval of the estimate produced by the software. These confidence intervals were based on a normal approximation for probit coefficients [23], and the t distribution was used for linear regressions. Coverage rates should ideally be very close to 95 per cent for a 95 per cent confidence interval. However, non-coverage does not necessarily imply a bias in the estimated standard errors, as this statistic is affected by the bias in parameter estimates, the bias in estimated standard errors and the distribution of parameter estimates [23].
As a final measure of performance of the model, we assess the root mean squared error (RMSE) for each estimated coefficient. Similar to the relative bias statistic, this measure is based on the empirical standard error [22]. However, the RMSE is a complete measure of accuracy, as it spans both dimensions, the bias and the variability of the estimate [22,23]. Moreover, the RMSE is on the same scale as the original parameter [22,23]. For the analysis here, we have normalized the RMSE by the true value of the parameter and express it as a percentage.
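The performance measures described above can be computed from the stored replications as in the following Python sketch. It is an assumption-laden illustration rather than the article's Stata code: the function name, the dictionary keys and the simulated replications are hypothetical, and the confidence intervals use a normal approximation throughout (the article uses the t distribution for linear regressions).

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def performance(estimates, std_errors, true_value, level=0.95):
    """Relative bias, coverage and normalized RMSE over Monte-Carlo replications."""
    r = len(estimates)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)

    # Relative bias in per cent, with a CI based on its empirical standard error.
    rel = [100.0 * (est - true_value) / true_value for est in estimates]
    rel_bias = fmean(rel)
    rel_bias_se = stdev(rel) / math.sqrt(r)

    # Coverage: share of replications whose CI contains the true value.
    hits = sum(1 for est, se in zip(estimates, std_errors)
               if abs(est - true_value) <= z_crit * se)
    coverage = hits / r
    coverage_se = math.sqrt(coverage * (1.0 - coverage) / r)  # simulation error

    # RMSE normalized by the true value and expressed in per cent.
    rmse = math.sqrt(fmean([(est - true_value) ** 2 for est in estimates]))

    return {
        "relative_bias_pct": rel_bias,
        "relative_bias_ci": (rel_bias - z_crit * rel_bias_se,
                             rel_bias + z_crit * rel_bias_se),
        "coverage": coverage,
        "coverage_interval": (coverage - z_crit * coverage_se,
                              coverage + z_crit * coverage_se),
        "rmse_pct": 100.0 * rmse / abs(true_value),
    }


# Hypothetical replications: an unbiased estimator with standard error 0.1.
rng = random.Random(99)
true_beta = 0.5
draws = [rng.gauss(true_beta, 0.1) for _ in range(2000)]
result = performance(draws, [0.1] * len(draws), true_beta)
print(result)
```

For an unbiased estimator with correct standard errors, as simulated here, the relative bias interval should include zero, coverage should be close to the nominal level, and the normalized RMSE reflects sampling variability only.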

Relative bias and coverage of parameter estimates
In Fig 5 of the main article, relative bias for the average marginal effects and the average treatment effects was assessed based on the true marginal effects calculated in each sample. For linear models, these estimates are equal to the regression coefficients, as explained above. However, for nonlinear models, the AME and ATE can differ substantially from the estimated coefficients. In general, nonlinear estimators are not unbiased in finite samples, but estimates approach the true values as the sample size increases towards infinity [16]. Reduced models (i.e., population-averaged models based on the minimum conditioning set to block back-door paths) generally do not yield consistent estimates for the regression coefficients, despite providing consistent estimates of the AME and ATE. However, for fully specified models, we can assess the consistency of the coefficient estimates and whether the inference based on these estimates is accurate. This is a result of the chosen functional specifications of causal relationships in the DGP.
The relative bias and 95 per cent confidence intervals are presented in Fig A2. For completeness, we also included the estimates for linear models, which are identical to the AME and ATE presented in Fig 5 of the main text. The results reveal that the average relative bias for each coefficient is negligible.

Fig A2. Relative parameter bias from Monte-Carlo simulations
Coefficients from probit regressions exhibit larger relative bias than those from linear models, but the bias is still small in magnitude for perceived feasibility (X4=>T1 (full); 1.05 per cent) as well as appropriateness (X5=>T1 (full); 1.1 per cent). Moreover, all confidence intervals include zero.
Additional simulations using larger sample sizes (not shown) also revealed that the relative bias in probit estimates disappears as sample sizes at level-1 become large, which corroborates the finding that the estimates are not unbiased but consistent. As the sample size goes to infinity, the sampling distribution of the estimator approaches its asymptotic distribution, and any bias in the estimated coefficients decreases towards zero.
The RMSE for each of the estimated coefficients corroborates the previous results. In Fig A3, the RMSE is normalized by the true parameter as defined in the DGP [23] and multiplied by 100 to express the statistic in per cent. Again, estimates for the effects of D2 on X4 and X5 are relatively inaccurate. However, this is due to the particular DGP for these implementation outcome measures in the simulation, which can be seen in the Stata do-file code, rather than evidence of the inefficiency of the approach itself. The standardized RMSE for the other coefficient estimates is within five per cent of the true value. Overall, the results presented here show that structural causal models based on theory are well suited to estimate the effects of implementation strategies and outcomes on each other, as well as on patient outcomes.
In addition to consistency of estimates, it is important to investigate whether statistical inference based on these estimates is valid. We therefore assess the coverage rate for each coefficient to see whether the empirical confidence intervals (using an approximation to the normal distribution for probit estimates and the t distribution for linear regression coefficients) are close to the nominal confidence level of 95 per cent. The intervals reported alongside the coverage rates represent the simulation interval, which accounts for the simulation error in coverage rates [22]. All coverage rates are very close to the 95 per cent nominal level, which indicates that statistical inference is generally valid.