Variable Importance and Prediction Methods for Longitudinal Problems with Missing Variables

We present prediction and variable importance (VIM) methods for longitudinal data sets containing continuous and binary exposures subject to missingness. We demonstrate the use of these methods for prognosis of medical outcomes of severe trauma patients, a field in which current medical practice involves rules of thumb and scoring methods that use only a few variables and ignore the dynamic and high-dimensional nature of trauma recovery. Well-principled prediction and VIM methods can provide a tool to make care decisions informed by the patient's high-dimensional physiological and clinical history. Our VIM parameters are analogous to slope coefficients in adjusted regressions, but they do not depend on a specific statistical model, nor do they require a particular functional form for the estimated prediction regression. In addition, under causal and statistical assumptions they can be interpreted causally as the expected outcome under time-specific clinical interventions, i.e., as the change in the mean of the outcome if each individual experiences a specified change in the variable (keeping the other variables in the model fixed). Moreover, the targeted maximum likelihood estimator (TMLE) used is doubly robust and locally efficient. Because the proposed VIM does not constrain the prediction model fit, we use a very flexible ensemble learner (the SuperLearner), which returns a linear combination of a list of user-specified algorithms. Not only is such a prediction algorithm intuitively appealing, it has theoretical justification as being asymptotically equivalent to the oracle selector. The results of the analysis show effects whose size and significance would not have been detected using a parametric approach (such as stepwise regression or LASSO). In addition, the procedure is all the more compelling because the predictor on which it is based showed significant improvements in cross-validated fit, for instance in the area under the receiver operating characteristic (ROC) curve (AUC).
Thus, given that our VIM (1) applies to any model-fitting procedure, (2) has, under assumptions, meaningful clinical (causal) interpretations, and (3) admits robust asymptotic inference based on the influence curve, it provides a compelling alternative to existing methods for estimating variable importance in high-dimensional clinical (or other) data.
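As background for the prediction step, the super-learner idea can be sketched as a cross-validation-selected convex combination of candidate algorithms. The sketch below is a generic, stdlib-only illustration, not the library or learner set used in the analysis: the two candidate learners (a marginal mean and a simple least-squares fit), the simulated data, and the weight grid are all hypothetical stand-ins.

```python
import random

random.seed(0)

# Simulated toy data: outcome Y depends linearly on one covariate W plus noise.
n = 200
W = [random.uniform(0, 2) for _ in range(n)]
Y = [1.0 + 0.5 * w + random.gauss(0, 0.3) for w in W]

# Two hypothetical candidate learners (stand-ins for a user-specified library).
def fit_mean(w_tr, y_tr):
    m = sum(y_tr) / len(y_tr)
    return lambda w: m

def fit_ols(w_tr, y_tr):
    wbar = sum(w_tr) / len(w_tr)
    ybar = sum(y_tr) / len(y_tr)
    num = sum((wi - wbar) * (yi - ybar) for wi, yi in zip(w_tr, y_tr))
    den = sum((wi - wbar) ** 2 for wi in w_tr)
    slope = num / den
    return lambda w: ybar + slope * (w - wbar)

# V-fold cross-validated predictions for each candidate learner.
V = 5
folds = [list(range(v, n, V)) for v in range(V)]
cv_pred = {"mean": [0.0] * n, "ols": [0.0] * n}
for val in folds:
    held_out = set(val)
    tr = [i for i in range(n) if i not in held_out]
    w_tr, y_tr = [W[i] for i in tr], [Y[i] for i in tr]
    for name, fit in (("mean", fit_mean), ("ols", fit_ols)):
        f = fit(w_tr, y_tr)
        for i in val:
            cv_pred[name][i] = f(W[i])

# Choose the convex weight alpha minimizing the cross-validated squared-error
# risk of alpha * ols + (1 - alpha) * mean over a coarse grid.
def cv_risk(alpha):
    return sum((Y[i] - (alpha * cv_pred["ols"][i]
                        + (1 - alpha) * cv_pred["mean"][i])) ** 2
               for i in range(n)) / n

grid = [k / 20 for k in range(21)]
alpha_star = min(grid, key=cv_risk)
```

Because the grid includes the endpoints 0 and 1, the selected combination never has worse cross-validated risk than either candidate alone, which is the intuition behind the oracle-selector result mentioned above.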


TMLE Algorithm
The efficient influence functions of the parameters $\Psi_c$ and $\Psi_b$ are given respectively below, where Result ?? provides the conditions under which these estimating equations have expectation zero, therefore leading to consistent, triply robust estimators: consistency holds if either ($\bar{Q} = \bar{Q}_0$ and $\phi = \phi_0$), or ($\bar{Q} = \bar{Q}_0$ and $g = g_0$), or ($g = g_0$ and $\phi = \phi_0$).
Recall that an estimator that solves an estimating equation is consistent if the expectation of the estimating equation equals zero. As a consequence of this result, and under the conditions on $\bar{Q}$, $g$ and $\phi$ stated in Theorems 5.11 and 6.18 of [?], an estimator that solves the efficient influence function $D$ is consistent if at least two of the three initial estimators are consistent, and it is efficient if all three are consistent. Mathematical proofs of the efficiency of these estimators are beyond the scope of this paper, but the general theory underlying their asymptotic properties can be found in [?], among others.
In order to define a targeted maximum likelihood estimator for $\psi_0$, we need to define three elements: (1) a loss function $L(Q)$ for the relevant part of the likelihood required to evaluate $\Psi(P)$, which in this case is $Q = (\bar{Q}, g, Q_W)$. This function must satisfy $Q_0 = \arg\min_{Q} E_{P_0} L(Q)(O)$, so that the expected loss is minimized at the truth.

Parametric Fluctuation
Given an initial estimator $Q_n^k$ of $Q_0$, with components $(\bar{Q}_n^k, g_n^k, Q_{W,n}^k)$, we define the $(k+1)$th fluctuation of $Q_n^k$ as follows, where the proportionality constants are chosen so that the left-hand side terms integrate to one for continuous $A$, and $D_2$ and $D_3$ are defined as in (??) and (??). We define these fluctuations using a two-dimensional parameter $\epsilon = (\epsilon_1, \epsilon_2)$, though it is theoretically valid to define them using any dimension for $\epsilon$, as long as the condition $D(P) \in \langle \frac{d}{d\epsilon} L\{Q(\epsilon)\}\big|_{\epsilon=0} \rangle$ is satisfied, where $\langle \cdot \rangle$ denotes the linear span. The convenience of the particular choice made here becomes clear once the targeted maximum likelihood estimator (TMLE) is defined.
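The span condition can be checked numerically for a concrete submodel. The sketch below uses a logistic fluctuation $\operatorname{logit} \bar{Q}(\epsilon) = \operatorname{logit} \bar{Q} + \epsilon H$, a standard TMLE choice used here purely for illustration (it is not necessarily the fluctuation defined in the displays above), and the values of $q_0$, $H$, and $y$ are hypothetical. It verifies that the derivative of the log-likelihood loss at $\epsilon = 0$ equals the clever-covariate score $-H(Y - \bar{Q})$, so the fluctuation's score lies in the required span.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

# Hypothetical values for one observation: initial fit q0, clever covariate h,
# and observed binary outcome y (all made up for illustration).
q0, h, y = 0.3, 1.7, 1.0

def q_eps(eps):
    # Logistic fluctuation submodel through the initial estimate.
    return expit(logit(q0) + eps * h)

def neg_loglik(eps):
    # Negative log-likelihood loss for a binary outcome along the submodel.
    q = q_eps(eps)
    return -(y * math.log(q) + (1.0 - y) * math.log(1.0 - q))

# Score at eps = 0: d/d(eps) of the loss should equal -h * (y - q0).
d = 1e-6
numeric = (neg_loglik(d) - neg_loglik(-d)) / (2 * d)
analytic = -h * (y - q0)
```

The numerical derivative and the analytic score agree up to discretization error, illustrating why solving the score equation in $\epsilon$ drives the relevant component of the efficient influence function to zero.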

Targeted Maximum Likelihood Estimator
The TMLE is defined by the following iterative process:
1. Initialize $k = 0$.
2. Estimate $\epsilon$ as $\epsilon_n^k = \arg\min_{\epsilon} P_n L\{Q_n^k(\epsilon)\}$.
3. Compute $Q_n^{k+1} = Q_n^k(\epsilon_n^k)$.
4. Update $k = k + 1$ and iterate steps 2 through 4 until convergence (i.e., until $\epsilon_n^k = 0$).
First of all, note that the value of $\epsilon_2$ that minimizes the part of the loss function corresponding to the marginal distribution of $W$ in the first step (i.e., $-P_n \log Q_{W,n}^1(\epsilon_2)$) is $\epsilon_2^1 = 0$. Therefore, the iterative estimation of $\epsilon$ only involves the estimation of $\epsilon_1$. The $k$th-step estimate of $\epsilon_1$ is obtained by minimizing $P_n\big(L_Y(\bar{Q}_n^k(\epsilon_1)) + L_A(g_n^k(\epsilon_1))\big)$, which implies solving an estimating equation in which $D_2(P_n^k)(O) = \bar{Q}_n^k(A + \delta, 1, W) - \int_{\mathcal{A}} \bar{Q}_n^k(a + \delta, 1, W)\, g_n^k(a \mid 1, W)\, d\mu(a)$.
The TMLE of $\psi_0$ is defined as $\psi_n \equiv \lim_{k \to \infty} \Psi(P_n^k)$, assuming this limit exists. In practice, the iteration is carried out until convergence in the values of $\epsilon_n^k$ is achieved, and an estimator $Q_n^*$ is obtained. Under the conditions of Theorem 2.3 of [?], a conservative estimator of the variance of $\psi_n$ is given by $\frac{1}{n} \sum_{i=1}^{n} D^2(\bar{Q}_n^*, Q_{W,n}, g_n^*, \phi_n)(O_i)$.
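The iterative scheme in steps 1 through 4 can be illustrated numerically. The sketch below is a generic toy example, not the paper's estimator: it fluctuates only a single outcome regression along a hypothetical logistic submodel with a made-up clever covariate `H` and toy data, solves the step-2 minimization by finding the root of the score equation by bisection, and stops once $\epsilon_n^k$ is numerically zero.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

# Hypothetical data: binary outcomes y, an initial regression fit q, and a
# clever covariate H (all made up for illustration).
y = [1, 0, 1, 1, 0, 1, 0, 1]
q = [0.6, 0.4, 0.5, 0.7, 0.3, 0.5, 0.6, 0.4]
H = [1.2, 0.8, 1.0, 1.5, 0.7, 1.1, 0.9, 1.3]

def eps_hat(q):
    # Step 2: arg min of the empirical log-likelihood loss along the
    # fluctuation, obtained by solving the score equation
    # sum_i H_i (y_i - q_i(eps)) = 0 by bisection (score is decreasing in eps).
    def score(eps):
        return sum(h * (yi - expit(logit(qi) + eps * h))
                   for h, yi, qi in zip(H, y, q))
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Steps 1-4: fluctuate, update, and stop once eps_n^k is (numerically) zero.
for k in range(20):
    e = eps_hat(q)
    if abs(e) < 1e-8:
        break
    q = [expit(logit(qi) + e * h) for qi, h in zip(q, H)]

# At convergence the empirical score is (numerically) zero.
final_score = sum(h * (yi - qi) for h, yi, qi in zip(H, y, q))
```

With this logistic submodel a single update already solves the score equation, so the loop exits on the second pass; in the multi-component estimator described above, several iterations may be needed before all $\epsilon$ components vanish.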