Interpretable machine learning for high-dimensional trajectories of aging health

We have built a computational model for individual aging trajectories of health and survival, which contains physical, functional, and biological variables, and is conditioned on demographic, lifestyle, and medical background information. We combine techniques of modern machine learning with an interpretable interaction network, where health variables are coupled by explicit pair-wise interactions within a stochastic dynamical system. Our dynamic joint interpretable network (DJIN) model is scalable to large longitudinal data sets, is predictive of individual high-dimensional health trajectories and survival from baseline health states, and infers an interpretable network of directed interactions between the health variables. The network identifies plausible physiological connections between health variables as well as clusters of strongly connected health variables. We use English Longitudinal Study of Aging (ELSA) data to train our model and show that it performs better than multiple dedicated linear models for health outcomes and survival. We compare our model with flexible lower-dimensional latent-space models to explore the dimensionality required to accurately model aging health outcomes. Our DJIN model can be used to generate synthetic individuals that age realistically, to impute missing data, and to simulate future aging outcomes given arbitrary initial health states.

I. OBJECTIVE FUNCTION

We denote the health variables observed at age $t_k$ by $y_{t_k}$, the background information at baseline by $u_{t_0}$, the model health-variable predictions by $x(t_k)$, the latent variables for imputation/generation by $z$, the age of death or last censoring age by $a$, the censoring indicator by $c$, the model parameters by $\theta$, and the variational parameters by $\phi$.
To fit the model, we minimize the KL-divergence between the approximate posterior and the true posterior. This is equivalent to maximizing a lower bound on the model evidence. Starting with the model evidence,
$$
\log p(a, c \mid u_{t_0}) = \log \int d\theta\, dz\, dx_0\, \mathcal{D}x(t)\; p(\theta)\, p(z)\, p(x_0 \mid z, u_{t_0}, t_0)\, p(x(t) \mid x_0, u_{t_0}, t)\, p(a, c \mid x(t), u_{t_0}, t),
$$
we introduce approximate posteriors $q$ by multiplying and dividing by them inside the integral,
$$
\log p(a, c \mid u_{t_0}) = \log \mathbb{E}_{q}\!\left[ \frac{p(\theta)}{q(\theta)} \frac{p(z)}{q(z)} \frac{p(x_0 \mid z, u_{t_0}, t_0)}{q(x_0)} \frac{p(x(t) \mid x_0, u_{t_0}, t)}{q(x(t) \mid x_0, u_{t_0}, t)}\, p(a, c \mid x(t), u_{t_0}, t) \right].
$$
Using Jensen's inequality we move the logarithm inside the expectation, and define this lower bound as the objective function,
$$
\mathcal{L} = \mathbb{E}_{q}\!\left[ \log\frac{p(\theta)}{q(\theta)} + \log\frac{p(z)}{q(z)} + \log\frac{p(x_0 \mid z, u_{t_0}, t_0)}{q(x_0)} + \log\frac{p(x(t) \mid x_0, u_{t_0}, t)}{q(x(t) \mid x_0, u_{t_0}, t)} + \log p(a, c \mid x(t), u_{t_0}, t) \right] \le \log p(a, c \mid u_{t_0}),
$$
with the posterior of $z$ parameterized by normalizing flows. For simplicity we do not show the variational parameters $\phi$ in the notation for the approximate posteriors $q$, nor the parameters $\theta$ in the conditional distributions. Additionally, we have averaged over the imputed or generated $x_0$. This is the objective function used in the Methods.
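As a sanity check on the bound structure above, the same Jensen-inequality argument can be verified numerically on a toy conjugate-Gaussian model ($z \sim \mathcal{N}(0,1)$, $x \mid z \sim \mathcal{N}(z,1)$, so the evidence is $\mathcal{N}(x; 0, 2)$). This sketch is ours, not the paper's code, and the variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_gauss(x, mu, var):
    # Log-density of a univariate Gaussian.
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (x - mu) ** 2 / var

def elbo(x, q_mu, q_var, n=200_000):
    # Monte Carlo estimate of E_q[log p(x|z) + log p(z) - log q(z)].
    z = rng.normal(q_mu, np.sqrt(q_var), size=n)
    return np.mean(log_gauss(x, z, 1.0) + log_gauss(z, 0.0, 1.0)
                   - log_gauss(z, q_mu, q_var))

x_obs = 1.3
log_evidence = log_gauss(x_obs, 0.0, 2.0)   # marginally x ~ N(0, 2)
tight = elbo(x_obs, x_obs / 2, 0.5)         # q = exact posterior N(x/2, 1/2)
loose = elbo(x_obs, 0.0, 1.0)               # a mismatched q gives a looser bound
```

With the exact posterior the bound is tight (`tight` equals the log evidence); any other `q` gives a strictly lower value, which is what makes maximizing the bound a sensible fitting objective.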

II. NON-RECURRENT NEURAL NETWORK MORTALITY RATE
In the network model presented in the main results, we model the mortality rate with a recurrent neural network (RNN). This allows a history of health states to be used to compute the mortality rate. We have also tested a model where we instead use a feed-forward neural network taking $x(t)$, $u_{t_0}$, and $t$ as input; this allows no memory of previous states when determining mortality. We use the same layer sizes as in the recurrent neural network model, with ELU activations.
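The memoryless variant can be sketched as a small feed-forward network in NumPy. The layer sizes here are placeholders (the actual model matches the RNN's sizes), and the softplus output is our assumption to keep the rate positive, not a detail stated in the text:

```python
import numpy as np

def elu(h, alpha=1.0):
    # ELU activation, as used in the mortality network.
    return np.where(h > 0, h, alpha * (np.exp(h) - 1.0))

def softplus(h):
    # Maps real outputs to a positive mortality rate (our assumption).
    return np.log1p(np.exp(h))

class FeedForwardMortality:
    """Memoryless mortality rate lambda(x(t), u_t0, t): no recurrent state."""

    def __init__(self, dim_x, dim_u, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        d_in = dim_x + dim_u + 1  # current health state, background, and age
        self.W1 = rng.normal(0, d_in ** -0.5, (d_in, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0, hidden ** -0.5, (hidden, 1))
        self.b2 = np.zeros(1)

    def rate(self, x, u, t):
        inp = np.concatenate([x, u, [t]])
        h = elu(inp @ self.W1 + self.b1)
        return softplus(h @ self.W2 + self.b2)[0]

net = FeedForwardMortality(dim_x=5, dim_u=3)
lam = net.rate(np.zeros(5), np.zeros(3), 70.0)
```

Because the rate depends only on the instantaneous state, two individuals with identical `x(t)` at age `t` get identical rates regardless of how they arrived there, which is exactly the memory the RNN variant adds back.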

III. GENERATED SYNTHETIC POPULATION
We have made a synthetic population available at https://zenodo.org/record/4733386. This population includes 3 million individuals for each baseline age of 65, 75, and 85 years old, for a total of 9 million individuals. The background health state has been generated by sampling based on the age- and sex-dependent ELSA population. For binary variables we sample a 0 or 1 based on the observed age- and sex-dependent prevalence; for continuous variables we sample from a Gaussian distribution with the mean and standard deviation of the corresponding age- and sex-dependent ELSA training sample. All individuals are set to have no medications.
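The baseline sampling scheme described above can be sketched as follows; the variable names, prevalence, and moments here are made up for illustration and are not the ELSA statistics:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_background(n, prevalence, cont_mean, cont_std):
    """Sample synthetic baseline background variables.

    prevalence: binary variable name -> (age/sex-stratified) prevalence
    cont_mean, cont_std: continuous variable name -> training-sample moments
    """
    sample = {}
    for name, p in prevalence.items():
        sample[name] = rng.binomial(1, p, size=n)        # 0/1 by prevalence
    for name in cont_mean:
        sample[name] = rng.normal(cont_mean[name], cont_std[name], size=n)
    # All synthetic individuals are set to have no medications.
    for name in ("hypertension_med", "cholesterol_med"):
        sample[name] = np.zeros(n, dtype=int)
    return sample

pop = sample_background(
    1000,
    prevalence={"ever_smoked": 0.55},
    cont_mean={"height": 1.68},
    cont_std={"height": 0.09},
)
```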
Using this input, we sample a baseline state for each synthetic individual and simulate their health trajectories for 20 years.

Table A. Variables used from the ELSA dataset. Background variables are only used at the first time-step, as $u_{t_0}$. Longitudinal variables are predicted in $y_t$. All variables are z-scored; additional transformations before z-scoring are indicated.

Variable | Category | Wave type | Transformation
Gait speed (average of 3 measurements, speed over 8 feet, age 60+) | Longitudinal | Self-report |
Dominant hand grip strength (average of 3 measurements) | Longitudinal | Nurse |
Non-dominant hand grip strength (average of 3 measurements) | Longitudinal | Nurse |
ADL score (count from 0-10, see Table B) | Longitudinal | Self-report |
IADL score (count from 0-13, see Table B) | Longitudinal | Self-report |
Time to rise from a chair 5x | Longitudinal | Nurse |
Time held leg raise (eyes open, maximum 30 secs) | Longitudinal | Nurse | Log-scaled
Time held full tandem stance (maximum 30 secs) | Longitudinal | Nurse | Log-scaled
Self-rated health (scored 0=excellent to 1=poor, 5 levels) | Longitudinal | Self-report |
Eyesight (with aids) (scored 0=excellent to 1=legally blind, 6 levels) | Longitudinal | Self-report |
Hearing (with aids) (scored 0=excellent to 1=poor, 5 levels) | Longitudinal | Self-report |
Walking ability score (unaided ability to walk 1/4 mile; scored 0=no difficulty to 1=unable to do this, 4 levels) | Longitudinal | Self-report |
Diastolic blood pressure ( | | | Log-scaled
Long-standing illness (yes/no) | Background | Self-report |
Long-standing illness limits activities (yes/no) | Background | Self-report |
Everything is an effort lately (yes/no) | Background | Self-report |
Ever smoked (yes/no) | Background | Self-report |
Currently smoke (yes/no) | Background | Self-report |
Height | Background | Nurse |
Body mass index (weight/height^2) | Background | Nurse |
Mobility status (1=walking without help/support, 0=walking requires help/support, bed bound, wheelchair, uncertain impairment) | Background | Nurse |
Country of birth (UK/outside UK) | Background | Self-report |
Drink alcohol (last 12 months, scored 1=almost every day to 6=every couple of months) | Background | Self-report |
Ever had a joint replacement (yes/no) | Background | Self-report |
Ever had bone fractures (yes/no) | Background | Self-report |
Sex | Background | Self-report |
Ethnicity (white/non-white) | Background | Self-report |
Hypertension medication (yes/no) | Background | Self-report |
Anticoagulant medication (yes/no) | Background | Self-report |
Cholesterol medication (yes/no) | Background | Self-report |
Hip/knee treatment (medication or exercise, yes/no) | Background | Self-report |
Lung/asthma medication (yes/no) | Background | Self-report |

IV. SUPPLEMENTAL FIGURES
In S1 Fig we show the variables used in the ELSA data set, and the number of individuals for which each variable is observed at each year from the time of entrance to the study. The shaded fills indicate the proportion of observed variables (with respect to the maximum for that variable), with the darkest fill indicating almost 100%. Most variables are unobserved at any given time, which reinforces the need for effective baseline imputation. The full names of these variables are provided in supplemental Table A. In S4 Fig, we show the relative RMSE of our model predictions and of the elastic-net linear model predictions for each health variable between 1 and 6 years, plotted against the proportion of observations for which the variable is missing in the full dataset. Our model predictions are generally worse for variables with a higher proportion missing, with noticeable degradation when the proportion missing is around 0.95, where the error exceeds that of sample-mean predictions; nevertheless, our model is always better than the elastic-net linear model.
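Reading the relative RMSE as the model RMSE normalized by the RMSE of a sample-mean predictor is consistent with the interpretation above (values at or above 1 add nothing over the mean); a minimal sketch under that assumption:

```python
import numpy as np

def relative_rmse(y_true, y_pred, y_train_mean):
    """RMSE of the model divided by the RMSE of predicting the training mean.

    Values below 1 beat the sample-mean baseline; values near or above 1
    indicate the model adds little over predicting the mean.
    """
    rmse_model = np.sqrt(np.mean((y_true - y_pred) ** 2))
    rmse_mean = np.sqrt(np.mean((y_true - y_train_mean) ** 2))
    return rmse_model / rmse_mean

y = np.array([1.0, 2.0, 3.0, 4.0])
good = relative_rmse(y, y + 0.1, y_train_mean=y.mean())   # well below 1
```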
In S5 Fig we show 3 different example individuals from the test set together with the model-predicted trajectories. We show 6 of the best-predicted variables. These predictions show the estimated uncertainty for these individual trajectories, and the variety in behaviour across different individuals in the data. The relative RMSE for these individuals, averaged over each time point, is shown for comparison with Fig 2 in the main results.
In S11 Fig we show the generated synthetic population Kaplan-Meier survival curve (red line and shading) and the observed population Kaplan-Meier survival curve (blue line and shading), with 95% confidence intervals indicated by the shading. The same censoring distribution seen in the observed sample is applied to the synthetic population by sampling censoring ages above the baseline age from the test data with replacement. Agreement is good until ~90 years, after which the number of individuals observed in the dataset is very low.
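A minimal Kaplan-Meier estimator of the kind used for this comparison, handling right-censoring as described above (an illustrative sketch, not the analysis code):

```python
import numpy as np

def kaplan_meier(ages, event):
    """Kaplan-Meier survival curve.

    ages:  age at death or at censoring
    event: 1 = death observed, 0 = right-censored
    Returns (unique death ages, survival probability just after each age).
    """
    ages = np.asarray(ages, dtype=float)
    event = np.asarray(event)
    times = np.unique(ages[event == 1])        # sorted unique death ages
    surv, s = [], 1.0
    for t in times:
        at_risk = np.sum(ages >= t)            # not yet dead or censored
        deaths = np.sum((ages == t) & (event == 1))
        s *= 1.0 - deaths / at_risk            # product-limit update
        surv.append(s)
    return times, np.array(surv)

t, s = kaplan_meier([70, 72, 72, 75, 80, 81], [1, 1, 0, 1, 0, 1])
```

Censored individuals leave the risk set without contributing a death, which is how the resampled censoring ages shape the synthetic curve.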
In S8 Fig we show the classification accuracy of a logistic regression model discriminating between the synthetic and observed samples. Our model generates a synthetic population that is almost indistinguishable from the observed sample, with classification accuracy only rising to 60% at 18 years from baseline.
In S9 Fig we show the one-dimensional marginal distributions for each health variable for the generated synthetic population and observed sample at baseline. We see the synthetic population agrees with the observed sample, but often has a slightly lower variance. In S10 Fig we show the mean and standard deviation of generated synthetic population trajectories (red lines and shading) and the observed sample trajectories (blue lines and shading). The synthetic trajectories have somewhat lower variance but reasonable agreement with the means.
In S14 Fig, we contrast our network model's weight matrix with a pair-wise Pearson correlation network, in which links with a p-value above 0.01 have been pruned to match the 99% credible intervals used in our approach. We see many differences. Our weight matrix is much sparser, including only the links useful for prediction. Our network is also directed and asymmetric: one-way links between variables are observed, as are distinct link strengths in the two directions. However, the sign of the links in the weight matrix is typically the same as in the correlation network.
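A pruned Pearson correlation network of the kind used in this comparison can be sketched as follows (illustrative; `p_cut=0.01` matches the pruning threshold above, and note the result is undirected, unlike the DJIN weight matrix):

```python
import numpy as np
from scipy import stats

def correlation_network(X, p_cut=0.01):
    """Pair-wise Pearson correlation network with p-value pruning.

    X: (n_individuals, n_variables). Returns a symmetric weight matrix
    with entries zeroed where the correlation p-value exceeds p_cut.
    """
    n_var = X.shape[1]
    W = np.zeros((n_var, n_var))
    for i in range(n_var):
        for j in range(i + 1, n_var):
            r, p = stats.pearsonr(X[:, i], X[:, j])
            if p < p_cut:
                W[i, j] = W[j, i] = r          # symmetric by construction
    return W

rng = np.random.default_rng(2)
x0 = rng.normal(size=500)
X = np.column_stack([x0,                        # variable 0
                     x0 + 0.5 * rng.normal(size=500),  # correlated with 0
                     rng.normal(size=500)])     # independent variable
W = correlation_network(X)
```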
In S12 Fig we show, for each network weight, the posterior mean of the weight vs. the proportion of the approximate posterior distribution that lies above zero for positive weights, or below zero for negative weights. We exclude weights when the probability of the weight having the opposite sign to its mean is above 1%. This approach only accepts connections with a large probability of having a definite sign. We see that large weights have only a small proportion of the posterior with the opposite sign, showing that the strong connections inferred by the model are robust.
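The sign-robustness criterion can be sketched directly on posterior samples (illustrative; in the model the posterior is the variational approximation rather than the synthetic Gaussians used here):

```python
import numpy as np

def prune_uncertain_weights(weight_samples, max_flip_prob=0.01):
    """Keep only weights whose posterior sign is definite.

    weight_samples: (n_posterior_samples, n_weights). A weight is kept
    when the posterior probability of the sign opposite to its mean is
    below 1%, matching the criterion described above.
    """
    mean = weight_samples.mean(axis=0)
    flip_prob = np.where(
        mean > 0,
        (weight_samples < 0).mean(axis=0),   # positive mean: mass below zero
        (weight_samples > 0).mean(axis=0),   # negative mean: mass above zero
    )
    return np.where(flip_prob < max_flip_prob, mean, 0.0)

rng = np.random.default_rng(3)
samples = np.column_stack([
    rng.normal(0.8, 0.1, 10_000),    # strong, sign-definite weight: kept
    rng.normal(0.05, 0.5, 10_000),   # weak, sign-uncertain weight: pruned
])
pruned = prune_uncertain_weights(samples)
```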
Several alternative models were explored, as described in the Latent Variable Models section of the Methods and Supplemental Sec. II. In S7 Fig we summarize predictions for the one-dimensional summary model, in which dynamics are built on a single latent summary health variable. This model performs worse than our DJIN model for both health and survival, and for health is often even worse than a static baseline prediction model (blue squares). In S6 Fig we show results for a model with a full neural-network drift function that includes all interactions for a 30-dimensional latent variable, in contrast to the linear pair-wise network of our DJIN model. The full NN model does only slightly better than the pair-wise network model for health, and is slightly worse for survival, indicating that the pair-wise network assumptions made by our DJIN model do not sacrifice much accuracy. In S2 Fig we show results with a feed-forward neural network for the mortality rate instead of a recurrent neural network (GRU). Our recurrent neural network (RNN) model achieves slightly better C-index and Brier scores, particularly at older ages. The two models are nearly equivalent for longitudinal prediction.
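The survival comparisons above are scored with the C-index; a minimal sketch of Harrell's concordance index (our simplified version, anchoring comparable pairs on observed deaths):

```python
import numpy as np

def concordance_index(ages, event, risk):
    """Harrell's C-index: the fraction of comparable pairs in which the
    individual with higher predicted risk dies earlier. Censored
    individuals are only compared with deaths before their censoring age.
    """
    n_conc, n_comp = 0.0, 0
    n = len(ages)
    for i in range(n):
        if event[i] != 1:
            continue                      # pairs are anchored on observed deaths
        for j in range(n):
            if ages[j] > ages[i]:         # j outlives (or outlasts) i
                n_comp += 1
                if risk[i] > risk[j]:
                    n_conc += 1.0
                elif risk[i] == risk[j]:
                    n_conc += 0.5         # ties get half credit
    return n_conc / n_comp

ages = np.array([70.0, 75.0, 80.0, 85.0])
event = np.array([1, 1, 0, 1])
perfect = concordance_index(ages, event, risk=-ages)  # risk ranks death age
```

A value of 1 means risk ordering perfectly matches death ordering; 0.5 is chance level.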
In S3 Fig we show the D-calibration histogram comparison between the DJIN model and the elastic-net Cox model. The histograms reflect the χ² statistics and p-values given in the main results, showing that both models have calibrated survival probabilities.
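D-calibration checks that the model's predicted survival probabilities, evaluated at observed death ages, are uniformly distributed on [0, 1]; a simplified sketch ignoring censoring (the paper's χ² values come from the full procedure):

```python
import numpy as np
from scipy import stats

def d_calibration(surv_prob_at_death, n_bins=10):
    """D-calibration check for uncensored deaths.

    surv_prob_at_death: model survival probability S_i(a_i) evaluated at
    each individual's observed death age. If the model is calibrated
    these values are uniform on [0, 1]. Returns (chi2 statistic, p-value).
    """
    counts, _ = np.histogram(surv_prob_at_death, bins=n_bins, range=(0.0, 1.0))
    expected = np.full(n_bins, len(surv_prob_at_death) / n_bins)
    return stats.chisquare(counts, expected)

# A perfectly calibrated toy model: survival probabilities exactly uniform
# across the 10 histogram bins.
calibrated = np.repeat(np.arange(10) / 10 + 0.05, 200)
chi2, p = d_calibration(calibrated)
```

Small χ² (large p) means the histogram is consistent with uniformity, i.e. the predicted survival probabilities are well calibrated.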