Identification of immune signatures predictive of clinical protection from malaria

Antibodies are thought to play an essential role in naturally acquired immunity to malaria. Prospective cohort studies have frequently shown that continuous exposure to the malaria parasite Plasmodium falciparum causes an accumulation of specific responses against various antigens that correlate with a decreased risk of clinical malaria episodes. However, small effect sizes and the often polymorphic nature of immunogenic parasite proteins make it difficult to robustly identify the true targets of protective immunity. Furthermore, the degree of individual-level protection conferred by elevated responses to these antigens has not yet been explored. Here we apply a machine learning approach to identify immune signatures predictive of individual-level protection against clinical disease. We find that commonly assumed immune correlates are poor predictors of clinical protection in children. In contrast, antibody profiles predictive of an individual’s malaria protective status can be found in data comprising responses to a large set of diverse parasite proteins. We show that this pattern emerges only after years of continuous exposure to the malaria parasite, whereas susceptibility to clinical episodes in young hosts (< 10 years) cannot be determined from measured antibody responses alone.


Temporarily remove strongly linearly correlated features
To avoid biasing the variable importance measures computed by the random forests (Toloşi and Lengauer (2011)), strongly correlated responses (absolute Pearson correlation coefficient |ρ| above 0.8) were temporarily excluded from the analysis. Note that these correlated variables are reintroduced at the interpretation stage if any of the features they are associated with is included in the final model.

```r
library(caret)  # provides findCorrelation()

# X: numeric matrix of antibody responses (individuals x antigens)
rho <- cor(X, method = "pearson")
# Column indices to remove to reduce pair-wise correlations above the cutoff
iiRemove <- findCorrelation(rho, cutoff = 0.8)
# Guard: X[, -integer(0)] would silently drop every column
if (length(iiRemove) > 0) X <- X[, -iiRemove]
dim(X)
```

Feature selection and predictive modelling
The remaining covariates undergo a rigorous supervised feature selection process based on the mProbes (Huynh-Thu et al. (2012)) and xRF (Nguyen, Huang, and Thuy (2014)) methods, as follows:

1. Fit a large random forest using all features and keep only the top 30% of features, ranked by variable importance, for the subsequent steps. This is justified because most importance scores are very low, so the discarded features are highly unlikely to have sufficient predictive capacity.
2. Permute the values of every predictor (i.e. antibody response) $X_i$ ($i = 1, \dots, M$, where $M$ is the number of predictors remaining after step 1) and add these permuted copies to the original feature space $S_X$. This generates an extended feature space $S_{X,P}$ (where $P$ denotes the permuted features) of dimension $N \times 2M$, where $N$ is the total number of individuals.
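Steps 1 and 2 can be sketched in base R as below. This is a minimal illustration, not the pipeline's actual code: the helpers `keep_top_fraction()` and `add_permuted_probes()` are hypothetical, the importance scores are assumed to come from a random forest fitted elsewhere (for example with the randomForest or ranger packages), and `X` is assumed to be a numeric matrix with named columns.

```r
# Step 1 (sketch): keep the top `frac` of features ranked by importance.
# `importance` is a named numeric vector, e.g. from a random forest fit.
keep_top_fraction <- function(importance, frac = 0.3) {
  n_keep <- ceiling(frac * length(importance))
  names(sort(importance, decreasing = TRUE))[seq_len(n_keep)]
}

# Step 2 (sketch): build the extended feature space S_{X,P} by appending an
# independently permuted copy of every predictor (the mProbes "probes").
# Shuffling within a column keeps its distribution but breaks any link to
# the outcome, so a probe's importance reflects chance alone.
add_permuted_probes <- function(X) {
  P <- X
  for (j in seq_len(ncol(X))) {
    # index-based shuffle avoids sample()'s surprise on length-1 input
    P[, j] <- X[sample.int(nrow(X)), j]
  }
  colnames(P) <- paste0(colnames(X), "_perm")
  cbind(X, P)  # N x 2M matrix
}
```

With `X_top <- X[, keep_top_fraction(imp)]`, calling `add_permuted_probes(X_top)` returns the $N \times 2M$ matrix on which the next random forest would be fitted.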

Note:
The implementation of the predictive modelling pipeline is parallelised. However, the current version cannot set the same random seed on every core used, so results will vary slightly between runs. The cross-validation folds are identical across runs, though, so the differences are small.
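One common way to make parallel random-number generation reproducible in R is to give each task its own L'Ecuyer-CMRG stream, generated up front from a single seed with `parallel::nextRNGStream()`, so that results no longer depend on which core runs which task. The sketch below assumes the pipeline's tasks (for example, individual random forest fits) could be dispatched with `lapply()` or `parallel::parLapply()`; `make_task_seeds()` and `run_task()` are hypothetical names, and `run_task()` contains only a placeholder computation.

```r
library(parallel)  # nextRNGStream(), parLapply()

# Generate one independent L'Ecuyer-CMRG stream per task from a single seed.
# Note: this switches the session's RNG kind to L'Ecuyer-CMRG.
make_task_seeds <- function(n_tasks, seed) {
  RNGkind("L'Ecuyer-CMRG")
  set.seed(seed)
  s <- get(".Random.seed", envir = .GlobalEnv)
  seeds <- vector("list", n_tasks)
  for (i in seq_len(n_tasks)) {
    seeds[[i]] <- s
    s <- nextRNGStream(s)
  }
  seeds
}

# Each task installs its own stream before drawing any random numbers.
run_task <- function(task_seed) {
  assign(".Random.seed", task_seed, envir = .GlobalEnv)
  mean(runif(100))  # placeholder for, e.g., one random forest fit
}

seeds <- make_task_seeds(n_tasks = 4, seed = 2024)
res_serial <- lapply(seeds, run_task)
```

On a cluster, `parLapply(cl, seeds, run_task)` should return the same values as `res_serial`, because each task's stream is fixed before dispatch rather than inherited from whichever worker picks it up.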

Visualise predictive patterns
To elucidate the pattern that the model found to be predictive of clinical protection, we plot the individual responses as a heatmap, ordering individuals by age to see how this pattern emerges as they get older.
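A minimal base-R sketch of such a plot follows, assuming a numeric matrix `X_sel` of the selected responses (individuals in rows, named antibody columns), a numeric `age` vector and a character `status` vector with values "Protected" or "Susceptible". These object names and both helper functions are hypothetical; `stats::heatmap()` is called with `Rowv = NA` and `Colv = NA` so that it keeps the age ordering instead of clustering rows and columns.

```r
# Subset one protection group and order its individuals by age.
order_by_age <- function(X_sel, age, status, group) {
  keep <- status == group
  X_g <- X_sel[keep, , drop = FALSE]
  X_g[order(age[keep]), , drop = FALSE]
}

# Draw one heatmap per group without reordering rows or columns;
# rows are labelled with the (sorted) ages of the individuals.
plot_group_heatmap <- function(X_sel, age, status, group) {
  M <- order_by_age(X_sel, age, status, group)
  heatmap(M, Rowv = NA, Colv = NA, scale = "none",
          labRow = as.character(sort(age[status == group])), main = group)
  invisible(M)
}
```

For example, `for (g in c("Protected", "Susceptible")) plot_group_heatmap(X_sel, age, status, g)` draws one panel per group. Setting `scale = "column"` instead would standardise each antibody, which may help if response magnitudes differ widely between antigens.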

[Figure: heatmaps of the selected responses, individuals ordered by age; panels: Protected, Susceptible]
Recall that we initially removed strongly linearly correlated features on a temporary basis. To aid interpretation of these results, we now bring back any response that was strongly correlated with one of the selected features.
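The reintroduction step can be sketched as follows, assuming the full correlation matrix `rho` (with dimnames) was kept from before filtering, `selected` holds the names of the features in the final model and `removed` the names dropped by the correlation filter. The function name is hypothetical; it compares absolute correlations against the cutoff, as caret's `findCorrelation()` does.

```r
# Return the removed features whose absolute correlation with at least one
# selected feature exceeds `cutoff`.
reintroduce_correlated <- function(rho, selected, removed, cutoff = 0.8) {
  sub <- abs(rho[removed, selected, drop = FALSE])
  removed[rowSums(sub > cutoff) > 0]
}
```

If `iiRemove` holds the indices returned by the filtering step, `removed` could be `colnames(rho)[iiRemove]`, since `rho` was computed before `X` was subset.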

[Figure: heatmaps including the reintroduced correlated responses; panels: Protected, Susceptible]
The heatmap including all responses is not very informative, because by construction it shows many strongly correlated responses. Instead, the full list of protein IDs can be queried on PlasmoDB or similar databases to investigate whether some of these responses cluster by biological function. In the manuscript we did this by producing a word cloud for proteins whose responses go up or down with age.
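To prepare protein lists for such a query, responses could, for example, be split by the sign of their Spearman correlation with age. This criterion is an assumption for illustration (the exact rule used in the manuscript is not stated here), `split_by_age_trend()` is a hypothetical helper, and the column names of `X` are assumed to be protein IDs.

```r
# Split responses (columns of X, named by protein ID) by the sign of their
# Spearman rank correlation with age. which() drops NA correlations (e.g.
# constant columns), and zero correlations fall in neither list.
split_by_age_trend <- function(X, age) {
  r <- apply(X, 2, function(x) cor(x, age, method = "spearman"))
  list(up = colnames(X)[which(r > 0)], down = colnames(X)[which(r < 0)])
}
```

For instance, `writeLines(split_by_age_trend(X, age)$up, "ids_up.txt")` writes one ID per line, a list that could then be pasted into a gene ID search on PlasmoDB.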