Use of deep learning methods to translate drug-induced gene expression changes from rat to human primary hepatocytes

In drug development, animal and cell line models are often used to evaluate the potential toxic effects of a novel compound or candidate drug before it progresses to human trials. However, relating the results of animal and in vitro model exposures to relevant clinical outcomes in the human in vivo system remains challenging and often relies on putative orthologs. In recent years, multiple studies have demonstrated that the repeated dose rodent bioassay, the current gold standard in the field, lacks sufficient sensitivity and specificity in predicting toxic effects of pharmaceuticals in humans. In this study, we evaluate the potential of deep learning techniques to translate the pattern of gene expression measured following an exposure from rodents to humans, circumventing the current reliance on orthologs, and also from in vitro to in vivo experimental designs. Of the deep learning architectures applied in this study, the convolutional neural network (CNN) and a deep artificial neural network with bottleneck architecture significantly outperform classical machine learning techniques in predicting the time series of gene expression in primary human hepatocytes, given a measured time series of gene expression from primary rat hepatocytes following in vitro exposure to a previously unseen compound, across multiple toxicologically relevant gene sets. Across 76 genes that have been shown to be predictive for identifying carcinogenicity, the average mean absolute error was reduced from 0.0172 for a random regression forest to 0.0166 for the CNN model (p < 0.05). These deep learning architectures also perform well when applied to predict time series of in vivo gene expression given measured time series of in vitro gene expression for rats.

Figure: Error calculation using mean absolute error for raw normalised gene expression values.
The figure above depicts two model predictions for a measured time series of human in vitro gene expression (blue). The gene expression values for model prediction 1 (green) are very close in magnitude to the measured values; however, its gene expression pattern differs greatly from the measured pattern. Model prediction 2 (yellow) has the same gene expression pattern as the measured time series, but it is shifted upwards. If the quality of the model predictions is evaluated using a classical distance-based cost function, such as the sum of absolute errors, on the raw gene expression values, model prediction 1 has a much lower error despite having such a different pattern of gene expression. Although model prediction 2 predicts the correct gene expression pattern, the penalty for incorrectly predicting the gene expression level at the first time point is also added at each subsequent time point.
To address this, the gene expression data are re-encoded: the first entry remains unchanged, and each subsequent entry encodes the slope, that is, the change in gene expression level between consecutive time points. Applying the same error calculation as before (mean absolute error), model prediction 2 is penalized for incorrectly predicting the gene expression value at time point one, but because it correctly predicts the changes in gene expression between time points one and two and between time points two and three, it receives no further penalty. Model prediction 1 is closer in predicting the level of gene expression at time point one and therefore receives a lower penalty there, but because it fails to predict the correct changes in gene expression level at subsequent time points, it has a greater error overall. In this way, model predictions are not unduly penalized for incorrectly predicting a single time point.
By re-encoding the data in this manner, the models are trained to correctly predict the gene expression pattern without the need to introduce a more complex error function.
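The effect of the re-encoding can be illustrated with a minimal sketch. The three-point time series below are hypothetical values chosen to mimic the two predictions described above, not data from the study, and the function names `slope_encode` and `mean_absolute_error` are introduced here for illustration only.

```python
def slope_encode(series):
    """Keep the first value; encode each later value as the change
    from the previous time point."""
    return [series[0]] + [b - a for a, b in zip(series, series[1:])]


def mean_absolute_error(predicted, measured):
    """Average absolute difference between two equally long sequences."""
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(measured)


# Hypothetical three-point time series (illustrative values, not study data).
measured = [0.2, 0.5, 0.3]
prediction_1 = [0.25, 0.2, 0.5]  # close in level, wrong pattern
prediction_2 = [0.6, 0.9, 0.7]   # correct pattern, shifted upwards

# Error on the raw expression values.
raw_1 = mean_absolute_error(prediction_1, measured)
raw_2 = mean_absolute_error(prediction_2, measured)

# Error on the slope-encoded values.
enc_measured = slope_encode(measured)
enc_1 = mean_absolute_error(slope_encode(prediction_1), enc_measured)
enc_2 = mean_absolute_error(slope_encode(prediction_2), enc_measured)

print("raw MAE:    ", raw_1, raw_2)
print("encoded MAE:", enc_1, enc_2)
```

With these example values, prediction 1 has the lower error on the raw values, whereas prediction 2 has the lower error after re-encoding, because only its offset at the first time point is penalized.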

Supplementary material Section 2: additional analyses
Nested sets of non-orthologous genes: rat in vitro to rat in vivo.
As with the human in vitro predictions, the average mean absolute error decreases for all models as they are trained on larger gene sets. The average mean absolute error values are greater for the rat in vivo predictions than for the human in vitro predictions. Higher average mean absolute error values for the rat in vivo predictions were also observed for the toxicologically relevant gene sets identified from the literature. Each model included in the analysis (CNN, naïve encoder, modified autoencoder, and random regression forest) was trained to predict rat in vivo gene expression from rat in vitro gene expression on a population of randomly selected nested gene sets of increasing size (20, 35, 50, 60, and 80 genes). The figure depicts, for each model and each gene set size, the mean of the average mean absolute error over a population of ten randomly generated non-orthologous gene sets. The error bars indicate the standard error of the mean.
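A minimal sketch of how such nested gene sets and the standard error of the mean could be computed is given below. It assumes that "nested" means each smaller set is contained in the next larger one, which is implemented here by taking prefixes of a single random ordering; the gene pool, the function names, and the error values are hypothetical placeholders rather than the study's data or code.

```python
import random
import statistics


def nested_gene_sets(genes, sizes, rng):
    """Draw one random ordering of genes and take prefixes of it, so that
    each smaller set is contained in every larger set (an assumption)."""
    ordering = rng.sample(genes, max(sizes))
    return [ordering[:n] for n in sorted(sizes)]


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / len(values) ** 0.5


rng = random.Random(0)                          # fixed seed for repeatability
gene_pool = [f"gene_{i}" for i in range(200)]   # placeholder identifiers
sizes = (20, 35, 50, 60, 80)                    # set sizes given in the text

# A population of ten independently drawn nested families, as in the text.
population = [nested_gene_sets(gene_pool, sizes, rng) for _ in range(10)]

# Hypothetical per-family error values for one model and one set size.
example_errors = [0.021, 0.019, 0.020, 0.022, 0.018,
                  0.020, 0.021, 0.019, 0.020, 0.021]
print(statistics.mean(example_errors), standard_error(example_errors))
```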
Nested sets of known rat-human orthologs: rat in vitro to human in vitro.
Orthologs are two or more homologous gene sequences found in different species and related by linear descent. Orthologs are commonly used to relate results from rodent in vivo and in vitro bioassays to the human system. To evaluate any added benefit of using known orthologs when predicting time series of human in vitro gene expression given a time series of rat in vitro gene expression, the models were also applied to randomly selected nested gene sets of increasing size consisting of known rat-human orthologs. The range of average mean absolute error values for predicting human in vitro gene expression from rat in vitro gene expression using known rat-human orthologs (Figure S6 below) shows no improvement over using randomly generated non-orthologous gene sets (Figure 8 in the text). This indicates that there is no advantage in restricting gene predictions to known rat-human orthologs. This is unsurprising, as all three deep learning models implemented in this system contain a bottleneck in their architecture. As a result, model predictions of a human gene expression pattern are made by a non-linear combination of the input rat gene expression patterns. As with the randomly selected non-orthologous gene sets, the average mean absolute error decreases for all models as more genes are included. Again, the convolutional neural network consistently outperforms the random regression forest, our benchmark classical machine learning method.
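The role of the bottleneck can be illustrated with a minimal forward-pass sketch. It is not one of the architectures used in the study: the layer sizes, the tanh activation, and the untrained random weights are all assumptions made for illustration. The point is only that every predicted human value depends on all rat inputs through the shared bottleneck, so no one-to-one gene mapping is required.

```python
import math
import random


def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: each output is a weighted sum of all inputs."""
    outputs = []
    for row, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(activation(z) if activation else z)
    return outputs


def random_matrix(n_out, n_in, rng):
    """Untrained placeholder weights drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]


rng = random.Random(0)
n_rat, n_bottleneck, n_human = 6, 2, 6   # hypothetical layer sizes
w_enc = random_matrix(n_bottleneck, n_rat, rng)
w_dec = random_matrix(n_human, n_bottleneck, rng)


def predict(rat_expression):
    """Compress the rat inputs into the bottleneck, then expand to human outputs."""
    code = dense(rat_expression, w_enc, [0.0] * n_bottleneck, math.tanh)
    return dense(code, w_dec, [0.0] * n_human)


baseline = predict([0.1] * n_rat)
perturbed = predict([0.1] * (n_rat - 1) + [0.9])  # change only the last rat gene
changed = sum(1 for a, b in zip(baseline, perturbed) if a != b)
print(f"{changed} of {n_human} outputs changed")
```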
Overview of gene sets included in these analyses
SLCO1B1, CYP7A1, CYP8B1, CYP27A1, CYP7B1, NR1H4, NR0B2, N21L2, NR1L3, FGF19, ABCB11, SLC51A, ABCC3, UGT2B4, CYP3A4, SULT2A1
FABP4, ACACA, AKT1, AKT2, AKT3, PRKAA1, PRKAA2, ADIPOR1, ADIPOR2, ADIPOQ, BCL2A1, CPT2, CPT1A, CPT1C, CASP8, MLXIPL, FABP5, ELOVL3, FAS, FOXO1, NR1H4, RXRA, FASLG, FABP3, FABP7, PMP2, GCKR, IL1A, IL10, IRS1, IRS2, MAPK10, NFKB1, NFKB2, RELA, RELB, PPARA, PPARG, PTEN, RXRB, RXRG, SCD, SOCS3, SREBF1, TGFB1, TGFB2, TGFB3, TLR4, MTOR, PNPLA3
The steatosis gene set was generated in house by combining a literature search using the search terms "liver steatosis", "Nonalcoholic fatty liver disease", and "NAFLD" with the steatosis pathway from KEGG (hsa04932) and the steatosis adverse outcome pathway from Wikipathways. In addition, genes were filtered to include only known human-rat orthologs measured by both the Rat Genome 230 2.0 Array and the Human Genome U133 Plus 2.0 Array used in this study. This gave rise to a seed gene set of 45 genes, which were then used as input for MetaCore (version 6.30, build 68780, accessed on 9th of May 2017) to generate a fully connected gene interaction network. Dijkstra's shortest path algorithm was used to construct the network, allowing one node to be added if necessary.
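The general idea of connecting seed genes by shortest paths while allowing a single added node can be sketched as follows. This is not MetaCore's implementation, which is not described here; the toy graph, the unit edge weights, the node names, and the reading of "one node to be added" as at most one non-seed node per path are all assumptions made for illustration.

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path distances and predecessors from source."""
    dist = {source: 0}
    prev = {}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                prev[neighbour] = node
                heapq.heappush(queue, (nd, neighbour))
    return dist, prev


def path_to(prev, source, target):
    """Rebuild the path from source to target using the predecessor map."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def connect_seeds(graph, seeds, max_added=1):
    """Collect edges of shortest paths between seed pairs that pass through
    at most max_added non-seed nodes."""
    seeds = list(seeds)
    seed_set = set(seeds)
    edges = set()
    for i, s in enumerate(seeds):
        dist, prev = dijkstra(graph, s)
        for t in seeds[i + 1:]:
            if t not in dist:
                continue  # no path between this pair
            path = path_to(prev, s, t)
            added = [n for n in path[1:-1] if n not in seed_set]
            if len(added) <= max_added:
                edges.update(frozenset(e) for e in zip(path, path[1:]))
    return edges


# Hypothetical undirected interaction graph with unit weights.
toy_graph = {
    "seed_a": {"seed_b": 1, "linker": 1},
    "seed_b": {"seed_a": 1},
    "linker": {"seed_a": 1, "seed_c": 1},
    "seed_c": {"linker": 1, "far_1": 1},
    "far_1": {"seed_c": 1, "far_2": 1},
    "far_2": {"far_1": 1, "seed_d": 1},
    "seed_d": {"far_2": 1},
}
network = connect_seeds(toy_graph, ["seed_a", "seed_b", "seed_c", "seed_d"])
print(sorted(sorted(edge) for edge in network))
```

In this toy example, `seed_c` is connected through the single intermediate node `linker`, whereas `seed_d` is left out because reaching it would require two added nodes.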