An assessment of the value of deep neural networks in genetic risk prediction for surgically relevant outcomes

Mathias Aagaard Christensen; Arnór Sigurdsson; Alexander Bonde; Simon Rasmussen; Sisse R. Ostrowski; Mads Nielsen; Martin Sillesen

doi:10.1371/journal.pone.0294368

Abstract

Introduction

Postoperative complications affect up to 15% of surgical patients, constituting a major part of the overall disease burden in modern healthcare systems. While several surgical risk calculators have been developed, none have so far been shown to decrease the associated mortality and morbidity. Combining deep neural networks and genomics with the already established clinical predictors may hold promise for improvement.

Methods

The UK Biobank was utilized to build linear and deep learning models for the prediction of surgery-relevant outcomes. A GWAS for the relevant outcomes was initially conducted to select the Single Nucleotide Polymorphisms (SNPs) for inclusion in the models. Model performance was assessed with the Area Under the Receiver Operating Characteristic Curve (ROC-AUC) and optimum precision and recall. Feature importance was assessed with SHapley Additive exPlanations.

Results

Models were generated for atrial fibrillation, venous thromboembolism and pneumonia as genetics only, clinical features only and a combined model. For venous thromboembolism, the ROC-AUCs were 60.1% [59.6%-60.4%], 63.4% [63.2%-63.4%] and 66.6% [66.2%-66.9%] for the linear models and 51.5% [49.4%-53.4%], 63.2% [61.2%-65.0%] and 62.6% [60.7%-64.5%] for the deep learning SNP, clinical and combined models, respectively. For atrial fibrillation, the ROC-AUCs were 60.3% [60.0%-60.4%], 78.7% [78.7%-78.7%] and 80.0% [79.9%-80.0%] for the linear models and 59.4% [58.2%-60.9%], 78.8% [77.8%-79.8%] and 79.8% [78.8%-80.9%] for the deep learning SNP, clinical and combined models, respectively. For pneumonia, the ROC-AUCs were 50.1% [49.6%-50.6%], 69.2% [69.1%-69.2%] and 68.4% [68.0%-68.5%] for the linear models and 51.0% [49.7%-52.4%], 69.7% [.5%-70.8%] and 69.7% [68.6%-70.8%] for the deep learning SNP, clinical and combined models, respectively.

Conclusion

In this report we presented linear and deep learning predictive models for surgery-relevant outcomes. Overall, predictability was similar between linear and deep learning models and inclusion of genetics seemed to improve accuracy.

Citation: Christensen MA, Sigurdsson A, Bonde A, Rasmussen S, Ostrowski SR, Nielsen M, et al. (2024) An assessment of the value of deep neural networks in genetic risk prediction for surgically relevant outcomes. PLoS ONE 19(7): e0294368. https://doi.org/10.1371/journal.pone.0294368

Editor: Xiang Zhu, Penn State: The Pennsylvania State University, UNITED STATES OF AMERICA

Received: November 14, 2023; Accepted: July 1, 2024; Published: July 15, 2024

Copyright: © 2024 Christensen et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Data is not publicly available but can be applied for at https://www.ukbiobank.ac.uk/enable-your-research/apply-for-access. Analytic methods will be made public at github.com at request. Requests to access these datasets should be directed to https://www.ukbiobank.ac.uk/enable-your-research/apply-for-access. The de-identified dataset used for this study can be obtained from the authors, provided written authorization from UK Biobank can be obtained. Authors are not allowed to share data without express permission from this governing body. The UK Biobank can be contacted at https://www.ukbiobank.ac.uk/learn-more-about-uk-biobank/contact-us. Enquiries from researchers about applying for access should be directed to the Access Management Team – email: access@ukbiobank.ac.uk. The data used is owned by the third-party UK Biobank Consortium and not by the authors. The data can be accessed in the same manner as the authors by the above mentioned information. The authors did not have any special access privileges, and anyone is able to apply for data access in the same way as the authors.

Funding: MS received a grant from the Novo Nordisk Foundation (Grant #NNF20SA0062879). https://novonordiskfonden.dk/. The funder did not play any role in the study design, data collection or analysis. The funder did not play any role in the decision to publish or in the preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Worldwide, more than 310 million surgeries are performed each year, addressing an estimated 11% of the global burden of disease [1, 2]. While most surgical patients proceed to an uneventful recovery, current estimates indicate that roughly 4% die as a direct or indirect result of surgery, while up to 15% experience a postoperative complication (PC), prolonging hospital length-of-stay with consequential morbidity [2].

While some improvement in PCs following the implementation of approaches such as Enhanced Recovery after Surgery (ERAS) protocols has been well documented, the incidences of PCs have remained remarkably stable over the last decade [3]. As such, a stable subset of patients still experiences PCs, suggesting that this patient group could benefit from a deviation from the current one-size-fits-all approach deployed by most ERAS protocols and a move towards a precision medicine approach in the surgical setting.

However, to achieve this goal, risk prediction models are needed to identify which patients will fail standard ERAS protocols.

To this end, many risk assessment tools have been fielded to identify at-risk patients, including the regression-based American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) risk calculator as well as newer machine learning approaches investigating the value of random forests or deep neural networks (DNNs) [4, 5]. However, these models are limited by the fact that they only perform predictions on available clinical data, which only provides insight into a fraction of the driving factors that increase patients’ risks of developing PCs.

As such, recent data have suggested that genetic susceptibility could, in part, be a modifier of an individual’s risk of PCs, thus opening the potential for adding genetic data points to risk prediction models in order to improve model performance [6, 7].

Genetic variations are increasingly being recognized as an important modality for various surgical adverse events including venous thromboembolisms, renal complications and cardiac arrhythmias [6, 8, 9]. However, it is currently not clear to what degree genetic susceptibility contributes to the overall risk compared with other well-known clinical risk factors. Furthermore, as genetic susceptibility may include complex non-linear effects such as newly identified complex interactions between genes that lie far from each other in the human genome, optimal modelling strategies remain unknown. As such, whether legacy risk prediction approaches such as the linear Polygenic Risk Scores (PGS), traditionally utilized to assess an overall genetic risk composition and weighted sum for the phenotype in question, could be inferior to a DNN approach, is currently unknown [10].

In this study, we sought to assess whether DNNs can outperform a classic PGS approach [11]. To illustrate this assessment, we target three high-impact PCs with proven genetic susceptibility: postoperative pneumonia, postoperative venous thromboembolisms (pVTE), and atrial fibrillation. Furthermore, we investigate whether the single nucleotide polymorphisms (SNPs) highlighted as driving the phenotype differ between DNN and PGS approaches, thus potentially indicating that non-linear genotype-phenotype associations can be identified by the DNN approach.

We hypothesize that DNNs will achieve superior predictive performance in predicting the genotype-associated risk of these PCs compared with a linear PGS, and that the DNN models will highlight a different subset of important SNPs compared with a linear PGS.

Methods

This study utilized genotype data from the United Kingdom Biobank (UKB) consortium and adheres to the latest Polygenic Risk Score Reporting Standards [12, 13]. Access to the UKB data was approved by the consortium (Study ID #60861). Under Danish law, the study was exempt from ethical board approval due to the anonymized nature of the dataset.

We conducted a comparative study of different methodologies for genotype-based risk prediction and Single Nucleotide Polymorphism (SNP) identification in a general as well as a surgical national cohort.

For the initial approach, we conducted standard GWAS-analyses without covariates on the chosen phenotypes with a high prevalence following surgery. Details for the GWAS are described below. These phenotypes included venous thromboembolisms (VTE), atrial fibrillation (AF) and bacterial pneumonia.

UKB has more than 500,000 individuals aged 40 to 69 enrolled and consented across the United Kingdom. Participants were invited through National Health Service (NHS) registries and asked to complete surveys on basic demographic data, general lifestyle measures and medical history. Inclusion of all participants took place from 2006 to 2010.

Identification of cohort

All patients with available genomic data in the UKB were initially included for analysis. Cases were identified depending on the phenotype in question. For AF, VTE and pneumonia, cases were defined using relevant International Statistical Classification of Diseases, 9th revision (ICD-9) and ICD-10 codes.

The phenotypes in question were identified with the ICD-9 and ICD-10 codes listed in S1 Table. The cohorts were split into training/validation and test sets. The training/validation set consisted of all non-surgical patients and a random sample of 80% of the surgical cohort. The test set consisted of the remaining 20% of the surgical cohort. Surgery was defined with the OPCS-4 codes listed in S2 Table. The post-surgical phenotypes were defined with the same ICD-codes as above registered up to 30 days after the given procedure. For AF, only first-time diagnoses were counted as post-surgery cases. For VTE and pneumonia, any diagnoses within 30 days were counted as cases, regardless of previous history.
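The split described above can be sketched as follows. This is a minimal illustration and not the authors' code: the `Participant` record, the `had_surgery` flag, the toy cohort and the fixed seed are all assumptions introduced for the example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    # Hypothetical record; field names are illustrative, not UKB field names.
    pid: int
    had_surgery: bool


def split_cohort(participants, test_fraction=0.2, seed=0):
    """Return (train_val, test).

    test      : a random `test_fraction` of the surgical patients
    train_val : all non-surgical patients plus the remaining surgical patients
    """
    surgical = [p for p in participants if p.had_surgery]
    non_surgical = [p for p in participants if not p.had_surgery]

    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    rng.shuffle(surgical)

    n_test = round(len(surgical) * test_fraction)
    test = surgical[:n_test]
    train_val = non_surgical + surgical[n_test:]
    return train_val, test


# Toy cohort: 1,000 participants, every fourth one surgical (250 in total).
cohort = [Participant(pid=i, had_surgery=(i % 4 == 0)) for i in range(1000)]
train_val, test = split_cohort(cohort)
print(len(train_val), len(test))
```

Keeping the test set purely surgical means that the reported performance reflects the population the models are intended for, while the non-surgical patients only add training signal.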

For each outcome of interest (pAF, pVTE and pneumonia), both deep learning and linear models were created using three distinct input strategies (see below for model descriptions):

A genotype only model: using only the identified SNPs (see below) as input (SNP model)
A clinical data only model: using only clinical data as input (Clinical model)
A combined model: using both SNPs and clinical data as input (Combined model)

Input SNPs were the top 100 SNPs from the discovery GWAS for each phenotype of interest, with clinical data including demographics and comorbidities (S3 Table) and combined models including both genetic and clinical data.

The majority of the individuals included are of self-reported White-British ethnic background, with only a minority being of mixed, Asian or Black self-reported ethnicity. All self-reported backgrounds were included for analysis.

Quality control

The first 50,000 individuals included in UKB were genotyped using the Applied Biosystems UK BiLEVE Axiom Array. The remaining individuals were genotyped using the Applied Biosystems UK Biobank Axiom Array. The two array types are highly similar, and the differences between them are not of significance. The arrays interrogated 850,000 SNPs in total. To account for potential biases, patients with outlying heterozygosity rates, cryptic relatedness (PI_HAT cut-off 0.2) and sex discrepancies in data were excluded. To ensure that only participants with high-quality genomic information were included for analysis, everyone with a genotyping rate of 98% or less was excluded. To ensure that only high-quality genetic variants were left for analyses, a missingness rate of 2% was used as a cut-off point. Lastly, a Minor Allele Frequency (MAF) threshold of > 5% was used, and variants found not to be in Hardy-Weinberg equilibrium were excluded (threshold: 1×10⁻⁶ for both cases and controls).
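The variant-level thresholds above can be expressed as a simple filter. The sketch below is illustrative only: the `VariantStats` record, its field names and the toy values are assumptions, and the text does not state whether the 2% missingness cut-off is inclusive (it is treated as inclusive here).

```python
from dataclasses import dataclass

# Thresholds taken from the quality control description.
MAX_MISSINGNESS = 0.02   # variant missingness cut-off (assumed inclusive)
MIN_MAF = 0.05           # minor allele frequency must exceed 5%
HWE_P_THRESHOLD = 1e-6   # variants with an HWE p-value below this are excluded


@dataclass(frozen=True)
class VariantStats:
    # Hypothetical per-variant summary; field names are illustrative.
    snp_id: str
    missingness: float
    maf: float
    hwe_p: float


def passes_qc(v: VariantStats) -> bool:
    """True if the variant satisfies all three variant-level filters."""
    return (
        v.missingness <= MAX_MISSINGNESS
        and v.maf > MIN_MAF
        and v.hwe_p >= HWE_P_THRESHOLD
    )


# Toy variants with made-up identifiers and statistics.
variants = [
    VariantStats("snp_a", missingness=0.001, maf=0.20, hwe_p=0.5),   # passes
    VariantStats("snp_b", missingness=0.050, maf=0.20, hwe_p=0.5),   # too many missing calls
    VariantStats("snp_c", missingness=0.001, maf=0.01, hwe_p=0.5),   # too rare
    VariantStats("snp_d", missingness=0.001, maf=0.20, hwe_p=1e-9),  # fails HWE
]
kept = [v.snp_id for v in variants if passes_qc(v)]
print(kept)  # ['snp_a']
```

The sample-level exclusions (heterozygosity outliers, relatedness, sex discrepancies, genotyping rate) would be applied in the same way on per-individual statistics.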

GWAS

The initial GWAS analyses were conducted using a mixed linear model (MLM) approach. GCTA version 1.93 beta for Windows was used to conduct the analyses. The MLM was created using fastGWA with a sparse genetic relationship matrix (GRM) and non-imputed data from the UKB. For each phenotype analyzed in the respective GWAS, the 100 most significant SNPs were included in the genetic and mixed models. The choice to utilize only the top 100 SNPs was made to balance predictive power against keeping the models computationally pragmatic. SNPs are referenced using the dbSNP (rs) reference number. The cohorts were split into training/validation and test sets beforehand, and only the training data was used for the initial GWAS models. Relevant GWAS plots, including Manhattan and Quantile-Quantile (QQ) plots, were generated using qqman (R version 4.0.2) [14]. Performance plots including the Area Under the Receiver Operating Characteristic Curve (ROC-AUC), Area Under the Precision-Recall Curve (PRAUC) and heatmaps were created using Scikit-learn 1.2.1 (Python 3) [15].
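The SNP selection step can be sketched as picking the 100 smallest p-values from the training-only GWAS summary statistics. This is a minimal illustration: the `(snp_id, p_value)` tuple layout, the `top_snps` helper and the random toy p-values are assumptions and do not reproduce the fastGWA output format.

```python
import heapq
import random


def top_snps(summary_stats, n=100):
    """Return the IDs of the `n` SNPs with the smallest p-values.

    summary_stats: iterable of (snp_id, p_value) pairs, assumed to come
    from a GWAS run on the training data only, so that the held-out test
    set never influences feature selection.
    """
    best = heapq.nsmallest(n, summary_stats, key=lambda row: row[1])
    return [snp_id for snp_id, _ in best]


# Toy summary statistics: 1,000 made-up SNPs with random p-values.
rng = random.Random(0)
stats = [(f"snp_{i}", rng.random()) for i in range(1000)]

selected = top_snps(stats)
print(len(selected))  # 100
```

Running the selection on the training split only is what prevents information from the test set leaking into the choice of input features.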

Linear Polygenic risk score (PGS) modelling approach

A linear PGS was generated using the logistic regression module as implemented in scikit-learn 1.2.1 for Python 3. Models were created with both L1 (lasso) and L2 (ridge) regularization. Feature importance was determined by coefficients of the SNPs.
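A minimal sketch of this setup, on synthetic data, is shown below. It is not the authors' pipeline: the additive 0/1/2 genotype coding, the simulated effect sizes, the regularization strength `C=0.1` and the solver choices are assumptions made for illustration. The `penalty` argument is used as in scikit-learn 1.2.1; later releases may emit a deprecation warning for it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for genotype data: 2,000 individuals x 100 SNPs,
# coded as 0/1/2 copies of the minor allele (assumed additive coding).
X = rng.integers(0, 3, size=(2000, 100)).astype(float)

# In this toy example only the first five SNPs carry signal.
true_beta = np.zeros(100)
true_beta[:5] = 0.8
logits = (X - X.mean(axis=0)) @ true_beta - 2.0
y = (rng.random(2000) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

models = {
    # L1 (lasso) needs a solver that supports it, e.g. liblinear.
    "l1": LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
    "l2": LogisticRegression(penalty="l2", solver="lbfgs", C=0.1, max_iter=1000),
}

ranked = {}
for name, model in models.items():
    model.fit(X, y)
    coefs = model.coef_.ravel()
    # Feature importance = absolute size of the fitted coefficient.
    ranked[name] = np.argsort(-np.abs(coefs))[:5]
    print(name, sorted(ranked[name].tolist()))
```

Ranking by absolute coefficient is only meaningful when the inputs share a scale, which holds here because every SNP uses the same 0/1/2 coding.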

Deep neural network (DNN) modelling approach

All DNN models were implemented using EIR (version 0.1.25-alpha) [16]. EIR is a framework that incorporates genetic, clinical, image, sequencing, and binary data for supervised training of deep learning models. A held-out test set was used for all models to obtain a final performance after training and validation. The Cross Entropy loss was employed during training for the classification tasks. All models were trained with a batch size of 64. During training, plateau learning rate scheduling was used to reduce the learning rate by a factor of 0.2 if the validation performance had not improved for 10 steps, with a validation interval of 500 steps. Early stopping was used to terminate training when performance had not improved with a patience of 16 steps. The early stopping criterion was activated after a buffer of 2,000 iterations. All models were trained with the Adam optimizer with a weight decay of 1×10⁻⁴ and a base learning rate of 1×10⁻³ [17]. For the neural network models, we augmented the genotype input by randomly setting 40% of the SNPs as missing in the one-hot encoded array. All DNN models utilized the genome-localnet (GLN) architecture for the genotype feature extraction [16]. The same cohort splits were used as in the linear PGS approach. Feature importance was determined using SHapley Additive exPlanations (SHAP) values [18].
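The missingness augmentation can be illustrated with a small NumPy sketch. This is not EIR's implementation: the 4-channel layout (0, 1 or 2 copies, plus a "missing" channel in the last row), the helper names and the choice to mask exactly 40% of SNPs per sample (rather than each SNP independently with probability 0.4) are assumptions.

```python
import numpy as np

MISSING_ROW = 3  # assumed index of the "missing" channel


def one_hot_genotypes(genotypes):
    """Encode genotypes (0, 1, 2 copies; negative = missing) as a 4 x n_snps array."""
    g = np.asarray(genotypes)
    idx = np.where(g < 0, MISSING_ROW, g)
    return np.eye(4, dtype=np.float32)[idx].T


def mask_as_missing(one_hot, fraction=0.4, rng=None):
    """Return a copy in which a random `fraction` of SNP columns is set to missing."""
    if rng is None:
        rng = np.random.default_rng()
    n_snps = one_hot.shape[1]
    n_mask = int(round(fraction * n_snps))
    cols = rng.choice(n_snps, size=n_mask, replace=False)
    out = one_hot.copy()
    out[:, cols] = 0.0          # clear the observed genotype channel
    out[MISSING_ROW, cols] = 1.0  # flag the SNP as missing instead
    return out


rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=100)   # toy sample with 100 fully observed SNPs
encoded = one_hot_genotypes(genotypes)
augmented = mask_as_missing(encoded, fraction=0.4, rng=rng)
print(encoded.shape, int(augmented[MISSING_ROW].sum()))
```

Because a fresh mask is drawn each time a sample is presented, the network cannot rely on any single SNP, which acts as a regularizer similar to input dropout.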

Results

Cohort

We identified 488,377 patients in the UKB with available genetic and relevant phenotypic data. After genetic quality measures were applied, 446,180 patients remained available for analysis and were used for both the linear and deep learning modelling approaches.

For the outcomes of interest, 19,704 had a diagnosis of AF, 9,101 had a diagnosis of VTE and 13,757 had a diagnosis of pneumonia overall in the UKB. The selection of the cohort and SNPs is depicted in Fig 1.

Fig 1. Selection and quality control steps of individuals and SNPs in the UKB.

https://doi.org/10.1371/journal.pone.0294368.g001

Linear models

Atrial fibrillation.

Baseline characteristics are listed in Table 1. The SNP model reached a ROC-AUC of 60.3% [95% CI, 60.0%-60.4%]. All individuals were classified as not having AF. The PRAUC was 0.09. The clinical model reached a ROC-AUC of 78.7% [95% CI, 78.7%-78.7%] with a recall of 9% and a precision of 53%. The PRAUC was 0.25. The combined model reached a ROC-AUC of 80.0% [95% CI, 79.9%-80.0%] with a recall of 9% and a precision of 56%. The PRAUC was 0.28. All performances are depicted in Fig 2A. The SNPs and the associated genes with the highest feature importance are listed in Table 2.
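A model can rank patients better than chance and still classify every individual as negative, because ROC-AUC depends only on the ordering of predicted risks, whereas precision and recall depend on a decision threshold. The sketch below illustrates this with made-up scores. The 0.5 threshold is an assumption, since the text does not state which threshold was used, and the `roc_auc` helper is a hand-written rank-based version rather than the scikit-learn routine used in the study.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random case outscores a random
    control, with ties counted as one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 2 cases and 8 controls. Every predicted risk is below 0.5,
# as is typical for a rare outcome.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.30, 0.10, 0.20, 0.15, 0.12, 0.08, 0.07, 0.06, 0.05, 0.04]

auc = roc_auc(y_true, scores)

# Hard classification at an assumed threshold of 0.5.
y_pred = [int(s >= 0.5) for s in scores]
true_pos = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
recall = true_pos / sum(y_true)

print(auc)          # 0.8125
print(sum(y_pred))  # 0 -> nobody is classified as a case
print(recall)       # 0.0
```

In this situation precision is undefined (no positive predictions), which is consistent with it not being reported for models that classified all individuals as negative.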

Fig 2.

A: Bar plot of ROC-AUCs of all atrial fibrillation models. B: Bar plot of ROC-AUCs of all venous thromboembolism models. C: Bar plot of ROC-AUCs of all pneumonia models.

https://doi.org/10.1371/journal.pone.0294368.g002

Table 1. Baseline characteristics for atrial fibrillation cohort.

https://doi.org/10.1371/journal.pone.0294368.t001

Table 2. Table of the SNPs with the highest feature importance for the genetic, and mixed atrial fibrillation linear models.

https://doi.org/10.1371/journal.pone.0294368.t002

Venous thromboembolism.

Baseline characteristics for VTE are listed in Table 3. The SNP model reached a ROC-AUC of 60.1% [95% CI, 59.6%-60.4%]. All individuals were classified as not having VTE. The PRAUC was 0.04. The clinical model reached a ROC-AUC of 63.4% [95% CI, 63.2%-63.4%]. All individuals were classified as not having VTE. The PRAUC was 0.04. The combined model reached a ROC-AUC of 66.6% [95% CI, 66.2%-66.9%]. All individuals were classified as not having VTE. The PRAUC was 0.05. All performances are depicted in Fig 2B. The SNPs and the associated genes with the highest feature importance are listed in Table 4.

Table 3. Baseline characteristics for venous thromboembolism cohort.

https://doi.org/10.1371/journal.pone.0294368.t003

Table 4. Table of the SNPs with the highest feature importance for the genetic, and mixed venous thromboembolism linear models.

https://doi.org/10.1371/journal.pone.0294368.t004

Pneumonia.

Baseline characteristics are listed in Table 5. The SNP model reached a ROC-AUC of 50.1% [95% CI, 49.6%-50.6%]. All individuals were classified as not having pneumonia. The PRAUC was 0.04. The clinical model reached a ROC-AUC of 69.2% [95% CI, 69.1%-69.2%]. All individuals were classified as not having pneumonia. The PRAUC was 0.12. The combined model reached a ROC-AUC of 68.4% [95% CI, 68.0%-68.5%] with a recall of 1% and a precision of 50%. The PRAUC was 0.11. All performances are depicted in Fig 2C. The SNPs and the associated genes with the highest feature importance are listed in Table 6.

Table 5. Baseline characteristics for pneumonia.

https://doi.org/10.1371/journal.pone.0294368.t005

Table 6. Table of the SNPs with the highest feature importance for the genetic, and mixed pneumonia linear models.

https://doi.org/10.1371/journal.pone.0294368.t006