Predicting protein cascade expression from H&E images

Alejandro Leyva; Abdul Rehman Akbar; Muhammad Khalid Khan Niazi

doi:10.1371/journal.pcbi.1014262

Abstract

Protein expression within oncogenic or suppressive pathways is a hallmark indicator of oncogenesis. While traditional AI models in digital pathology attempt to predict singular proteins, there is a need to predict the downstream expression of proteins to indicate the propagation of signals. RNA expression is informative, but it does not describe the downstream propagation of protein signals or whether those signals are functional. Using Reverse Phase Protein Array (RPPA) data with whole-slide images (WSIs) from the publicly available Cancer Genome Atlas Breast Adenocarcinoma dataset (TCGA-BRCA), we predict the expression of five key proteins identified from the apoptosis cascade, using DNA damage and repair (DDR) cascades as a biological control. Furthermore, we examine the performance of patch-level Vision Transformers (ViTs) on the regression task and compare it against our cell-level ViT, CellRPPA. Our results demonstrate that patch-level vision transformers were unable to obtain statistically significant predictive results, achieving R-squared values < 0.1 for all folds. In contrast, CellRPPA obtained R-squared values > 0.1 in all five test folds. We also show that morphologically indicative cascades, such as the apoptosis cascade, provide significantly higher performance compared to the DDR cascade.

Author summary

We developed a method to estimate how groups of proteins behave inside tumors using standard tissue images that are routinely collected in clinical care. Rather than focusing on single molecules, we asked whether it is possible to capture broader biological processes such as how cells respond to stress or initiate programmed cell death directly from visual patterns in tissue structure. To do this, we trained a computational model to learn relationships between image features and protein measurements from the same tumors.

We found that certain biological processes leave detectable signatures in tissue architecture, allowing the model to partially recover protein activity from images alone. However, this relationship is not complete, and much of the variation remains unexplained, highlighting the complexity of tumor biology.

Our work suggests that widely available pathology images may contain underused information about underlying molecular processes. In the future, approaches like this could help provide additional biological insight in settings where direct molecular measurements are unavailable, supporting more informed research and potentially guiding clinical decision making.

Citation: Leyva A, Akbar AR, Niazi MKK (2026) Predicting protein cascade expression from H&E images. PLoS Comput Biol 22(5): e1014262. https://doi.org/10.1371/journal.pcbi.1014262

Editor: Sophia Rudorf, Leibniz University Hanover, GERMANY

Received: January 30, 2026; Accepted: April 22, 2026; Published: May 4, 2026

Copyright: © 2026 Leyva et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Datasets and diagnostic slides are publicly available and anonymized. The CellEcoNet GitHub repository is available at https://github.com/AI4Path-Lab/PathRosetta. CellRPPA and preprocessing code are available at https://github.com/Alejandro21236/CellRPPA. The Zenodo record is available at https://zenodo.org/records/19077131. TCGA-BRCA, the openly available public dataset, is available at https://portal.gdc.cancer.gov/projects/TCGA-BRCA.

Funding: The project described was supported in part by R01 CA276301 (PIs: MKKN and Dr. Wei Chen) from the National Cancer Institute, Pelotonia under IRP CC13702 (PIs: MKKN, Dr. Arya Mariam Roy, and Dr. Anna Vilgelm), The Ohio State University Department of Pathology and Comprehensive Cancer Center. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Cancer Institute or National Institutes of Health or The Ohio State University. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

1 Introduction

Breast adenocarcinoma is a well-documented disease and can be associated with a variety of genetic mutations that result in the propagation or suppression of protein signals [1]. Pathways can be intrinsically mediated, whereby the stimulus resulting in signal transduction is intracellular, or extrinsically mediated. Within adenocarcinoma, common mutations in proteins such as TP53 result in the suppression of apoptotic signals by mutating the DNA-binding domain, resulting in dysfunctional proteins [2]. Other mutations, such as those in PIK3CA or MYC, result in the enrichment and propagation of the PI3K/AKT pathway by mutating kinase domains, leading to the overexpression of these proteins [3]. Apoptosis is responsible for regulating programmed cell death and is characterized by cell shrinkage, pyknosis, and reorganization of lipid structure [4]. The pathway is both intrinsically and extrinsically mediated by a combination of enzymes, receptors, and transcription factors [5]. Intrinsically mediated apoptosis occurs in two categorical fashions: negative, which is due to the absence of growth factors around a structure resulting in the triggering of cell death, or positive, which may be due to the presence of antigens, radiation, or hypoxia [6]. These changes result in the opening of inner mitochondrial pores due to lost membrane potential, preventing regular cell metabolism and causing the activation of BH3 proteins that detect metabolic stress [7]. As a result, BH3 proteins deactivate anti-apoptotic proteins such as BCL-2 and begin to activate apoptotic proteins such as BAK/BAX, resulting in the transduction of signals to XIAP (X-linked inhibitor of apoptosis protein), which then disinhibits the caspase family. The released caspases then catabolize the cell, resulting in cellular death [8].

The extrinsic pathway performs a similar function but is dependent on ligand–receptor binding of the TRAIL/FasL proteins, which results in the activation of caspases 8 and 10, which then activate the same caspases up to XIAP [9]. Common forms of extrinsic apoptosis include T-cell–mediated apoptosis and immune response [10].

The DNA damage and repair cascade is responsible for the repair of DNA in response to double-strand breaks or transcriptional errors. It has been demonstrated that lower expression of the DDR cascade is prognostically predictive within breast adenocarcinoma and has been predictive of chemotherapy sensitivity and response [11]. DDR applications within clinical oncology include testing for mutations in genes responsible for double-strand break repair, BRCA1 and BRCA2 (Breast Cancer Gene 1/2), and can be secondarily characterized by RNA expression. However, the choice of genes to use for prognostic value is still debated [12]. Moreover, the expression of DDR genes is not physically visible via an electron microscope but can indirectly result in abnormal cellular growth or polyploid cells [13]. Genes that are often used to characterize the DDR cascade include ATM, CHEK2, H2AFX, RAD51, and TP53, among others. The ataxia–telangiectasia mutated gene (ATM) encodes a kinase that senses DNA double-strand breaks and initiates the response to them [14]. RAD51 (radiation-sensitive protein 51) is responsible for the invasion of homologous DNA zones to allow for accurate and timely DNA repair and behaves as an ATPase [15]. CHEK2 (checkpoint kinase 2) encodes a radiation-sensitive checkpoint kinase that prevents the cell from entering mitosis upon DNA damage [16]. CHEK2 has been shown to be a predictive biomarker for multiple organ cancers across Europe and North America [17]. TP53 (tumor protein p53) encodes the p53 transcription factor, which activates or inhibits apoptotic genes such as NOXA [18]; it typically functions in multiple roles to suppress proliferative pathways and regulate cell cycling. H2AFX is responsible for regulating nucleosome formation and produces phosphorylative foci to recruit repair factors for double-strand breaks; the H2A family is also responsible for regulating chromatin accessibility and replication timing [19].

These cascades and their expression are typically predicted using RNA expression in the field of bioinformatics [20]. Within the field of digital pathology, gene expression models and molecular subtyping models have been developed using novel deep learning methods, whereby gene expression can be predicted from whole-slide image features [21]. More recently, multimodal models integrating protein and genetic data are being developed, complementary to spatial transcriptomics and proteomics [22]. While RNA provides a perspective on cascade expression and prognosis, it does not indicate whether genes are translated and produced, or whether the proteins produced are functional [23]. Traditional models in the field have attempted to predict the expression of singular proteins or antigens that can be readily indicated from histology, including HER2 expression and EGFR [24,25]; however, most models do not predict intracellular protein expression due to morphological ambiguity and poor generalization across cancers and external datasets. Multimodal models and comprehensive information on protein expression are required to understand protein functionalization for improved prognostic prediction. Information from proteomics contextualizes cellular behavior and patterns in gene expression in systems biology, as cascades are highly interconnected and exhibit cross-talk between mechanisms of action [26]. Misalignment between the extent of gene expression and the presence of proteins indicates translational or transcriptional dysfunction or deliberate inhibition.

Protein data are gathered from Reverse Phase Protein Arrays, which use fluorescent antibodies that bind to the protein of interest [27]. Luminescence reflects antibody-based detection of target protein levels, while absorbance is used to quantify total protein loading for normalization. The sensitivity and accuracy of RPPA depend on the affinity, specificity, and availability of proteins to bind to the provided antibodies and are conditionally accurate. Typically, multimodal analyses between RNA and RPPA within public datasets are not performed due to the time differential at which each assay is performed, rendering direct comparison between protein and RNA expression invalid. When using AI for expression prediction, information is typically extracted at the region or patch level of whole-slide images [28]. In recent years, higher-resolution image embeddings have been developed to examine images at the cellular level, but they remain untested for protein prediction [29]. Since protein cascades are generally visible at the cellular level, there is a need to investigate the ability of AI at both the patch level and the cell level to predict the expression of multiple proteins.

In this study, we predict the expression of multiple intracellular proteins in aggregate as a regression task from WSIs. We present a comprehensive algorithm, CellRPPA, inspired by CellEcoNet [30], to compete with conventional patch-level algorithms. This study offers two novel investigations: i) differences between cell-level and patch-level resolution in deep learning for protein expression prediction, and ii) the capacity for deep learning to predict morphologically ambiguous proteins, or proteins that are loosely indicated by histology. As digital pathology continues to expand, there is an ever-growing need for multimodal models and proteomics to provide a comprehensive perspective on disease progression.

2 Materials and methods

From the publicly available Cancer Genome Atlas (TCGA) Breast Adenocarcinoma dataset (TCGA-BRCA), 919 RPPA samples were paired with whole-slide images. RPPA measurements were defined at the case level, yielding a 1:1 correspondence between each patient and a single protein cascade score. Multiple WSIs per case were included during training as independent inputs, and slide-level predictions were aggregated to obtain a single case-level estimate. Group-wise cross-validation at the patient level ensured that all slides from a given patient were assigned to the same fold, preventing leakage between training and test sets; no samples were skipped or had missing information. Cascades were defined as sets of proteins. The apoptosis cascade was defined as the expression of BCL2, BAX, XIAP, and cleaved caspases 3 and 7. This was done to represent the activation of both intrinsic pathways demonstrated by BCL2 and BAX, as well as the expression of the encompassing caspases and regulatory inhibitors. The DDR cascade was used as a morphologically ambiguous protein cascade to compare against the performance of apoptosis, which is microscopically noticeable. The proteins chosen to represent the DDR cascade include H2AFX, CHEK2, TP53, TP53 BP1, and ATM; however, this is acknowledged to be a limited representation of the DDR cascade and was chosen to represent hallmark genes associated with prognostic value in breast adenocarcinoma.
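
The patient-level fold assignment and case-level aggregation described above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the round-robin fold assignment, the use of the mean to aggregate slide-level predictions, and all identifiers (`assign_patient_folds`, `aggregate_to_case`, the slide and patient IDs) are assumptions introduced here.

```python
from collections import defaultdict
from statistics import mean


def assign_patient_folds(slide_to_patient, n_folds=5):
    """Map each slide to a fold so that all slides of a patient share one fold."""
    patients = sorted(set(slide_to_patient.values()))
    # Round-robin over sorted patient IDs (assumption); a real pipeline
    # would more likely shuffle patients with a fixed seed first.
    patient_fold = {p: i % n_folds for i, p in enumerate(patients)}
    return {s: patient_fold[p] for s, p in slide_to_patient.items()}


def aggregate_to_case(slide_preds, slide_to_patient):
    """Average slide-level predictions into one estimate per patient.

    The mean is an assumed aggregation rule; the text does not name one.
    """
    by_patient = defaultdict(list)
    for slide, pred in slide_preds.items():
        by_patient[slide_to_patient[slide]].append(pred)
    return {p: mean(v) for p, v in by_patient.items()}


# Hypothetical slide and patient identifiers, for illustration only.
slides = {"s1": "P1", "s2": "P1", "s3": "P2", "s4": "P3", "s5": "P3"}
folds = assign_patient_folds(slides, n_folds=2)
case_preds = aggregate_to_case(
    {"s1": 0.2, "s2": 0.4, "s3": -1.0, "s4": 0.0, "s5": 1.0}, slides
)
```

Because folds are assigned per patient and only then propagated to slides, two slides from the same case can never land on opposite sides of a train/test boundary.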

The RPPA relative-intensity scores were z-scored per protein across samples and then summed across the proteins of each cascade. If any protein that was part of the cascade was not included within a sample, the sample was excluded entirely. Cascade scores were thus defined as the sum of z-scored protein abundances to provide a compact proxy for coordinated pathway-associated protein expression. This formulation was not intended to model regulatory directionality or mechanistic pathway activity (e.g., opposing pro- and anti-apoptotic effects), but rather to capture aggregate multivariate protein signal as a supervised learning target. We acknowledge that this approach does not resolve antagonistic relationships among pathway members. A directionality-aware formulation, such as signed weighting or graph-based aggregation of protein interactions, represents an important future extension to more faithfully model regulatory dynamics. The CellRPPA model is CellEcoNet reengineered with a regression head, and the GitHub repository for the model is included for both CellEcoNet and CellRPPA. Cell embeddings are derived from localized image regions, but CellRPPA is not simply a higher-resolution patch model; it integrates cell-level features within a structured hierarchical aggregation that captures both local and contextual information. The patch-level ViT baseline already tests patch-only input under the same task and performs worse, indicating that the improvement is not explained by resolution alone. Cell embeddings were extracted using Trident and CellViT++ at 20× magnification, and cell types were not determined.
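
As a concrete reading of the cascade score defined above (sum of per-protein z-scores, with incomplete samples excluded), the following sketch may help. It is not the released preprocessing code: the function name `cascade_scores`, the input layout, the use of the population standard deviation, and the toy values are all assumptions.

```python
from statistics import mean, pstdev


def cascade_scores(rppa, proteins):
    """Sum of per-protein z-scores for each sample.

    rppa: dict mapping sample ID -> dict mapping protein -> intensity,
          where a missing key or None marks an unmeasured protein.
    """
    # Exclude any sample that lacks at least one cascade protein.
    complete = {
        s: v for s, v in rppa.items()
        if all(v.get(p) is not None for p in proteins)
    }
    scores = {s: 0.0 for s in complete}
    for p in proteins:
        values = [complete[s][p] for s in complete]
        # Population SD is an assumption; a protein with zero variance
        # would need special handling (division by zero here).
        mu, sd = mean(values), pstdev(values)
        for s in complete:
            scores[s] += (complete[s][p] - mu) / sd
    return scores


# Toy values for two of the apoptosis markers; not real RPPA data.
toy = {
    "a": {"BCL2": 1.0, "BAX": 2.0},
    "b": {"BCL2": 2.0, "BAX": 4.0},
    "c": {"BCL2": 3.0, "BAX": 6.0},
    "d": {"BCL2": 1.0, "BAX": None},  # dropped: BAX is missing
}
scores = cascade_scores(toy, ["BCL2", "BAX"])
```

Since every protein contributes with a positive sign, a sample with high BCL2 and low BAX can receive the same score as one with the opposite pattern, which is the antagonism limitation acknowledged in the text.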

The apoptosis and DDR tasks were performed separately, and training and evaluation took 150 hours for each task. The patch-level ViT uses a standard ViT-S/16 with cross-fold validation, where the DDR and apoptosis tasks were tested and validated separately and were not backpropagated jointly. We selected a standard ViT-S/16 baseline to isolate the effect of resolution and aggregation strategy rather than architectural optimization. Implementation parameters are shown in Table 1 for the S/16 vision transformer.

Table 1. Training configuration and hyperparameters used for regression.

https://doi.org/10.1371/journal.pcbi.1014262.t001

All parameterizations for both models are provided within anonymized shell scripts and should be able to run, provided that the necessary label files are available. Fig 1 shows the workflow for the study design, starting from the development of ground truth through the evaluation of each model. All computational experiments were performed on the Ohio Supercomputer on Nvidia A100 GPUs. All analyses were performed on the entire cohort, and the RPPA analytics were derived from the RPPA CSV using matplotlib, NumPy, and SciPy.

Fig 1. Overview of the cascade prediction framework.

Whole-slide H&E images and RPPA-derived protein cascade scores serve as inputs. Patch-level modeling uses a Vision Transformer (ViT) to extract tile embeddings and regress pathway-level protein activity, while cell-level modeling performs cell segmentation, embedding, and graph construction to predict apoptosis activity at cellular resolution. The outputs illustrate spatially resolved pathway activity maps at both patch and cell scales, enabling comparison between coarse tissue-level predictions and fine-grained cellular cascade expression.

https://doi.org/10.1371/journal.pcbi.1014262.g001

3 Results

The labels for the protein scores were assigned by case, and each case was assigned into fold-wise training, testing, and validation cohorts for evaluation over 100 epochs per fold. Initial analysis presents results on the expression of proteins defined within each sample as aggregated scores, as shown in Fig 2. Fig 2A shows the correlation heatmap of protein abundance co-variation across each cascade. Proteins included within the apoptosis cascade exhibit moderate coordinated abundance among their counterpart proteins, consistent with shared cascade-level behavior. BAX and XIAP show very low co-variation, which is biologically plausible given their opposing regulatory roles. Interestingly, BCL2 and TP53 BP1 show stronger abundance correlation, with a Pearson correlation exceeding 0.5, while TP53 BP1 shows negative abundance correlation with other DDR-associated proteins within its defined cascade. TP53 BP1 also shows a strong correlation with XIAP abundance while exhibiting low co-variation with BAX. BCL2 and BAX show low abundance correlation, which is consistent with the known biological roles of these proteins.

Fig 2. Proteomics and label-level analytics for apoptosis and DDR RPPA-derived pathway scores.

Top row: inter-marker correlation structure and pathway score distributions. Bottom row: low-dimensional structure of pathway scores and cross-cascade correlation (control relationship between DDR and apoptosis). (A) Marker correlation heatmap (z-scored RPPA), (B) apoptosis score distribution, (C) DDR score distribution, (D) PCA of pathway scores colored by apoptosis, and (E) DDR versus apoptosis pathway scores.

https://doi.org/10.1371/journal.pcbi.1014262.g002

Within the DDR cascade, TP53 BP1 shows lower co-variation with DDR-associated proteins, including ATM, CHEK2, and H2AFX. The abundance correlation between ATM and all other DDR proteins, except TP53 BP1, is remarkably strong, exceeding 0.8 with H2AFX and 0.5 with CHEK2. The correlation heatmap shows little overlap between DDR and apoptosis cascade abundance patterns and often demonstrates negative cross-cascade correlations. There is a small positive correlation between BAX and ATM, while most other cross-cascade correlations are below zero.

Fig 2B shows the distribution of apoptosis pathway scores, which is approximately Gaussian with no observable skew and a wide spread around the median. Fig 2C shows the DDR score distribution, which is also approximately Gaussian but includes outliers more than three standard deviations from the mean, attributed to the absence of expression of any classified DDR gene. Fig 2D shows a PCA projection of pathway scores for each sample, where samples that are closer together have similar protein expression profiles. Most samples cluster in a region of minimal variance, while outliers, or cases with higher apoptosis cascade scores, are more dispersed and exhibit greater variance in protein expression, or markedly higher expression of proteins that are typically less expressed. To visualize the relationship between DDR scores and apoptosis, Fig 2E plots each sample by its two pathway scores. In general, samples with higher apoptosis scores tend to have lower DDR scores, and vice versa. However, a large proportion of samples have both apoptosis and DDR scores near zero, so the inverse relationship is most apparent among samples with more extreme scores.

The results for the patch-level ViT on the morphologically ambiguous DDR cascade, used as a control, are shown in Table 2. After False Discovery Rate (FDR) correction (p < 0.05) of the Spearman correlation, which was used to capture non-linear monotonic association, the model failed to obtain statistically significant correlations in three folds. Fold 2 failed to produce any Pearson or Spearman correlation because training did not converge, while the Mean Absolute Error (MAE) remained relatively high. Mean Squared Error (MSE) did not consistently decline across folds, and the MAE remained stagnant across all folds. The patch-level ViT explained close to none of the variance in protein expression within the DDR cohort across all five folds. Variability across folds likely reflects cohort heterogeneity and limited sample size. In addition, the Spearman and Pearson values were nearly identical in some folds, indicating that linear and rank-based associations with morphology were similar.

Table 2. Performance metrics for DDR pathway prediction across folds. Erroneous values or failed convergences are reported as NaN for all tables.

https://doi.org/10.1371/journal.pcbi.1014262.t002
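
The text reports FDR-corrected significance for the fold-wise Spearman correlations but does not name the procedure. One common choice is the Benjamini–Hochberg step-up method, sketched below under that assumption; the function name and the p-values are hypothetical and are not taken from Table 2.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling by m / rank and
    # enforcing that adjusted values never increase as rank decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-fold p-values for a correlation test (illustrative only).
raw_p = [0.001, 0.04, 0.03, 0.20, 0.50]
adj_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adj_p]
```

In this toy case the folds with raw p-values of 0.03 and 0.04 would pass an uncorrected 0.05 threshold but not the corrected one, which illustrates how a fold can show a nominal correlation yet be reported as not significant.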

The results for the patch-level ViT’s prediction of apoptosis cascade expression are shown in Table 3. The results are noticeably worse for apoptosis, with only one of the five folds showing significant correlations and three of the five folds showing no correlation at all. The MAE also increased relative to the DDR cascade prediction, exceeding 0.5 in some folds, while the typical range of values is between 0 and 1. The MSE was not stable and did not consistently improve across folds; Fold 1 had the best MSE, and later folds were steadily worse. The model failed to explain any variance in protein expression and, in Fold 3, produced a slightly negative R², demonstrating inadequate capability to predict apoptosis cascade expression, despite apoptosis being morphologically visible and the known correlation between protein expression and cascade expression.

Table 3. Performance metrics for apoptosis pathway prediction across folds. Undefined correlation values are reported as NaN.

https://doi.org/10.1371/journal.pcbi.1014262.t003

The results for CellRPPA’s prediction of the apoptosis cascade are shown in Table 4, demonstrating improvement over the patch-level ViT, which indicates that higher resolution can be advantageous for the task. For each fold, the epoch with the lowest validation loss was chosen to represent performance across folds. All folds explained at least 10% of the variance in the protein prediction task, while a substantial portion of folds in both the test and validation sets exceeded 20% or 30%. The MAE remains consistent across cohorts, with limited variance across folds. The Pearson and Spearman correlation coefficients across all cohorts were above 0.4, with the exception of one fold in the test set. The MSE remained consistent relative to MAE values across cohorts and folds, ranging from 29 to 35, with the exception of two instances in the test set.

Table 4. Cell-level apoptosis prediction performance across five-fold cross-validation. Reported metrics include Pearson correlation, Spearman correlation, mean absolute error (MAE), mean squared error (MSE), and coefficient of determination (R²).

https://doi.org/10.1371/journal.pcbi.1014262.t004
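
The per-fold checkpoint rule stated above (report the epoch with the lowest validation loss) can be sketched as follows. The log structure, the function name `select_best_epoch`, and the numbers are hypothetical and only illustrate the selection rule.

```python
def select_best_epoch(history):
    """Return the epoch record with the lowest validation loss.

    history: list of dicts, one per epoch, each holding 'epoch' and
    'val_loss'. Ties go to the earliest epoch because min() returns
    the first minimal element.
    """
    return min(history, key=lambda rec: rec["val_loss"])


# Hypothetical training log for one fold (values are illustrative only).
fold_history = [
    {"epoch": 1, "val_loss": 0.92, "test_r2": 0.05},
    {"epoch": 2, "val_loss": 0.71, "test_r2": 0.18},
    {"epoch": 3, "val_loss": 0.74, "test_r2": 0.21},
]
best = select_best_epoch(fold_history)
```

In this toy log, epoch 3 has the higher test R², but epoch 2 is the one reported because selection uses only the validation loss, which keeps the test set out of model selection.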

Table 5 shows the prediction performance for the DDR cascade using CellRPPA, which was worse than its apoptosis prediction performance. CellRPPA showed no improvement over the patch-level ViT in terms of explained variance. As with the patch-level ViT, the Spearman and Pearson correlation coefficients fall within the same range, and the Pearson values are similar, if not equal, to the Spearman coefficients. The MAE remained within a consistent range, as observed in the other predictive tasks, and was relatively stable across folds but did not exhibit a noticeable decrease.

Table 5. Cell-level DNA damage response (DDR) prediction performance across five-fold cross-validation. Metrics include Pearson correlation, Spearman correlation, mean absolute error (MAE), mean squared error (MSE), and coefficient of determination (R²).

https://doi.org/10.1371/journal.pcbi.1014262.t005

The results demonstrate the failure of patch-level ViTs on multi-protein prediction for apoptosis and DDR. In contrast, the cell-level ViT (CellRPPA) succeeds in predicting apoptosis but fails on DDR, reflecting the histological ambiguity of these proteins. Validation of protein choices for both DDR and apoptosis demonstrates that the selected proteins behaved within the same cascade and function and also exhibited noticeable co-expression. It is also noted that samples with lower DDR expression had higher apoptosis protein expression as measured by RPPA. Across folds, CellRPPA demonstrated improved predictive performance for apoptosis (R² = 0.189 ± 0.054) compared to the patch-level ViT (R² = −0.009 ± 0.015). In contrast, both models failed to explain variance in the DDR task (patch-level: R² = 0.005 ± 0.019; CellRPPA: R² = −0.070 ± 0.105), consistent with the morphological ambiguity of DDR-associated proteins.

4 Discussion

Analysis of breast adenocarcinoma suggests that, in general, there is an inverse relationship between DDR expression and apoptosis expression, which is supported by the correlation heatmap shown in Fig 2A and suggests a functional basis for this relationship within breast adenocarcinoma. Previous studies indicate that the apoptosis and DNA damage response cascades are tightly coupled rather than independent. DDR signaling functions upstream to assess genomic damage and, when repair fails, promotes apoptotic commitment. Following irreversible activation of executioner caspases, key DDR components are cleaved, effectively terminating DNA repair processes [32]. While a few outliers exist in each cohort, the analysis suggests that most proteins exhibit similar levels of expression within the cohort. We acknowledge that this formulation does not account for opposing regulatory roles within pathways. Future work will incorporate directionality-aware weighting or graph-based aggregation to better reflect pathway dynamics.

Apoptosis is well known to be visible under compound or fluorescent microscopy, and it is notable that the patch-level ViT fails to detect and predict cascade expression and performs markedly worse than on the DDR cascade. Performance on the DDR cascade in both trials demonstrates that prediction is not suitable at either resolution, likely due to the inherent morphological ambiguity of DDR expression in histology. DDR expression may reflect replicative stress, aneuploidy, and other phenomena that can manifest in multiple visible forms across different cell types. In contrast, apoptosis has been observed to exhibit a set of well-characterized behaviors that are similar across most cell types. It should be noted that patch-level ViT does not provide sufficient resolution to statistically learn the patterns that characterize apoptosis. These results suggest that cascade prediction is better for proteins with histologically visible behaviors and that higher-resolution representations can be advantageous for such cascades. Future work will extend this model to additional pathways (e.g., Reactome) to assess pathway-specific predictability.

Since cellular models are shown here to be capable of predicting aggregate protein activity, a natural next step is to model co-regulation and interactions within protein cascades in a spatially resolved way for better explainability and predictive power. With the rise of spatial transcriptomics in digital pathology, interactions between spatial proteomics and transcriptomics can now be modeled at the cellular level using platforms like Xenium and Orion without temporal mismatch between measurements [33,34]. Prior work has explored gene expression prediction using graph-based neural networks, where gene–gene interactions are modeled through distance-aware edges and weighted connectivity [35]. More importantly, transcriptomics has been used to predict proteomics through deep learning, and given the underlying biological coupling, the inverse direction, predicting transcriptomic states from proteomic signals, is also a reasonable extension [36]. Other approaches include contrastive learning across spatial transcriptomic images to capture differences in tumor microenvironments and cellular composition, enabling gene expression inference from contextual variation [37]. Cellular-level models provide the resolution needed to compare microenvironments within the same sample while also capturing transcriptomic and proteomic interactions locally. As a practical objective, cascade-derived features can be used for drug response prediction and survival modeling, as shown by Reitsam et al., allowing AI systems to capture underlying biological dynamics rather than just static measurements [38]. These cascade features can also be extended to tasks like lymph node metastasis prediction or patient risk stratification based on aggregate expression patterns [39]. That said, there is still a gap in integrating full cascade-level information across modalities. While some work has imputed metabolite profiles from RNA, true integration across metabolomics, proteomics, and transcriptomics remains limited [40,41]. The core idea here is that histology can act as the anchor for integrating these signals, enabling cellular-level proteomic cascade prediction directly from tissue phenotype. Modeling cascades from phenotype back to genotype in digital pathology nevertheless remains an open and largely unsolved problem.

This study focuses on establishing proof-of-concept within a controlled dataset. External validation across institutions remains an important direction for future work. Given the known domain shift in histopathology, cross-cohort generalization is non-trivial and warrants dedicated study.

5 Limitations

While this study uses the term “cascades” to describe multi-protein prediction, each cascade represents a characterized set of hallmark proteins from canonically observed pathways. Full representation of each cascade would require a substantially larger protein cohort, which would introduce additional noise and increase prediction difficulty. We also note that the model choices used for the patch-level ViT are limited and not fully generalizable, and similar limitations apply to CellViTs. Future studies should derive more generalizable conclusions across omics prediction by conducting broader evaluations of currently available models to assess the advantages of cell-level ViTs on omics tasks. In this study, minimal architectures were used to establish baseline performance for multiprotein prediction. Future work will include comparisons to MIL-based and pathology-optimized architectures.

6 Conclusion

Using cell-level ViTs and patch-level ViTs, we predict the expression of hallmark proteins within cascades and assess the advantages and disadvantages of each modeling approach. Our results show that cascade prediction is a viable task and can provide insight into molecular, genetic, and cellular functions and responses. Overall, we consider the performance of CellRPPA modest but promising in comparison to other omics-prediction tasks, though improvement in MAE is needed for viability. We demonstrate that CellRPPA outperforms patch-level ViTs on microscopically visible cascades but provides no advantage over patch-level models for morphologically ambiguous proteins, such as those within the DDR cascade.

Future studies should extend this analysis to additional cascades to better understand the integration of deep learning–derived proteomics into clinical informatics. In addition, cross-cohort analyses are needed to demonstrate generalization across cancer types.

References

1. Tomlinson IANPM. Mutations in normal breast tissue and breast tumours. Breast Cancer Res. 2001;3(5).
2. Shahbandi A, Nguyen HD, Jackson JG. TP53 mutations and outcomes in breast cancer: reading beyond the headlines. Trends Cancer. 2020;6(2):98–110. pmid:32061310
3. Jenkins ML, Ranga-Prasad H, Parson MAH, Harris NJ, Rathinaswamy MK, Burke JE. Oncogenic mutations of PIK3CA lead to increased membrane recruitment driven by reorientation of the ABD, p85 and C-terminus. Nat Commun. 2023;14(1):181. pmid:36635288
4. Elmore S. Apoptosis: a review of programmed cell death. Toxicol Pathol. 2007;35(4):495–516.
5. Lossi L. The concept of intrinsic versus extrinsic apoptosis. Biochem J. 2022;479(3):357–84.
6. Xu G, Shi Y. Apoptosis signaling pathways and lymphocyte homeostasis. Cell Research. 2007;17(9):759–71.
7. Lomonosova E, Chinnadurai G. BH3-only proteins in apoptosis and beyond: an overview. Oncogene. 2008;27 Suppl 1(Suppl 1):S2-19. pmid:19641503
8. Chaudhary AK. A potential role of x-linked inhibitor of apoptosis protein in mitochondrial membrane permeabilization and its implication in cancer therapy. Drug Discov Today. 2016;21(1):38–47.
9. Knight MJ, Riffkin CD, Muscat AM, Ashley DM, Hawkins CJ. Analysis of FasL and TRAIL induced apoptosis pathways in glioma cells. Oncogene. 2001;20(41):5789–98. pmid:11593384
10. Grafman J, Salazar AM. Traumatic brain injury. Elsevier; 2015.
11. Li W, et al. Implications of DNA damage response and immunotherapy in tumor therapy. Cell Commun Signal. 2025;23(1).
12. Abu-Helalah M, et al. BRCA1 and BRCA2 genes mutations among high risk breast cancer patients in Jordan. Scientific Reports. 2020;10(1):17573.
13. Zheng L, Dai H, Zhou M, Li X, Liu C, Guo Z, et al. Polyploid cells rewire DNA damage response networks to overcome replication stress-induced barriers for tumour progression. Nat Commun. 2012;3:815. pmid:22569363
14. Ueno S, Sudo T, Hirasawa A. ATM: functions of ATM kinase and its relevance to hereditary tumors. Int J Mol Sci. 2022;23(1):523. pmid:35008949
15. Meyer D, et al. Rad51 determines pathway usage in post-replication repair. Nature Commun. 2026.
16. Vahteristo P, Bartkova J, Eerola H, Syrjäkoski K, Ojala S, Kilpivaara O, et al. A CHEK2 genetic variant contributing to a substantial fraction of familial breast cancer. Am J Hum Genet. 2002;71(2):432–8. pmid:12094328
17. Cybulski C, Górski B, Huzarski T, Masojć B, Mierzejewski M, Debniak T, et al. CHEK2 is a multiorgan cancer susceptibility gene. Am J Hum Genet. 2004;75(6):1131–5. pmid:15492928
18. Shibue T, Takeda K, Oda E, Tanaka H, Murasawa H, Takaoka A, et al. Integral role of Noxa in p53-mediated apoptotic response. Genes Dev. 2003;17(18):2233–8. pmid:12952892
19. Yin X, Zeng D, Liao Y, Tang C, Li Y. The function of H2A histone variants and their roles in diseases. Biomolecules. 2024;14(8):993. pmid:39199381
20. Henninger JE, Young RA. An RNA-centric view of transcription and genome organization. Mol Cell. 2024;84(19):3627–43.
21. Pizurica M, Zheng Y, Carrillo-Perez F, Noor H, Yao W, Wohlfart C, et al. Digital profiling of gene expression from histology images with linearized attention. Nat Commun. 2024;15(1):9886. pmid:39543087
22. Li Z, Li Y, Xiang J, Wang X, Yang S, Zhang X, et al. AI-enabled virtual spatial proteomics from histopathology for interpretable biomarker discovery in lung cancer. Nat Med. 2026;32(1):231–44. pmid:41491099
23. Moore JB, Weeks ME. Proteomics and systems biology: current and future applications in the nutritional sciences. Adv Nutri. 2011;2(4):355–64.
24. Jiao P. Prediction of HER2 status based on deep learning in H&E-stained histopathology images of bladder cancer. Biomed. 2024;12(7).
25. Park J, Shin S, Hwang W, Keum S, Brattoli B, Rawson JH, et al. Deep learning predicts EGFR mutation status from histology images in non-small cell lung cancer. Cancer Res Commun. 2025;5(12):2127–41. pmid:41211715
26. Nadendla EK, Tweedell RE, Kasof G, Kanneganti T-D. Caspases: structural and molecular mechanisms and functions in cell death, innate immunity, and disease. Cell Discov. 2025;11(1):42. pmid:40325022
27. Coarfa C, et al. Reverse-phase protein array: technology, application, data processing, and integration. J Biomol Tech. 2021;32(1):15–29.
28. Dosovitskiy A. An image is worth 16x16 words: transformers for image recognition at scale. 2020.
29. Hörst F. CellViT: vision transformers for precise cell segmentation and classification. 2023.
30. Akbar ARE. CellEcoNet: decoding the cellular language of pathology with deep learning for invasive lung adenocarcinoma recurrence prediction. arXiv preprint. 2025. https://arxiv.org/abs/2508.16742
31. Wu T. ClusterProfiler 4.0: a universal enrichment tool for interpreting omics data. The Innovation. 2021;2(3):100141.
32. De Zio D, Cianfanelli V, Cecconi F. New insights into the link between DNA damage and apoptosis. Antioxid Redox Signal. 2013;19(6):559–71. pmid:23025416
33. Yin M, et al. DNA damage response and cancer metastasis: clinical implications and therapeutic opportunities. Exon Publications; 2022.
34. 10x Genomics. Visium spatial gene expression. 2020. https://www.10xgenomics.com/products/visium-spatial-gene-expression
35. RareCyte, Inc. Orion spatial biology platform. RareCyte technical documentation. 2023. https://rarecyte.com/orion/
36. Li B. Gene expression prediction from histology images via hypergraph neural networks. Brief Bioinform. 2024;25(6).
37. Cranney CW, Meyer JG. Multi-dataset integration and residual connections improve proteome prediction from transcriptomes using deep learning. bioRxiv. 2024.
38. Wang Q, Chen W-J, Su J, Wang G, Song Q. HECLIP: histology-enhanced contrastive learning for imputation of transcriptomics profiles. Bioinformatics. 2025;41(7):btaf363. pmid:40569046
39. Reitsam NG, Enke JS, Vu Trung K, Märkl B, Kather JN. Artificial intelligence in colorectal cancer: from patient screening over tailoring treatment decisions to identification of novel biomarkers. Digestion. 2024;105(5):331–44. pmid:38865982
40. Reitsam NG, Jiang X, Liang J, Grosser B, Grozdanov V, Loeffler CM, et al. Deep learning-based H&E-derived risk scores in colorectal cancer: associations with tumour morphology, biology, and predicted drug response. J Pathol. 2026;269(1):112–24. pmid:41716034
41. Xie AX, Tansey W, Reznik E. UnitedMet harnesses RNA-metabolite covariation to impute metabolite levels in clinical samples. Nat Cancer. 2025;6(5):892–906. pmid:40251399

This is an uncorrected proof.
