Robust predictors for drug response of patients with acute myeloid leukemia

Bahar Tercan

doi:10.1371/journal.pone.0343422

Abstract

The significant heterogeneity in treatment responses among patients with acute myeloid leukemia (AML) underscores the critical need for accurate drug response prediction. We developed k-Top Scoring Pairs (kTSP) classifiers, ensemble methods that aggregate the relative expression of gene pairs. We compared their accuracy in predicting the drug response of patients with AML with that of state-of-the-art machine learning methods: linear and radial basis function support vector machines, random forest, and elastic net regression classifiers. Our results demonstrate that kTSP outperforms the other methods particularly when the numbers of sensitive and resistant patients are imbalanced, a common challenge in clinical studies. Because it is rank-based, our approach is inherently robust to batch effects and well suited to single-patient classification.

Citation: Tercan B (2026) Robust predictors for drug response of patients with acute myeloid leukemia. PLoS One 21(2): e0343422. https://doi.org/10.1371/journal.pone.0343422

Editor: Francesco Bertolini, European Institute of Oncology, ITALY

Received: October 28, 2025; Accepted: February 5, 2026; Published: February 23, 2026

Copyright: © 2026 Bahar Tercan. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Publicly available data generated by others were used in this study. The Beat AML dataset (single-drug response, gene expression, and clinical data) was downloaded from: https://biodev.github.io/BeatAML2/. The venetoclax-containing combination drug response data was obtained from Eide et al. [20] (Table 9). The external cohort data used for single-drug response prediction, the FPMTB dataset, was downloaded from: https://zenodo.org/records/7370747.

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Acute myeloid leukemia (AML) is an aggressive hematological cancer characterized by the clonal accumulation of abnormally differentiated, immature myeloid cells (blasts) in the bone marrow and blood, leading to bone marrow failure [1,2]. The 5-year survival rate for AML is 32.9% [3]. Improved treatment options have enabled AML to be cured in 35%–40% of patients younger than 60 years and in 5%–15% of patients older than 60 years [1].

Due to high heterogeneity in AML patients and varied treatment responses, molecular data in AML has predominantly been used for predicting prognosis and deciding on post-remission treatment. Genomic profiling is essential for guiding initial therapeutic decisions in certain subsets, notably the use of FLT3 inhibitors (midostaurin/quizartinib) for FLT3-mutated AML [4,5] and gemtuzumab ozogamicin (GO) for CD33-positive core-binding factor AML [6,7], but its integration at this stage is still limited compared to later stages of therapy. The standard frontline treatment for AML patients who are fit for intensive chemotherapy remains a cytarabine-based (7 + 3) regimen. Furthermore, although several other targeted therapies have been approved, their frontline use as monotherapy or in non-intensive combinations (e.g., venetoclax, IDH inhibitors) is largely restricted to elderly or unfit patients [8].

The Beat AML clinical trial (NCT03013998) provided cytogenetic and mutational data within seven days of sample receipt, allowing for rapid treatment selection. Patients were then assigned to sub-studies based on the dominant clone of their tumors. This molecularly guided approach resulted in a median overall survival of 12.8 months, significantly longer than the 3.9 months observed in patients treated with the standard of care (induction with cytarabine and daunorubicin [7 + 3 or equivalent] or a hypomethylating agent) [9]. This study demonstrates the importance of precision medicine and paves the way for personalized therapy by providing molecular, clinical, and ex vivo drug response data for individual AML patients.

In this study, we utilize the comprehensive Beat AML dataset [10,11], a multi-phase resource spanning ten years. This dataset includes ex vivo drug sensitivity, clinical annotations, and rich molecular data from hundreds of patients with heterogeneous AML subtypes, providing an ideal foundation for developing drug response predictors.

Previous data-driven studies have used the multi-omics data in Beat AML, including selected clinical features, mutations, and gene expression [12,13], to predict patient-specific drug responses. Other studies have used attributes derived from gene expression data, such as cell-type deconvolution [14] and BCL2 signatures [15]. One study focused on predicting response to particular drugs, such as BET inhibitors [16]. Proteomics data from Beat AML patients have been generated and used in drug response classifiers by Pino et al. [17] and Gosline et al. [18].

We employed a robust machine learning approach, the k-Top Scoring Pairs (kTSP) classifier [19], to predict the ex vivo drug response of patient-derived samples from the Beat AML cohort. This method is well suited to clinical translation and is robust to technical variability (batch effects) because it relies solely on the relative expression of a few informative gene pairs. We benchmarked the predictive accuracy of kTSP against state-of-the-art models, including random forest, linear and radial basis function (RBF) support vector machines (SVM), and elastic net regression. Our study is also the first ex vivo drug response prediction work on the venetoclax-containing combination drug response data obtained from Eide et al. [20]. We applied our classifiers to an external AML cohort, the Functional Precision Medicine Tumor Board (FPMTB) [21], to evaluate their generalizability. We also report the consistency of the rules across temporal cohorts (Beat AML Waves 1 + 2 versus Waves 3 + 4) and clinical cohorts (de novo versus not de novo).

Crucially, our completely data-driven approach is inherently robust: it does not require batch correction or data normalization. This property allows for single-sample classification, enabling clinicians and researchers to classify individual samples simply by checking the majority vote across the classifier gene pairs—a significant practical advantage in a clinical setting. We provide the classifier gene pairs for both single drugs and venetoclax-containing combination drugs as supplementary files.

Materials and methods

Data acquisition

In this study, we used the Beat AML dataset [10,11], which was collected over ten years in two phases: Waves 1 + 2 and Waves 3 + 4. It contains ex vivo drug sensitivity data, clinical annotations, and RNA and DNA sequencing data from patients with de novo, transformed, and therapy-related AML, as well as patients with relapse. RNA libraries were prepared with the Agilent SureSelect Strand-Specific RNA Library Preparation Kit on the Bravo robot (Agilent), and DNA libraries were prepared with the Illumina Nextera RapidCapture Exome capture probes and protocol. De novo patients are AML patients with no prior history of myelodysplastic syndromes (MDS), myeloproliferative neoplasms (MPN), or chemotherapy. Transformed AML evolves from a prior myeloid disorder (MDS or MPN), and therapy-related AML occurs after exposure to chemotherapy or radiation for a different disease. Relapse is the recurrence of disease after achieving remission.

To analyze the generalizability of our relative gene expression-based kTSP classifiers, we applied them to an independent cohort of AML patients from the FPMTB [21], which provided functional, genomic, and transcriptomic data.

All the data used in this study are publicly available and were downloaded on July 10, 2024. We did not have access to information that could identify individual participants during or after data collection.

Gene expression normalization

We converted the RPKM-normalized gene expression values to transcripts per million (TPM) for the Beat AML dataset using Equation 1 [22].

$$\mathrm{TPM}_i = \frac{\mathrm{RPKM}_i}{\sum_{j=1}^{N} \mathrm{RPKM}_j} \times 10^{6} \qquad (1)$$

where i is the index of a gene and N is the number of genes in a sample.
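The RPKM-to-TPM conversion rescales each sample so that its values sum to one million. The following is a minimal Python sketch of that rescaling for a single sample; the study itself was carried out in R, and the function name and example values here are hypothetical.

```python
def rpkm_to_tpm(rpkm):
    """Rescale one sample's RPKM values so that they sum to one million."""
    total = sum(rpkm)
    if total <= 0:
        raise ValueError("RPKM values must have a positive sum")
    return [value / total * 1e6 for value in rpkm]


# Hypothetical RPKM values for three genes in one sample.
sample_rpkm = [10.0, 30.0, 60.0]
sample_tpm = rpkm_to_tpm(sample_rpkm)
```

Because every gene in a sample is divided by the same total, the conversion never changes which of two genes is more highly expressed within that sample.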

We converted the gene counts from the FPMTB cohort into TPM using the convertCounts function from the DGEobj.utils R library (version 1.0.6). We kept only protein-coding genes and used log2-normalized TPM matrices in the classification tasks.

k-Top scoring pairs classifier

For classification, we utilized the kTSP algorithm [19], an ensemble extension of the Top Scoring Pair (TSP) algorithm [23]. The TSP algorithm operates as a simple binary “rule” based on the relative expression of two genes (geneA≶geneB). The kTSP classifier aggregates these rules from distinct TSPs, making final predictions through unweighted majority voting (the default setting). This rank-based approach is interpretable, robust to batch effects, and invariant to any monotonic transformation of the data. We implemented kTSP using the Bioconductor switchBox package (version 1.45.0) [24,25]. The optimal number of pairs, k, was selected using the analysis of variance approach [26] during the training phase, ensuring no data leakage from the test data set. Given our aim to maintain a simple and clinically translatable classifier, we restricted the search range for k to 1–15 gene pairs for all single-drug and combination-drug response predictions.
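To make the voting logic concrete, the sketch below shows a TSP score (the absolute difference between the two classes in how often geneA > geneB holds) and an unweighted majority vote over a set of rules. It is an illustrative Python sketch, not the switchBox implementation used in the study; the gene names, the rule orientation (a satisfied rule votes for "Sensitive"), and the tie-breaking choice are assumptions.

```python
def tsp_score(class1_samples, class2_samples, gene_a, gene_b):
    """Absolute difference between classes in the frequency of gene_a > gene_b."""
    def frequency(samples):
        return sum(s[gene_a] > s[gene_b] for s in samples) / len(samples)
    return abs(frequency(class1_samples) - frequency(class2_samples))


def ktsp_predict(expression, rules, if_true="Sensitive", if_false="Resistant"):
    """Unweighted majority vote: each rule (gene_a, gene_b) votes on gene_a > gene_b."""
    votes_true = sum(expression[a] > expression[b] for a, b in rules)
    votes_false = len(rules) - votes_true
    # Assumption: ties (possible when k is even) fall to `if_false`.
    return if_true if votes_true > votes_false else if_false


# Hypothetical gene names and expression values.
rules = [("G1", "G2"), ("G3", "G4"), ("G5", "G6")]
sample = {"G1": 5.0, "G2": 1.0, "G3": 2.0, "G4": 3.0, "G5": 7.0, "G6": 0.5}
prediction = ktsp_predict(sample, rules)  # two of the three rules hold

sensitive = [{"G1": 5.0, "G2": 1.0}, {"G1": 4.0, "G2": 2.0}]
resistant = [{"G1": 1.0, "G2": 3.0}, {"G1": 2.0, "G2": 1.0}]
score = tsp_score(sensitive, resistant, "G1", "G2")
```

Only comparisons within a sample are used, so a single sample can be classified without reference to any other sample.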

Drug sensitivity quantitation and thresholding

In the Beat AML ex vivo drug response profiling, drug response was quantified using probit-modeled Area Under the Curve (AUC) values (theoretical range 0–300). For each compound, a seven-point concentration series (typically 10 μM to 0.0137 μM) was log10-transformed, and a probit regression curve was fitted to the viability data using maximum-likelihood estimation for slope and intercept [11,20,27]. The AUC was normalized such that a value of 100 represents a non-responsive profile (no change in viability relative to controls). Consequently, AUC < 100 was defined as ‘Sensitive,’ indicating a drug-induced reduction in cell viability; otherwise, the sample was called ‘Resistant.’ Values exceeding 100 mean that the cells grew more than the control, with the fitted curve shifted upward, possibly due to experimental noise or drug-enhanced growth. This thresholding strategy is consistent with established Beat AML standards [17,18] and reflects the inverse correlation between AUC and drug potency.
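The thresholding rule can be written in a few lines. This Python sketch assumes only what is stated above (AUC < 100 is ‘Sensitive’, anything else is ‘Resistant’); the sample identifiers and AUC values are hypothetical.

```python
AUC_CUTOFF = 100.0  # AUC of a non-responsive profile


def call_sensitivity(auc, cutoff=AUC_CUTOFF):
    """Label a sample 'Sensitive' when its AUC is strictly below the cutoff."""
    return "Sensitive" if auc < cutoff else "Resistant"


# Hypothetical AUC values for three samples and one drug.
aucs = {"sample_1": 62.5, "sample_2": 100.0, "sample_3": 141.0}
calls = {sample: call_sensitivity(auc) for sample, auc in aucs.items()}
```

A sample with an AUC of exactly 100 falls in the ‘Resistant’ group under this rule.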

For the external cohort analysis, we utilized the Selective Drug Sensitivity Scores (sDSS) from the FPMTB dataset. Selective drug-sensitivity scoring enables normalization of the individual patient’s responses against normal cell responses [28,29]. DSS is a metric dependent on the AUC. It is effectively a normalized version of AUC, where $C_{\max}$ and $C_{\min}$ are the maximum and minimum concentrations at which the drug was screened. The AUC over the dose range $[x_1, x_2]$ where the responses exceed a user-specified minimum activity level ($A_{\min}$) can be calculated using either analytical or numerical integration. Equation 2 shows the formula for $\mathrm{DSS}_1$:

$$\mathrm{DSS}_1 = \frac{\mathrm{AUC} - A_{\min}\,(x_2 - x_1)}{(100 - A_{\min})\,(C_{\max} - C_{\min})} \qquad (2)$$

To penalize the compounds that are effective at higher tested concentrations only, the summary score is further normalized by the logarithm of the top asymptote ($a$) of the estimated dose-response model that corresponds to the maximal estimated response of the drug (Equation 3):

$$\mathrm{DSS}_2 = \frac{\mathrm{DSS}_1}{\log a} \qquad (3)$$

To prioritize drugs that demonstrate efficacy across a broad therapeutic window, rather than those active only at the highest concentrations, the scoring algorithm was refined (Equation 4):

$$\mathrm{DSS}_3 = \mathrm{DSS}_2 \cdot \frac{x_2 - x_1}{C_{\max} - C_{\min}} \qquad (4)$$

With each version of the DSS-score, the selective drug sensitivity score (sDSS) is calculated by subtracting the average of the control DSSs from the patient DSS (Equation 5).

$$\mathrm{sDSS} = \mathrm{DSS}_{\mathrm{patient}} - \overline{\mathrm{DSS}}_{\mathrm{control}} \qquad (5)$$
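The final step, subtracting the average control DSS from the patient DSS, can be sketched as follows. This Python example covers only that subtraction, as described in the text; the function name and the DSS values are hypothetical.

```python
from statistics import mean


def selective_dss(patient_dss, control_dss):
    """Patient DSS minus the mean DSS of the healthy control samples."""
    if not control_dss:
        raise ValueError("at least one control DSS value is required")
    return patient_dss - mean(control_dss)


# Hypothetical DSS values for one drug.
sdss = selective_dss(18.0, [4.0, 6.0, 5.0])
```

A positive sDSS indicates that the drug is more active in the patient sample than in the controls, which is why a high sDSS is read as sensitivity.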

Benchmarking classifiers

To benchmark the performance of the kTSP approach, we employed several state-of-the-art machine learning models: SVM with linear and RBF kernels [30–32], random forest (an ensemble tree-based approach) [33], and elastic net regression (which combines Lasso and Ridge regularizations) [34]. SVMs use the kernel trick to implicitly map data that are not linearly separable into a higher-dimensional feature space in which a linear separator may exist. For all classifiers other than kTSP, model training and tuning were performed using the caret R package (version 7.0.1).

Statistical analysis and model evaluation

For the SVM, random forest, and elastic net regression classifiers, we optimized parameters exclusively on the Beat AML Waves 1 + 2 training data using a 5-fold cross-validation (CV) scheme with the caret R package’s random search method. For SVM and elastic net, z-score normalization was applied to the training data. Crucially, no data from the test dataset (Beat AML Waves 3 + 4) was used in this parameter search.
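A common way to keep test data out of preprocessing is to estimate the z-score parameters on the training set and reuse them for test samples. The Python sketch below illustrates that convention for a single gene; it is an assumption about how the scaling is applied to new samples, not a description of the caret internals, and the values are hypothetical.

```python
from statistics import mean, stdev


def fit_zscore(train_values):
    """Estimate centering and scaling parameters from training data only."""
    mu = mean(train_values)
    sd = stdev(train_values)  # sample standard deviation (n - 1)
    return mu, (sd if sd > 0 else 1.0)  # avoid dividing by zero for constant genes


def apply_zscore(values, mu, sd):
    """Scale any data (training or test) with previously fitted parameters."""
    return [(v - mu) / sd for v in values]


# Hypothetical expression values of one gene.
train = [2.0, 4.0, 6.0]
test = [8.0]
mu, sd = fit_zscore(train)
train_scaled = apply_zscore(train, mu, sd)
test_scaled = apply_zscore(test, mu, sd)
```

kTSP needs no such step, because its rules compare two genes within the same sample.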

To address the common challenge of class imbalance (sensitive vs. resistant samples), we employed the Synthetic Minority Over-sampling Technique (SMOTE) [35] data sampling option during the random search process.
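The core idea of SMOTE is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbors. The following Python sketch illustrates that idea only; it is not the caret sampling option used in the study, and the function name, parameters, and data points are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic points on segments between minority-class neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbors of the chosen sample (Euclidean distance).
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical two-dimensional minority samples.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote(minority, n_synthetic=4, k=2)
```

Every synthetic point lies on a line segment between two real minority samples, so oversampling adds no information beyond what those samples already contain.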

Model performance was assessed using several metrics: balanced accuracy (the arithmetic mean of sensitivity and specificity), the Area Under the ROC Curve (AUROC), sensitivity, and specificity. All analyses were conducted using the R programming language (version 4.5.0).
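The sketch below computes sensitivity, specificity, and balanced accuracy from predicted and true labels, and shows why balanced accuracy is informative under class imbalance. It is an illustrative Python example; treating ‘Sensitive’ as the positive class is an assumption, and the labels are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive="Sensitive"):
    """Count true positives, false negatives, true negatives, false positives."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def balanced_accuracy(y_true, y_pred, positive="Sensitive"):
    """Return (sensitivity, specificity, balanced accuracy); both classes must be present."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, (sensitivity + specificity) / 2


# Hypothetical imbalanced test set: 2 sensitive and 8 resistant samples.
truth = ["Sensitive"] * 2 + ["Resistant"] * 8
all_resistant = ["Resistant"] * 10  # a classifier that ignores the minority class
sens, spec, bal_acc = balanced_accuracy(truth, all_resistant)
plain_accuracy = sum(t == p for t, p in zip(truth, all_resistant)) / len(truth)
```

In this example, the classifier that always predicts the majority class reaches a plain accuracy of 0.8 but a balanced accuracy of only 0.5.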

For external cohort validation, we trained the classifiers on the Beat AML 1–4 cohorts and applied them to the FPMTB cohort. The 95th percentile of the sDSS distribution of all drugs measured for all samples was used as the threshold for sensitivity calling in the FPMTB cohort, as suggested in the publication that introduced the FPMTB dataset [21].
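A percentile-based threshold of this kind can be sketched as follows. The Python example pools all sDSS values, takes the 95th percentile, and calls values above it sensitive. The interpolation method and the strict inequality are assumptions, and the sDSS values are hypothetical.

```python
from statistics import quantiles


def percentile_95(values):
    """95th percentile with linear interpolation between order statistics."""
    # quantiles(..., n=100) returns 99 cut points; index 94 is the 95th percentile.
    return quantiles(values, n=100, method="inclusive")[94]


# Hypothetical pooled sDSS values across all drugs and samples.
pooled_sdss = [float(i) for i in range(101)]  # 0.0, 1.0, ..., 100.0
threshold = percentile_95(pooled_sdss)

# Assumption: a value strictly above the threshold is called sensitive.
is_sensitive = [value > threshold for value in pooled_sdss]
```

Because the threshold is pooled across drugs, the fraction of sensitive samples can differ widely from one drug to another.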

S2 Fig shows the flowchart of the analyses performed in this manuscript.

Results

Predicting drug responses of Beat AML samples

In our first analysis, we trained kTSP and the comparator state-of-the-art machine learning models on the Beat AML Waves 1 + 2 cohort and tested them on the Beat AML Waves 3 + 4 cohort. To ensure robust training and validation of the models, we filtered the drug response data to include only those agents with sufficient sample size. Specifically, only drugs with at least 20 sensitive and 20 resistant samples in the training set (Beat AML Waves 1 + 2) and at least 10 sensitive and 10 resistant samples in the test set (Beat AML Waves 3 + 4) were retained for analysis. This approach was applied separately for single drugs and venetoclax-containing combination drugs.
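The inclusion criterion can be expressed as a small filter over per-drug sensitivity calls. This Python sketch uses the thresholds stated above; the data layout (a mapping from drug name to a list of calls) and the drug names are hypothetical.

```python
from collections import Counter


def has_enough_samples(calls, min_per_class):
    """True when both classes reach the minimum number of samples."""
    counts = Counter(calls)
    return counts["Sensitive"] >= min_per_class and counts["Resistant"] >= min_per_class


def eligible_drugs(train_calls, test_calls, min_train=20, min_test=10):
    """Drugs meeting the minimum class sizes in both the training and test sets."""
    return sorted(
        drug
        for drug in train_calls
        if drug in test_calls
        and has_enough_samples(train_calls[drug], min_train)
        and has_enough_samples(test_calls[drug], min_test)
    )


# Hypothetical sensitivity calls for two drugs.
train_calls = {
    "drug_X": ["Sensitive"] * 25 + ["Resistant"] * 40,
    "drug_Y": ["Sensitive"] * 19 + ["Resistant"] * 100,
}
test_calls = {
    "drug_X": ["Sensitive"] * 12 + ["Resistant"] * 15,
    "drug_Y": ["Sensitive"] * 30 + ["Resistant"] * 30,
}
kept = eligible_drugs(train_calls, test_calls)
```

Here drug_Y is excluded because it has only 19 sensitive samples in the training set.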

Fig 1 illustrates the distribution of sample counts for both the training and testing cohorts across the included single drugs and combinations.

Fig 1. Response to single and venetoclax-containing combination drugs.

(A) The number of sensitive and resistant samples for each single drug in the Beat AML Waves 1 + 2 and Waves 3 + 4 cohorts. (B) The same counts for venetoclax-containing combination drugs (for combination drug response prediction, drug response from Eide et al. [20] and gene expression from the Beat AML dataset were used). An Area Under the Drug Response Curve (AUC) of 100 was used as the cutoff for sensitivity calling for both single drugs and drug combinations. Note: Only drugs with at least 20 sensitive and 20 resistant samples in the training set (Waves 1 + 2) and at least 10 sensitive and 10 resistant samples in the testing set (Waves 3 + 4) were included in the analysis.

https://doi.org/10.1371/journal.pone.0343422.g001

We compared the predictive accuracy of kTSP against SVM (linear and RBF kernels), random forest, and elastic net regression (Fig 2). We also examined the effect of addressing class imbalance using the Synthetic Minority Over-sampling Technique (SMOTE) during training. In terms of AUROC, balancing the training data did not improve results for any algorithm. However, all comparator algorithms (excluding kTSP) benefited from balancing in terms of balanced accuracy, suggesting improved success in classifying the minority class, that is, the class with substantially fewer samples than the other.

Fig 2. Comparative performance of kTSP, random forest, linear and RBF kernel SVM, and elastic net regression classifiers in terms of sensitivity, specificity, balanced accuracy, and Area Under the ROC Curve (AUROC) for single and venetoclax-containing combination drug response prediction when the training data are balanced (with the SMOTE method) or not balanced.

(A) Individual drugs (on the y-axis, B stands for Balanced and NB stands for Not Balanced). (B) All drugs together.

https://doi.org/10.1371/journal.pone.0343422.g002

Fig 2A details the comparative performance of all classifiers for individual drugs, and Fig 2B shows their performance for all drugs together.

The kTSP algorithm consistently outperformed the other classifiers based on balanced accuracy and accuracy within the minority class. This superiority was evident across single-drug responses (where sensitive samples were typically the minority) and combination responses (where resistant samples were sometimes the minority). This high performance (Fig 2) was achieved despite kTSP’s inherent simplicity, using an average of 10.57 ± 3.38 rules for single drugs and 10.5 ± 3.43 rules for combination drugs (based on at most 30 genes).

Overall, the high prevalence of imbalanced drug responses in AML samples led to the failure of other classifiers to correctly classify the minority class, even after applying SMOTE. The ability of kTSP to maintain high balanced accuracy in the face of imbalance highlights its utility for real-world clinical prediction.

Testing generalizability on an external cohort, FPMTB

We validated our findings on an independent external cohort, FPMTB. To assess potential batch effects and cohort similarity, we visualized all three datasets (Beat AML Waves 1 + 2, Waves 3 + 4, and FPMTB) based on the first two principal components (Fig 3).

Fig 3. Visualization of the three cohorts, Beat AML waves 1 + 2, Beat AML waves 3 + 4 and Functional Precision Medicine Tumor Board (FPMTB), based on the first two principal components.

The Beat AML samples from Waves 1 + 2 (371 samples) and Waves 3 + 4 (184 samples) were processed with the same library, SureSelect Strand-Specific RNA Library (Agilent), whereas the FPMTB dataset (163 samples) was processed with different libraries, i.e., Nextera and ScriptSeq.

https://doi.org/10.1371/journal.pone.0343422.g003

Fig 3 shows that the first principal component clearly separates the Beat AML cohorts from the FPMTB cohort, indicating a strong batch effect between the two studies that overpowers their biological similarities. In addition, the FPMTB cluster is more loosely dispersed, suggesting higher variability or heterogeneity. This could be caused by the different libraries used to generate gene expression data within the FPMTB cohort (Nextera and ScriptSeq), whereas all Beat AML samples were processed with the same library, SureSelect Strand-Specific RNA Library (Agilent). This underscores the necessity of rank-based approaches that bypass the need for cross-platform normalization.

We observed in the independent FPMTB validation cohort that the selective drug sensitivity scores (sDSS) of samples predicted by kTSP to be sensitive were generally higher than those predicted to be resistant. Specifically, this difference reached statistical significance (P < 0.05) for 6 out of the 14 tested compounds, including Venetoclax and Sorafenib. Fig 4 shows the sDSS values for the samples grouped by their kTSP-predicted sensitivity calls (high sDSS indicates sensitivity to a drug while high AUC suggests resistance).

Fig 4. Selective drug sensitivity scores (sDSSs) of the FPMTB validation samples that are predicted to be resistant and sensitive by the kTSP classifier.

P values were computed with a one-sided Wilcoxon test followed by Benjamini-Hochberg multiple-testing correction. The analysis includes only those drugs for which the kTSP classifier was trained on at least 20 sensitive and 20 resistant samples in the Beat AML dataset.

https://doi.org/10.1371/journal.pone.0343422.g004
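The Benjamini-Hochberg correction mentioned in the caption of Fig 4 can be sketched in Python as follows. This is the standard step-up adjustment written out for illustration, not the code used in the study, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four drugs.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
```

In this example only the first drug remains below 0.05 after correction, although three raw p-values were below that level.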

This significant separation validates the ability of the kTSP classifiers to accurately distinguish patient drug sensitivity in an independent cohort using a distinct (though directionally inverse) drug sensitivity metric, sDSS. The robust performance on this external dataset confirms the strong generalizability of the relative gene expression-based kTSP approach across different AML cohorts.

We also examined the classification accuracy of kTSP and the other classifiers on the FPMTB cohort. To compare the accuracy of kTSP with that of the state-of-the-art classifiers, we repeated the accuracy analysis by training the models on the entire Beat AML dataset and testing them on the FPMTB cohort (Fig 5). We included the drugs that are common to Beat AML and FPMTB and that have at least 20 sensitive and 20 resistant samples in Beat AML 1–4 and at least 10 sensitive and 10 resistant samples in the FPMTB cohort. The 95th percentile of the sDSS distribution of all drugs measured for all samples was used as the threshold for sensitivity calling in the FPMTB cohort, as suggested by the publication that introduced the FPMTB dataset [21].

Fig 5. The comparative accuracy of kTSP and other classifiers on the independent FPMTB validation cohort.

Only drugs common to Beat AML and FPMTB with at least 20 sensitive and 20 resistant samples in Beat AML 1–4 and at least 10 sensitive and 10 resistant samples in the FPMTB cohort were included.

https://doi.org/10.1371/journal.pone.0343422.g005

The results on the external FPMTB cohort were consistent with the accuracy analysis performed on the Beat AML cohorts. In most cases, the state-of-the-art classifiers tended to assign all samples to a single class.

The most predictive genes

Among the drug response classifiers trained on the Beat AML 1–4 cohort with at least 20 sensitive and 20 resistant samples (S1 and S2 Tables), 83.5% of classifier genes appear in only one classifier, mostly because of the parsimonious feature sets of kTSP classifiers. The genes that recur most often across classifiers can be considered those most strongly associated with ex vivo drug response. Two genes appear in 9 of the 74 classifiers: leukocyte immunoglobulin-like receptor B1 (LILRB1) and membrane-associated ring finger (C3HC4) 1 (MARCH1). Two genes appear in 8 of the classifiers: cysteinyl leukotriene receptor 2 (CYSLTR2) and poly(rC) binding protein 3 (PCBP3). These are followed by 5 genes that each appear in 6 classifiers. Focusing on the four most recurrent genes, overexpression of LILRB1 is correlated with sensitivity to 8 drugs and its underexpression with sensitivity to 1 drug. This gene plays a role in monocytic differentiation [36] and is a monocytic AML marker [37]. Overexpression of MARCH1 is correlated with sensitivity to 9 drugs; this gene promotes the proliferation of AML cells and inhibits apoptosis and differentiation [38]. Overexpression of CYSLTR2 is correlated with sensitivity to 8 drugs; it encodes a receptor for the inflammatory mediators cysteinyl leukotrienes and is overexpressed in FLT3-ITD- and NPM1-mutated AML samples [39]. Underexpression of PCBP3 is correlated with sensitivity to 8 drugs, and its high expression is associated with favorable survival [40].

Consistency of kTSP rules across clinically distinct cohorts

To investigate the consistency and generalizability of the learned kTSP rules independent of predictive accuracy, we analyzed how the optimal rules, derived from models trained on the entire Beat AML dataset (waves 1–4), applied to specific patient subgroups.

We defined a rule compliance metric as the sum of votes: each rule (geneA > geneB) that held in a sample contributed +1 vote, and each rule that did not hold contributed −1. This sum reflects the degree to which the learned rule set maintains directional consistency within a subgroup.
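The sum of votes can be computed per sample as in the Python sketch below, which follows the +1/−1 definition given above. The gene names and expression values are hypothetical.

```python
def rule_compliance(expression, rules):
    """Sum of votes: +1 for each rule gene_a > gene_b that holds, -1 otherwise."""
    return sum(1 if expression[a] > expression[b] else -1 for a, b in rules)


# Hypothetical classifier rules and one sample's expression values.
rules = [("G1", "G2"), ("G3", "G4"), ("G5", "G6")]
sample = {"G1": 5.0, "G2": 1.0, "G3": 2.0, "G4": 3.0, "G5": 7.0, "G6": 0.5}
votes = rule_compliance(sample, rules)  # two rules hold, one does not
```

For a classifier with k rules, the sum ranges from −k (no rule holds) to +k (every rule holds).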

We performed two key comparisons:

Temporal Consistency: We compared the rule compliance between the temporal training and testing cohorts (Waves 1 + 2 versus Waves 3 + 4) (S3A Fig).
Disease Origin Consistency: We compared the rule compliance between de novo AML and not de novo AML (S3B Fig).

As S3 Fig demonstrates, there is high consistency of the kTSP rules across these different patient stratifications, validating the robustness of the relative expression signature.

Discussion

In this study, we analyzed the predictive power of relative gene expression for drug response prediction in Acute Myeloid Leukemia (AML). Our approach utilized the kTSP classifier, training a robust model with a small, interpretable signature of no more than 15 gene pairs. Crucially, kTSP does not rely on actual gene expression measurements but only on the relative expression ordering (geneA>geneB), making it inherently robust to batch effects.

As demonstrated by the significant batch effect in our PCA (Fig 3), absolute expression values vary drastically across platforms. kTSP’s reliance on relative ordering bypasses this variance, explaining its superior generalizability to the FPMTB cohort.
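The invariance argument can be checked directly: any strictly increasing transformation applied to all genes of a sample leaves every within-sample comparison unchanged. The Python sketch below uses a hypothetical distortion (scaling followed by a log transform) as a stand-in for a platform difference; real batch effects can be gene-specific and may flip some pairs, so this illustrates the idealized case only.

```python
import math


def pair_orderings(expression, rules):
    """Evaluate gene_a > gene_b for every rule in one sample."""
    return [expression[a] > expression[b] for a, b in rules]


# Hypothetical rules and expression values.
rules = [("G1", "G2"), ("G3", "G4"), ("G5", "G6")]
sample = {"G1": 5.0, "G2": 1.0, "G3": 2.0, "G4": 3.0, "G5": 7.0, "G6": 0.5}

# Hypothetical platform distortion: a strictly increasing function of expression.
distorted = {gene: math.log2(3.0 * value + 1.0) for gene, value in sample.items()}

original_votes = pair_orderings(sample, rules)
distorted_votes = pair_orderings(distorted, rules)
```

The absolute values differ between the two versions of the sample, yet both produce identical votes and therefore the same kTSP prediction.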

The robustness of kTSP has significant clinical implications. It can help prioritize treatment for AML patients regardless of the expression data platform used, provided the relative expressions of the classifier gene pairs remain consistent. Even if a few gene pair orders change, the classifier remains useful because the final sensitivity call depends on the unweighted majority vote of all rules. This feature eliminates the need for complex batch correction or alignment of a test cohort to the training dataset, making it uniquely suited for single-patient classification. This enables real-time clinical decision-making the moment a single patient’s transcriptomic data is available. Furthermore, the kTSP classifier requires measuring only a small, fixed number of genes, making it compatible with high-throughput, cost-effective solutions like quantitative PCR (qPCR). This approach can replace ex vivo experiments in health facilities where it is not feasible to perform them. This is a significant practical advantage, as our results show it achieves this simplicity while maintaining accuracy comparable to or higher than complex classifiers like SVM, elastic net, and random forest.

Novelty and interpretability

The kTSP classifier has previously been applied to tasks like tumor type classification [41] and prognosis prediction [25]. To the best of our knowledge, this study represents the first application of kTSP to ex vivo drug response prediction using patient-derived AML samples. Our completely data-driven approach proved robust to both imbalanced class sizes and batch effects. We anticipate that the classifier rules provided in S1 Table (single drugs) and S2 Table (venetoclax-containing combination drugs) can be readily adopted in clinics or research centers where ex vivo drug screening is infeasible, because prediction requires only applying the simple majority voting logic, eliminating the need for complex software tools.

Limitations and future directions

A key limitation of this study is that the models were trained using patient-derived ex vivo drug responses, meaning the predictions may not always perfectly align with a patient’s in vivo (clinical) response to treatment. Another limitation stems from the binary classification: samples with intermediate drug responses are assigned to either the resistant or sensitive label. This discretization process of the AUC values (using the AUC = 100 cutoff) may cause a loss of precision, although our cutoff aligns with the standards used in the Beat AML studies. Future work could address this by exploring the multiclass classification extension of the algorithm, multiclassPairs [42], which can classify samples as responsive, resistant, and intermediate. Future studies should also compare performance against thresholds optimized specifically for clinical endpoints.

Conclusion

We believe this study provides a useful contribution to the literature, particularly for classification tasks characterized by imbalanced class distributions. We successfully validated our kTSP classifiers on an external dataset without performing any batch correction. For many drugs, including venetoclax and sorafenib, we observed that the sDSS values were significantly higher in kTSP-classified sensitive samples than in resistant samples. We specifically highlighted venetoclax and sorafenib because they represent distinct classes of targeted therapies (BCL-2 inhibitors and multi-kinase inhibitors, respectively) that are clinically pivotal in the treatment of hematologic malignancies. Venetoclax has revolutionized AML treatment for older patients who cannot tolerate standard chemotherapy. Sorafenib is frequently used for AML patients with specific mutations (such as FLT3-ITD). Finally, we verified the strong consistency of the rules across both the temporal split of the Beat AML data (Waves 1 + 2 vs. Waves 3 + 4) and clinically relevant disease origins (de novo vs. not de novo). The high consistency of rules across de novo and not de novo AML cases (S3 Fig) suggests that the underlying gene-pair relationships capture fundamental mechanisms of drug sensitivity that transcend the patient’s clinical history.

Supporting information

S1 Fig. Area Under the Drug Response Curve (AUC) values.

(A) AUC values for drugs in the Beat AML Waves 1 + 2 (training) and Waves 3 + 4 (testing) cohorts used for single-drug response prediction. (B) AUC values for venetoclax-containing combination drugs. (C) Distribution of ex vivo drug response (AUC) for all samples in the Eide et al. cohort, regardless of whether they were used in this manuscript.

https://doi.org/10.1371/journal.pone.0343422.s001

(PDF)

S2 Fig. The relationship between the datasets and analyses performed in this manuscript.

(A) The overall schema (B) The flowchart of the analyses.

https://doi.org/10.1371/journal.pone.0343422.s002

(PDF)

S3 Fig. Consistency of rules among different patient groups.

The sum of votes (y-axes) is computed over the classifier rules: each rule (geneA > geneB) that held in a sample contributed +1 vote, and each rule that did not hold contributed −1. This sum reflects the degree to which the learned rule set maintains directional consistency within a subgroup. (A) The sum of votes for each Beat AML cohort (Waves 1 + 2 vs. Waves 3 + 4). (B) The sum of votes for patients who are de novo or not de novo. We included the drugs that have at least 20 samples for each group shown.

https://doi.org/10.1371/journal.pone.0343422.s003

(PDF)

S1 Table. The classifier rules for single drugs.

https://doi.org/10.1371/journal.pone.0343422.s004

(XLSX)

S2 Table. The classifier rules for venetoclax-containing combination drugs.

https://doi.org/10.1371/journal.pone.0343422.s005

(XLSX)

Acknowledgments

The author would like to thank Mauro A. A. Castro for the insightful discussions. Google Gemini (version 1.5 Pro) was used as a linguistic tool to refine the manuscript’s text. Specifically, the tool was utilized for correcting grammatical errors, improving sentence structure, and enhancing overall clarity and flow. All scientific content, including the research methodology, data analysis, and conclusions, was developed and written by the author, who is solely responsible for the accuracy and integrity of the final manuscript.

References

1. Döhner H, Weisdorf DJ, Bloomfield CD. Acute Myeloid Leukemia. N Engl J Med. 2015;373(12):1136–52.
2. Khwaja A, Bjorkholm M, Gale RE, Levine RL, Jordan CT, Ehninger G, et al. Acute myeloid leukaemia. Nat Rev Dis Primers. 2016;2:16010. pmid:27159408
- View Article
- PubMed/NCBI
- Google Scholar
3. Acute Myeloid Leukemia - Cancer Stat Facts. SEER. https://seer.cancer.gov/statfacts/html/amyl.html 2026 January 3.
4. Stone RM, Mandrekar SJ, Sanford BL, Laumann K, Geyer S, Bloomfield CD, et al. Midostaurin plus Chemotherapy for Acute Myeloid Leukemia with a FLT3 Mutation. N Engl J Med. 2017;377(5):454–64. pmid:28644114
- View Article
- PubMed/NCBI
- Google Scholar
5. Uy GL, Mandrekar SJ, Laumann K, Marcucci G, Zhao W, Levis MJ, et al. A phase 2 study incorporating sorafenib into the chemotherapy for older adults with FLT3-mutated acute myeloid leukemia: CALGB 11001. Blood Adv. 2017;1(5):331–40. pmid:29034366
- View Article
- PubMed/NCBI
- Google Scholar
6. Jen EY, Ko C-W, Lee JE, Del Valle PL, Aydanian A, Jewell C, et al. FDA Approval: Gemtuzumab Ozogamicin for the Treatment of Adults with Newly Diagnosed CD33-Positive Acute Myeloid Leukemia. Clin Cancer Res. 2018;24(14):3242–6. pmid:29476018
- View Article
- PubMed/NCBI
- Google Scholar
7. Gbadamosi M, Meshinchi S, Lamba JK. Gemtuzumab ozogamicin for treatment of newly diagnosed CD33-positive acute myeloid leukemia. Future Oncol. 2018;14(30):3199–213. pmid:30039981
- View Article
- PubMed/NCBI
- Google Scholar
8. Fleischmann M, Schnetzke U, Hochhaus A, Scholl S. Management of Acute Myeloid Leukemia: Current Treatment Options and Future Perspectives. Cancers (Basel). 2021;13(22):5722. pmid:34830877
- View Article
- PubMed/NCBI
- Google Scholar
9. Burd A, Levine RL, Ruppert AS, Mims AS, Borate U, Stein EM, et al. Precision medicine treatment in acute myeloid leukemia using prospective genomic profiling: feasibility and preliminary efficacy of the Beat AML Master Trial. Nat Med. 2020;26(12):1852–8. pmid:33106665
- View Article
- PubMed/NCBI
- Google Scholar
10. Bottomly D, Long N, Schultz AR, Kurtz SE, Tognon CE, Johnson K, et al. Integrative analysis of drug response and clinical outcome in acute myeloid leukemia. Cancer Cell. 2022;40(8):850-864.e9. pmid:35868306
- View Article
- PubMed/NCBI
- Google Scholar
11. Tyner JW, Tognon CE, Bottomly D, Wilmot B, Kurtz SE, Savage SL, et al. Functional genomic landscape of acute myeloid leukaemia. Nature. 2018;562(7728):526–31. pmid:30333627
- View Article
- PubMed/NCBI
- Google Scholar
12. Karathanasis N, Papasavva PL, Oulas A, Spyrou GM. Combining clinical and molecular data for personalized treatment in acute myeloid leukemia: A machine learning approach. Comput Methods Programs Biomed. 2024;257:108432. pmid:39316958
- View Article
- PubMed/NCBI
- Google Scholar
13. Trac QT, Pawitan Y, Mou T, Erkers T, Östling P, Bohlin A, et al. Prediction model for drug response of acute myeloid leukemia patients. NPJ Precis Oncol. 2023;7(1):32. pmid:36964195
- View Article
- PubMed/NCBI
- Google Scholar
14. Karakaslar EO, Severens JF, Sánchez-López E, van Veelen PA, Zlei M, van Dongen JJM, et al. A transcriptomic based deconvolution framework for assessing differentiation stages and drug responses of AML. NPJ Precis Oncol. 2024;8(1):105. pmid:38762545
- View Article
- PubMed/NCBI
- Google Scholar
15. Lee C, Lee S, Park E, Hong J, Shin D-Y, Byun JM, et al. Transcriptional signatures of the BCL2 family for individualized acute myeloid leukaemia treatment. Genome Med. 2022;14(1):111. pmid:36171613
- View Article
- PubMed/NCBI
- Google Scholar
16. Drusbosky LM, Vidva R, Gera S, Lakshminarayana AV, Shyamasundar VP, Agrawal AK, et al. Predicting response to BET inhibitors using computational modeling: A BEAT AML project study. Leuk Res. 2019;77:42–50. pmid:30642575
- View Article
- PubMed/NCBI
- Google Scholar
17. Pino JC, Posso C, Joshi SK, Nestor M, Moon J, Hansen JR, et al. Mapping the proteogenomic landscape enables prediction of drug response in acute myeloid leukemia. Cell Rep Med. 2024;5(1):101359. pmid:38232702
- View Article
- PubMed/NCBI
- Google Scholar
18. Gosline SJC, Tognon C, Nestor M, Joshi S, Modak R, Damnernsawad A, et al. Proteomic and phosphoproteomic measurements enhance ability to predict ex vivo drug response in AML. Clin Proteomics. 2022;19(1):30. pmid:35896960
- View Article
- PubMed/NCBI
- Google Scholar
19. Tan AC, Naiman DQ, Xu L, Winslow RL, Geman D. Simple decision rules for classifying human cancers from gene expression profiles. Bioinformatics. 2005;21(20):3896–904. pmid:16105897
- View Article
- PubMed/NCBI
- Google Scholar
20. Eide CA, Kurtz SE, Kaempf A, Long N, Joshi SK, Nechiporuk T, et al. Clinical Correlates of Venetoclax-Based Combination Sensitivities to Augment Acute Myeloid Leukemia Therapy. Blood Cancer Discov. 2023;4(6):452–67. pmid:37698624
- View Article
- PubMed/NCBI
- Google Scholar
21. Malani D, Kumar A, Brück O, Kontro M, Yadav B, Hellesøy M, et al. Implementing a Functional Precision Medicine Tumor Board for Acute Myeloid Leukemia. Cancer Discov. 2022;12(2):388–401. pmid:34789538
- View Article
- PubMed/NCBI
- Google Scholar
22. Zhao S, Ye Z, Stanton R. Misuse of RPKM or TPM normalization when comparing across samples and sequencing protocols. RNA. 2020;26(8):903–9. pmid:32284352
- View Article
- PubMed/NCBI
- Google Scholar
23. Geman D, d’Avignon C, Naiman DQ, Winslow RL. Classifying gene expression profiles from pairwise mRNA comparisons. Stat Appl Genet Mol Biol. 2004;3:Article19. pmid:16646797
- View Article
- PubMed/NCBI
- Google Scholar
24. Afsari B, Fertig EJ, Geman D, Marchionni L. switchBox: an R package for k-Top Scoring Pairs classifier development. Bioinformatics. 2015;31(2):273–4. pmid:25262153
- View Article
- PubMed/NCBI
- Google Scholar
25. Marchionni L, Afsari B, Geman D, Leek JT. A simple and reproducible breast cancer prognostic test. BMC Genomics. 2013;14:336. pmid:23682826
- View Article
- PubMed/NCBI
- Google Scholar
26. Afsari B, Braga-Neto UM, Geman D. Rank discriminants for predicting phenotypes from RNA expression. 2014. https://hdl.handle.net/1969.1/184776
27. Kurtz SE, Eide CA, Kaempf A, Khanna V, Savage SL, Rofelty A, et al. Molecularly targeted drug combinations demonstrate selective effectiveness for myeloid- and lymphoid-derived hematologic malignancies. Proc Natl Acad Sci U S A. 2017;114(36):E7554–63. pmid:28784769
- View Article
- PubMed/NCBI
- Google Scholar
28. Chen Y, He L, Ianevski A, Ayuda-Durán P, Potdar S, Saarela J, et al. Robust scoring of selective drug responses for patient-tailored therapy selection. Nat Protoc. 2024;19(1):60–82. pmid:37996540
- View Article
- PubMed/NCBI
- Google Scholar
29. Yadav B, Pemovska T, Szwajda A, Kulesskiy E, Kontro M, Karjalainen R, et al. Quantitative scoring of differential drug sensitivity for individually optimized anticancer therapies. Sci Rep. 2014;4:5193. pmid:24898935
- View Article
- PubMed/NCBI
- Google Scholar
30. Awad M, Khanna R. Support Vector Machines for Classification. Efficient Learning Machines. Apress. 2015. 39–66.
- View Article
- Google Scholar
31. Xu Y, Zomer S, Brereton RG. Support Vector Machines: A Recent Method for Classification in Chemometrics. Critical Reviews in Analytical Chemistry. 2006;36(3–4):177–88.
- View Article
- Google Scholar
32. Vapnik VNA. The nature of statistical learning theory. New York: Springer. 2010.
33. Breiman L. Random Forests. Machine Learning. 2001;45(1):5–32.
- View Article
- Google Scholar
34. Zou H, Hastie T. Regularization and Variable Selection Via the Elastic Net. Journal of the Royal Statistical Society Series B: Statistical Methodology. 2005;67(2):301–20.
- View Article
- Google Scholar
35. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: Synthetic Minority Over-sampling Technique. jair. 2002;16:321–57.
- View Article
- Google Scholar
36. Hodges A, Dubuque R, Chen S-H, Pan P-Y. The LILRB family in hematologic malignancies: prognostic associations, mechanistic considerations, and therapeutic implications. Biomark Res. 2024;12(1):159. pmid:39696628
- View Article
- PubMed/NCBI
- Google Scholar
37. Churchill HRO, Fuda FS, Xu J, Deng M, Zhang CC, An Z, et al. Leukocyte immunoglobulin-like receptor B1 and B4 (LILRB1 and LILRB4): Highly sensitive and specific markers of acute myeloid leukemia with monocytic differentiation. Cytometry B Clin Cytom. 2021;100(4):476–87. pmid:32918786
- View Article
- PubMed/NCBI
- Google Scholar
38. Liu J, Xu J, Sun R, Wang X, Chen F, Fu Y, et al. MARCH1, transcriptionally regulated by POU2F2, facilitates acute myeloid leukemia progression via inducing MYCT1 degradation. Oncogene. 2025;44(33):2983–96. pmid:40533483
- View Article
- PubMed/NCBI
- Google Scholar
39. Maiga A, Lemieux S, Pabst C, Lavallée V-P, Bouvier M, Sauvageau G, et al. Transcriptome analysis of G protein-coupled receptors in distinct genetic subgroups of acute myeloid leukemia: identification of potential disease-specific targets. Blood Cancer J. 2016;6(6):e431. pmid:27258612
- View Article
- PubMed/NCBI
- Google Scholar
40. Zhang B, Yang L, Wang X, Fu D. Identification of survival-related alternative splicing signatures in acute myeloid leukemia. Biosci Rep. 2021;41(7):BSR20204037. pmid:34212178
- View Article
- PubMed/NCBI
- Google Scholar
41. Price ND, Trent J, El-Naggar AK, Cogdell D, Taylor E, Hunt KK, et al. Highly accurate two-gene classifier for differentiating gastrointestinal stromal tumors and leiomyosarcomas. Proc Natl Acad Sci U S A. 2007;104(9):3414–9. pmid:17360660
- View Article
- PubMed/NCBI
- Google Scholar
42. Marzouka N-A-D, Eriksson P. multiclassPairs: an R package to train multiclass pair-based classifier. Bioinformatics. 2021;37(18):3043–4. pmid:33543757
- View Article
- PubMed/NCBI
- Google Scholar

