BioFuse: an embedding fusion framework for biomedical foundation models

doi:10.1371/journal.pone.0320989

Fig 1.

BioFuse architecture and workflow.

The upper section illustrates the training process: input samples are preprocessed and fed into multiple foundation models to generate embeddings, which are then concatenated and used to train a classifier. The lower section shows the evaluation process using the same BioFuseModel architecture on a validation set, followed by performance assessment of the trained model.
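The training and evaluation flow described above can be sketched in a few lines. This is a minimal illustration, not the BioFuse API: `model_a`, `model_b`, `fuse_by_concatenation`, `fit_centroids`, and `predict` are hypothetical names, the two "foundation models" are toy feature functions, and a nearest-centroid rule stands in for the real classifier.

```python
from typing import Callable, Dict, List, Sequence

Embedder = Callable[[Sequence[float]], List[float]]

# Hypothetical stand-ins for foundation models: each maps a sample to an embedding.
def model_a(sample: Sequence[float]) -> List[float]:
    return [sum(sample) / len(sample)]

def model_b(sample: Sequence[float]) -> List[float]:
    return [min(sample), max(sample)]

def fuse_by_concatenation(sample: Sequence[float],
                          embedders: Sequence[Embedder]) -> List[float]:
    """Concatenate the embeddings from every model into one feature vector."""
    fused: List[float] = []
    for embed in embedders:
        fused.extend(embed(sample))
    return fused

def fit_centroids(embeddings: List[List[float]],
                  labels: List[int]) -> Dict[int, List[float]]:
    """Toy classifier (assumption, not the paper's): one mean vector per class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for vec, label in zip(embeddings, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids: Dict[int, List[float]], vec: List[float]) -> int:
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def sqdist(center: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(vec, center))
    return min(centroids, key=lambda label: sqdist(centroids[label]))

# Training: embed with every model, concatenate, fit the classifier.
embedders = [model_a, model_b]
train_samples = [[0.0, 0.1, 0.2], [0.1, 0.2, 0.0], [0.9, 1.0, 0.8], [1.0, 0.7, 0.9]]
train_labels = [0, 0, 1, 1]
train_emb = [fuse_by_concatenation(s, embedders) for s in train_samples]
centroids = fit_centroids(train_emb, train_labels)

# Evaluation: the same embedders and fusion step are applied to validation data.
val_emb = fuse_by_concatenation([0.8, 0.9, 1.0], embedders)
val_pred = predict(centroids, val_emb)
```

The point of the sketch is that the fusion step is identical at training and evaluation time, so the fused vector always has the same length and layout.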

Table 1.

Summary of foundation models supported in BioFuse.

Fig 2.

Visual overview of the MedMNIST+ 2D datasets.

Sample images at 224×224 resolution span multiple medical imaging modalities and illustrate the range of diagnostic tasks represented in the benchmark.

Table 2.

Overview of MedMNIST+ 2D datasets.

Fig 3.

High-level workflow of BioFuse within a typical machine learning pipeline.

BioFuse serves as an embedding generator that accepts training and validation sets as inputs. It outputs embeddings for both sets, along with a configured BioFuseModel, which then generates embeddings for the test set.

Table 3.

Performance comparison between BioFuse and the best existing methods on MedMNIST+. Bold values indicate the best performance for each dataset.

Fig 4.

Heatmap of test AUC, showing single model performance across MedMNIST+ datasets.

CLIP is included as a baseline general-purpose foundation model for comparison. The white box indicates the top performer, while red boxes highlight the second and third best performers for each dataset. More than three models may be highlighted if they share identical values.

Fig 5.

Heatmap of test accuracy, showing single model performance across MedMNIST+ datasets and ImageNet-1K (top-1).

CLIP is included as a baseline general-purpose foundation model for comparison. White boxes indicate top performers, while red boxes highlight second and third best performers for each dataset. When multiple models achieve identical performance, more than three boxes may be highlighted.
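The highlighting rule can be made concrete with a small sketch. It assumes one reading of the caption: a model gets a white box if it matches the best score, and a red box if its score is below the best but at least as high as the third-ranked score, so ties can push the number of highlighted models past three. The function name, model names, and scores are hypothetical placeholders, not values from the figure.

```python
from typing import Dict, List, Tuple

def highlight_top_three(scores: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split models into white-box (best) and red-box (second/third) groups."""
    ranked = sorted(scores.values(), reverse=True)
    best = ranked[0]
    # Score of the third-ranked model (or the last one if fewer than three).
    cutoff = ranked[min(2, len(ranked) - 1)]
    white = [model for model, s in scores.items() if s == best]
    red = [model for model, s in scores.items() if cutoff <= s < best]
    return white, red

# Placeholder scores for one dataset; three models tie for second place.
example = {
    "model_a": 0.99,
    "model_b": 0.98,
    "model_c": 0.98,
    "model_d": 0.98,
    "model_e": 0.90,
}
white, red = highlight_top_three(example)
```

Here four models end up highlighted, because the tie at the second-best score covers the third rank as well.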

Table 4.

Performance comparison of biomedical foundation models on ImageNet-1K. Bold values indicate best performance.

Table 5.

Ablation 1 – Self-attention fusion vs. concatenation.

The “Concat” columns reproduce the BioFuse baseline. The “Model combination” column lists the backbones selected for the self-attention run only; the corresponding concat combinations are given in Table 3. Δ = (Self-attn – Concat). Backbones printed in bold overlap with the concat baseline.

Table 6.

Ablation 2 – Oracle-single vs. concatenation.

Concat figures are the main BioFuse results; Oracle-single uses only the best individual model for each dataset. Δ = Oracle − Concat.
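As a worked illustration of the Δ column, the sketch below picks the best individual model for one dataset and subtracts the concatenation score from it. The function name, model names, and scores are hypothetical placeholders chosen for the arithmetic only; they are not results from the table.

```python
from typing import Dict, Tuple

def oracle_single_delta(single_scores: Dict[str, float],
                        concat_score: float) -> Tuple[str, float, float]:
    """Return (best single model, its score, Oracle - Concat) for one dataset."""
    best_model = max(single_scores, key=single_scores.get)
    oracle = single_scores[best_model]
    return best_model, oracle, oracle - concat_score

# Placeholder numbers: a negative delta means concatenation scored higher.
best_model, oracle, delta = oracle_single_delta(
    {"model_a": 0.90, "model_b": 0.94}, concat_score=0.95
)
```

With these placeholder values the best single model is `model_b` and Δ is about −0.01, meaning the fused embedding outperforms the best individual backbone on that dataset.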

Table 7.

Robustness on MedMNIST‐C (lower is better; raw ratio scale).

Fig 6.

End-to-end runtime per dataset.

Stacked bars show total wall-clock time (hours) to reproduce each BioFuse experiment. The blue segment indicates embedding extraction for the nine backbones, the green segment the XGBoost evaluation runs, and the orange segment the hyperparameter sweep on Weights & Biases.
