A benchmark of semi-supervised scRNA-seq integration methods in real-world scenarios
Fig 6
Scenario V: Integration with Auto-annotated Labels.
(a) Bar plots showing the performance of all methods across four datasets under this setting. Each bar represents the overall weighted score of a method; triangles and circles indicate the batch correction and biological conservation scores, respectively. The vertical dashed lines divide the methods into four groups: those using Azimuth, CellAssign, or SingleR labels, and unsupervised approaches. The five unsupervised methods are shown on the right, represented by unfilled bars. (b) Scatter plot of scaled batch correction scores versus biological conservation scores for each method, averaged across the four applicable datasets. Colors indicate methods, and point shapes indicate the origin of the labels. For each dataset and set of auto-annotated labels, the scaled score is the ratio of a method's overall biological conservation or batch correction metric to the mean of the corresponding metric across the five unsupervised methods. The detailed scaling procedure is described in Methods Section 2.2. The horizontal red dashed line marks the average biological conservation score across all methods, while the vertical blue dotted line marks the average batch correction score. (c) Radar plots showing the performance of all methods on individual metrics for the human pancreas, lung two species, human immune, and lung atlas datasets, averaged over all annotation types for semi-supervised methods. Metrics include biological conservation (red) and batch correction (blue). Because scCRAFT achieved the highest overall performance among the unsupervised methods, only its scores are shown for clarity; radar plots for the remaining methods are provided in Section 3 of S1 Text.
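
The scaling used in panel (b) can be illustrated with a minimal sketch. It follows only the ratio described in the caption; the full procedure is given in Methods Section 2.2 and may include further steps. The function name `scaled_score` and all numeric values below are hypothetical and chosen purely for illustration.

```python
from statistics import mean


def scaled_score(method_score, unsupervised_scores):
    """Ratio of one method's metric to the mean of the same metric
    across the unsupervised baseline methods (assumed: five of them)."""
    baseline = mean(unsupervised_scores)
    if baseline == 0:
        raise ValueError("baseline mean is zero; ratio is undefined")
    return method_score / baseline


# Hypothetical biological conservation scores of five unsupervised
# methods on one dataset (illustrative numbers, not from the paper).
unsup_bio = [0.60, 0.65, 0.70, 0.55, 0.50]

# Hypothetical score of one semi-supervised method on the same dataset.
semi_bio = 0.66

ratio = scaled_score(semi_bio, unsup_bio)
# A ratio above 1 means the method scores higher than the unsupervised
# average on this metric; a ratio below 1 means it scores lower.
print(f"scaled biological conservation score: {ratio:.3f}")
```

Under this reading, each point in panel (b) would be the average of such ratios over the four datasets, computed separately for the batch correction and biological conservation axes.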