scHilda: Hierarchical Integration of LLM with KG database for single cell type annotation

doi:10.1371/journal.pcbi.1014291

Fig 1.

The scHilda framework, divided into two parts.

Part 1 (Knowledge Graph Construction): defines the KG structure, which contains three node types and four directed relationship types. Part 2 (Hierarchical Annotation): illustrates the core three-stage workflow. Major type proposal: the LLM proposes three probable major cell type candidates (e.g., T cell, B cell, NK cell) from the input marker genes (e.g., CD3D, CD4, CD8A). Major type determination: the system performs a global KG query on the three candidates, and the LLM integrates the retrieved evidence to determine a single major type (e.g., B cell). Subtype annotation: the system performs a local search restricted to the subgraph of the confirmed major type (e.g., B cell), and the LLM uses this focused evidence to assign the final subtype (e.g., Naive B cell).
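The three-stage control flow in Fig 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the KG is a hypothetical in-memory list of gene-to-cell-type edges, the subtype hierarchy (`PARENT`) is a toy mapping, the relation name `MARKER_OF` is an assumed label, and both LLM decisions are replaced by a stub that simply picks the label with the most supporting edges.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Edge:
    """One directed edge from a gene to a cell type (hypothetical toy schema)."""
    gene: str
    relation: str
    cell_type: str


# Hypothetical subtype -> major type hierarchy; major types are not listed.
PARENT: Dict[str, str] = {
    "Naive B cell": "B cell",
    "Memory B cell": "B cell",
    "CD4 T cell": "T cell",
    "CD8 T cell": "T cell",
}


def major_of(cell_type: str) -> str:
    """Map a subtype to its major type; major types map to themselves."""
    return PARENT.get(cell_type, cell_type)


def global_query(kg: List[Edge], markers: Set[str], candidates: List[str]) -> Dict[str, List[Edge]]:
    """Stage 2: gather evidence for every candidate major type across the whole KG."""
    evidence: Dict[str, List[Edge]] = {c: [] for c in candidates}
    for e in kg:
        major = major_of(e.cell_type)
        if e.gene in markers and major in evidence:
            evidence[major].append(e)
    return evidence


def local_query(kg: List[Edge], markers: Set[str], major: str) -> Dict[str, List[Edge]]:
    """Stage 3: gather evidence only among subtypes of the confirmed major type."""
    evidence: Dict[str, List[Edge]] = {}
    for e in kg:
        if e.gene in markers and PARENT.get(e.cell_type) == major:
            evidence.setdefault(e.cell_type, []).append(e)
    return evidence


def pick_best(evidence: Dict[str, List[Edge]]) -> Optional[str]:
    """Stand-in for the LLM judgement: the label with the most edges (first wins ties)."""
    if not evidence:
        return None
    return max(evidence, key=lambda label: len(evidence[label]))


def annotate(
    markers: List[str],
    kg: List[Edge],
    propose: Callable[[List[str], int], List[str]],
    k: int = 3,
) -> Tuple[Optional[str], Optional[str]]:
    marker_set = set(markers)
    candidates = propose(sorted(marker_set), k)                   # stage 1 (stubbed LLM)
    major = pick_best(global_query(kg, marker_set, candidates))   # stage 2
    subtype = pick_best(local_query(kg, marker_set, major)) if major else None  # stage 3
    return major, subtype


TOY_KG = [
    Edge("MS4A1", "MARKER_OF", "B cell"),
    Edge("CD79A", "COEXPRESSED_IN", "B cell"),
    Edge("TCL1A", "COEXPRESSED_IN", "Naive B cell"),
    Edge("IGHD", "MARKER_OF", "Naive B cell"),
    Edge("CD27", "MARKER_OF", "Memory B cell"),
    Edge("CD3D", "MARKER_OF", "T cell"),
    Edge("GNLY", "MARKER_OF", "NK cell"),
]


def fake_llm_propose(genes: List[str], k: int) -> List[str]:
    """Hypothetical stub for the LLM's top-k major type proposal."""
    return ["T cell", "B cell", "NK cell"][:k]


if __name__ == "__main__":
    print(annotate(["MS4A1", "CD79A", "TCL1A", "IGHD"], TOY_KG, fake_llm_propose))
```

In this toy setup, the default `k=3` lets stage 2 recover `B cell` even though the stub ranks `T cell` first, and the local search then returns `Naive B cell`. With `k=1` the pipeline is locked into `T cell` and finds no subtype evidence, which is the kind of failure that proposing several candidates before committing is meant to avoid.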

Fig 2.

scHilda Performance Evaluation and Ablation Studies.

(A) Comparison of annotation agreement scores between scHilda and existing methods across eight benchmark datasets, on which scHilda achieves state-of-the-art performance. (B) Ablation study of the relationship types searched in the knowledge graph, showing that the pathway (PARTICIPATES_IN) and co-expression (COEXPRESSED_IN) relationships contribute significantly to annotation accuracy, whereas the Marker relationship has a negative impact on the model. (C) Performance with LLMs of different scales, indicating that the scHilda framework guarantees a performance baseline that allows lightweight models to approach the results of top-tier models. (D) Comparison of performance with and without explainability output (reasoning and evidence), showing that disabling this output reduces cost with almost no loss in accuracy. (E) Distribution of major types among the Top-3 candidates, showing that the major type determination step is necessary. (F) Impact of different prompt strategies (LLM-biased, KG-biased, neutral) on performance: both the LLM-biased and KG-biased prompts perform significantly worse than the neutral prompt.
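One way to set up a relationship-type ablation like panel B is to rebuild the KG with a subset of edge types and rerun the same query pipeline on each variant. The sketch below assumes a simple leave-one-relation-out design, which may differ from the configurations actually used in the paper. `PARTICIPATES_IN` and `COEXPRESSED_IN` are named in the caption; `MARKER_OF` is a hypothetical label for the Marker relationship, and the fourth relationship type is not named in this section.

```python
from typing import Dict, Iterable, List, NamedTuple, Set


class Triple(NamedTuple):
    """A directed KG edge: head --relation--> tail."""
    head: str
    relation: str
    tail: str


def restrict_relations(kg: Iterable[Triple], allowed: Iterable[str]) -> List[Triple]:
    """Keep only edges whose relation type is in `allowed`."""
    keep = set(allowed)
    return [t for t in kg if t.relation in keep]


def ablation_configs(relations: Iterable[str]) -> Dict[str, Set[str]]:
    """The full relation set plus one leave-one-out variant per relation type."""
    rels = sorted(set(relations))
    configs: Dict[str, Set[str]] = {"full": set(rels)}
    for r in rels:
        configs[f"without {r}"] = set(rels) - {r}
    return configs


if __name__ == "__main__":
    # Hypothetical toy edges; "BCR signaling" stands in for a pathway node.
    toy_kg = [
        Triple("CD79A", "COEXPRESSED_IN", "B cell"),
        Triple("MS4A1", "MARKER_OF", "B cell"),
        Triple("CD79A", "PARTICIPATES_IN", "BCR signaling"),
    ]
    for name, allowed in ablation_configs(t.relation for t in toy_kg).items():
        print(name, len(restrict_relations(toy_kg, allowed)))
```

Each restricted KG would then be passed to the unchanged annotation pipeline, so that any difference in agreement scores can be attributed to the removed relationship type alone.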

Table 1.

Statistical Significance and Effect Sizes of scHilda vs. Baseline Methods.

Fig 3.

scHilda Robustness and Hyperparameter Sensitivity Analysis.

(A) Performance on mixed-cell sample annotation tasks, with and without prompting, demonstrating the strong robustness conferred by the hierarchical strategy. (B) Impact of the number of major cell type candidates (1–5) on annotation performance, showing that 3 candidates give the best balance between accuracy and cost. (C) Impact of the number of input marker genes (3–10) on annotation performance, showing that the knowledge graph makes the model more robust when few marker genes are available. (D) Impact of larger numbers of input marker genes (5–50) on annotation performance, showing that, within a certain range, more marker genes lead to better performance.
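The sensitivity analyses in panels B–D vary two inputs: the number of major type candidates and the number of marker genes. A minimal grid-sweep harness for this kind of experiment is sketched below. It assumes that marker genes arrive already ranked (for example by a differential-expression statistic computed upstream) and that the top n are taken. `evaluate` is a hypothetical callback that would run the annotation pipeline and return a score; it is not an API from the paper.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List, Sequence, Tuple, TypeVar

R = TypeVar("R")


def top_n_markers(ranked_genes: Sequence[str], n: int) -> List[str]:
    """Take the n highest-ranked marker genes (ranking assumed to be done upstream)."""
    return list(ranked_genes[:n])


def sweep(
    ranked_genes: Sequence[str],
    n_markers_grid: Iterable[int],
    k_grid: Iterable[int],
    evaluate: Callable[[List[str], int], R],
) -> Dict[Tuple[int, int], R]:
    """Call `evaluate` once per (number of markers, number of candidates) pair."""
    results: Dict[Tuple[int, int], R] = {}
    for n, k in product(n_markers_grid, k_grid):
        results[(n, k)] = evaluate(top_n_markers(ranked_genes, n), k)
    return results


if __name__ == "__main__":
    genes = ["MS4A1", "CD79A", "TCL1A", "IGHD", "CD19"]
    # Placeholder evaluator that just echoes its inputs; a real one would score annotations.
    grid = sweep(genes, range(3, 6), range(1, 6), lambda markers, k: (len(markers), k))
    print(len(grid), grid[(3, 1)])
```

Grids such as `range(1, 6)` for candidates and `range(3, 11)` for markers mirror the ranges stated for panels B and C, but the exact step sizes used in the paper are not given in this section. Note that `top_n_markers` silently returns fewer genes than requested when the ranked list is shorter than n.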