Cell type annotation for scATAC-seq via DNA large language model and graph domain adaptation

doi:10.1371/journal.pcbi.1014226


Fig 2

Overview of the proposed scLLMDA framework for cell type annotation from scATAC-seq data.

(A) Feature extraction from genomic sequences: chromatin-accessible regions are tokenized and encoded by a pretrained DNA language model (DNA-LLM); the resulting contextualized embeddings are used to predict accessibility profiles and to derive cell embeddings. (B) Cell type annotation via graph domain adaptation: one graph is constructed from the source cell embeddings and another from the target cell embeddings. A domain adaptation module with attention mechanisms and shared parameters transfers knowledge from the source dataset to the target dataset and classifies cell types.

doi: https://doi.org/10.1371/journal.pcbi.1014226.g002
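
The graph construction step in panel B can be illustrated with a minimal sketch. The caption does not say how the graphs are built, so the k-nearest-neighbour rule, the cosine similarity measure, the value of k, and the function name `knn_graph` below are all assumptions made for illustration, not the authors' method. Random arrays stand in for the source and target cell embeddings.

```python
import numpy as np


def knn_graph(embeddings: np.ndarray, k: int = 3) -> np.ndarray:
    """Hypothetical helper: symmetric 0/1 adjacency from cosine k-nearest neighbours."""
    # Normalise rows so the dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    # Exclude self-matches before ranking neighbours.
    np.fill_diagonal(sim, -np.inf)
    neighbours = np.argsort(-sim, axis=1)[:, :k]
    n = embeddings.shape[0]
    adj = np.zeros((n, n), dtype=np.int8)
    rows = np.repeat(np.arange(n), k)
    adj[rows, neighbours.ravel()] = 1
    # Symmetrise: keep an edge if either cell lists the other as a neighbour.
    return np.maximum(adj, adj.T)


# Placeholder embeddings (assumed shapes: cells x embedding dimensions).
rng = np.random.default_rng(0)
source_emb = rng.normal(size=(20, 8))
target_emb = rng.normal(size=(15, 8))

# One graph per domain, as in panel B.
source_adj = knn_graph(source_emb, k=3)
target_adj = knn_graph(target_emb, k=3)
```

In a setup like this, the two adjacency matrices and their node embeddings would then be passed to a graph network whose weights are shared between the source and target domains; that part is not sketched here because the caption gives no architectural details.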