Similarity-based transfer learning with deep learning networks for accurate CRISPR-Cas9 off-target prediction
Fig 2
A schematic view of the encoding of an sgRNA-DNA sequence pair, as employed in the study of Lin et al. [37].
The figure illustrates the seven-bit encoding, where the _ symbol marks the position of a DNA or RNA bulge. Each sgRNA-DNA sequence pair is encoded as a fixed-size 7 × L matrix, comprising a five-bit character channel (A, G, C, T, _) and a two-bit direction channel. The five-bit channel encodes the nucleotides at the on- and off-target sites, while the direction channel identifies the locations of mismatches and indels. L denotes the sequence length (L=23 in our study).
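The encoding described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of Lin et al. [37]. The function name `encode_pair` is hypothetical, and the character channel order follows the order listed in the caption (A, G, C, T, _). The caption does not state how the two direction bits are set, so the sketch assumes one plausible rule: both bits are 0 when the two symbols agree, and otherwise a single bit records which strand carries the symbol that comes first in the channel order.

```python
CHANNELS = "AGCT_"  # character channel order, taken from the caption


def one_hot(symbol):
    """Return the five-bit one-hot code of a nucleotide or bulge symbol."""
    return [1 if symbol == c else 0 for c in CHANNELS]


def encode_pair(sgrna, dna):
    """Encode an aligned sgRNA-DNA pair as a 7 x L matrix (list of rows).

    Rows 0-4: bitwise OR of the one-hot codes of the two aligned symbols.
    Rows 5-6: direction bits (ASSUMED rule, not specified in the caption).
    """
    sgrna, dna = sgrna.upper(), dna.upper()
    if len(sgrna) != len(dna):
        raise ValueError("sequences must be aligned to the same length")
    for symbol in sgrna + dna:
        if symbol not in CHANNELS:
            raise ValueError(f"unsupported symbol: {symbol!r}")

    columns = []
    for r, d in zip(sgrna, dna):
        # Character channel: union of the symbols present at this position.
        chars = [a | b for a, b in zip(one_hot(r), one_hot(d))]
        # Direction channel: (0, 0) for a match; otherwise flag which strand
        # holds the symbol that appears earlier in CHANNELS (assumption).
        if r == d:
            direction = [0, 0]
        elif CHANNELS.index(r) < CHANNELS.index(d):
            direction = [1, 0]
        else:
            direction = [0, 1]
        columns.append(chars + direction)

    # Transpose from L columns of 7 bits to 7 rows of length L.
    return [list(row) for row in zip(*columns)]


if __name__ == "__main__":
    # Toy aligned pair: a match, an RNA-side bulge, and a G/T mismatch.
    for row in encode_pair("A_G", "ACT"):
        print(row)
```

For a matched position only one character bit is set and both direction bits are 0, whereas a mismatch or bulge sets two character bits and exactly one direction bit, which is what lets the direction channel locate mismatches and indels.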