Fig 1.
This figure illustrates the architecture of Poly(A)-DG.
Each input pair consists of a raw DNA sequence and a label indicating whether the sequence contains a PAS. The raw DNA sequences are one-hot encoded and fed into the CNN and MLP sub-networks. The outputs of the CNN and MLP are concatenated and passed to the HEX technique, which minimizes the differences among species. The output of the CNN is used directly for prediction after softmax, and the output of HEX is used in the loss calculation. Back-propagation is used to tune the model with this loss.
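The one-hot encoding step mentioned in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper name `one_hot_encode`, the channel order A, C, G, T, and the all-zero treatment of unknown symbols such as N are assumptions made for the example.

```python
# Hypothetical helper; the channel order A, C, G, T is an assumption.
BASES = "ACGT"


def one_hot_encode(sequence):
    """Return one 4-element row per nucleotide in the sequence."""
    encoded = []
    for base in sequence.upper():
        row = [0.0] * len(BASES)
        index = BASES.find(base)
        # Assumption: symbols outside A/C/G/T (e.g. N) stay all-zero.
        if index >= 0:
            row[index] = 1.0
        encoded.append(row)
    return encoded


# Encode the canonical AATAAA hexamer as a 6 x 4 matrix.
example = one_hot_encode("AATAAA")
```

Each row of `example` would then form one position of the matrix that a CNN or MLP sub-network takes as input.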
Table 1.
Source domain: Omni human and BL mouse. Target domain: Rat and bovine.
Table 2.
Source domain: Human and bovine. Target domain: BL mouse and rat.
Table 3.
Source domain: BL mouse and bovine. Target domain: Human and rat.
Table 4.
Source domain: Omni human and rat. Target domain: BL mouse and bovine.
Table 5.
Source domain: Rat and BL mouse. Target domain: Omni human and bovine.
Table 6.
Source domain: Rat and bovine. Target domain: Human and BL mouse.
Fig 2.
Data from the different species are on the same scale.
(a) Training source domain: mixture of Omni Human and Mouse; target species: Rat. (b) Training source domain: mixture of Omni Human and bovine; target species: Rat. (c) Training source domain: mixture of Rat and Mouse; target species: Omni Human. (d) Training source domain: mixture of bovine and Mouse; target species: Rat.
Fig 3.
Imbalanced source domains: the number of samples from the first source species is fixed.
(a) source domain: Omni Human and Mouse, target species: Rat. (b) source domain: Omni Human and bovine, target species: BL Mouse. (c) source domain: BL Mouse and Rat, target species: Omni Human. (d) source domain: BL Mouse and bovine, target species: Rat. (e) source domain: Omni Human and bovine, target species: Rat. (f) source domain: BL Mouse and Rat, target species: bovine.
Fig 4.
Imbalanced source domains: the number of samples from the second source species is fixed.
(a) source domain: Omni Human and Mouse, target species: Rat. (b) source domain: Omni Human and bovine, target species: Rat. (c) source domain: BL Mouse and Rat, target species: bovine. (d) source domain: BL Mouse and bovine, target species: Rat. (e) source domain: Omni Human and Rat, target species: BL Mouse. (f) source domain: BL Mouse and bovine, target species: Omni Human.
Table 7.
Imbalanced PAS positive-negative ratio.
Table 8.
Imbalanced PAS positive-negative ratio for different species; 1:2 for the first source species and 1:3 for the second one.
Table 9.
Genes with a single PAS vs. genes with multiple PAS; target domain: Rat.
Fig 5.
Visualization of convolution filters of Poly(A)-DG.
(A) Source domain: BL Mouse and Rat. (B) Source domain: BL Mouse and bovine. (C) Source domain: Rat and bovine. (D) Source domain: Omni Human and BL Mouse. (E) Source domain: Omni Human and bovine. (F) Source domain: Omni Human and Rat.
Fig 6.
Visualization of position importance in PAS identification.
(a) Source domain: Omni Human and BL Mouse. (b) Source domain: Omni Human and BL Mouse. (c) Source domain: Omni Human and BL Mouse. (d) Source domain: Omni Human and BL Mouse. (e) Source domain: Omni Human and BL Mouse. (f) Source domain: Omni Human and BL Mouse.
Table 10.
PAS prediction on chromosome 1 of rat. FPR: false positive rate; FNR: false negative rate.
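For reference, the two rates named in this caption follow the standard definitions FPR = FP / (FP + TN) and FNR = FN / (FN + TP). The sketch below computes them from binary labels. The function name `error_rates` and the toy labels are hypothetical and are not taken from the paper's data.

```python
def error_rates(y_true, y_pred):
    """Return (FPR, FNR) for binary labels, where 1 means 'contains a PAS'."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    # Guard against empty classes to avoid division by zero.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


# Toy labels for illustration only.
fpr, fnr = error_rates([1, 1, 0, 0, 0], [1, 0, 1, 0, 0])
```

In this toy case one of three negatives is predicted positive and one of two positives is missed, so `fpr` is 1/3 and `fnr` is 0.5.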
Table 11.
Performance of Poly(A)-DG on the standard and the filtered cross-species PAS datasets.