Figure 1.
Deep neural network architecture.
Given an input amino acid sequence, the neural network outputs a posterior distribution over the class labels for that amino acid. This general deep network architecture is suitable for all of our prediction tasks. The network is characterized by three parts: (1) an amino acid feature extraction layer, (2) a sequential feature extraction layer, and (3) a series of classical neural network layers. The first layer consists of a PSI-BLAST feature module and an amino acid embedding module. With a sliding window input, the amino acid embedding module outputs a series of real-valued vectors, one for each amino acid in the window. Similarly, the PSI-BLAST module derives one 20-dimensional PSI-BLAST feature vector for each amino acid in the window. These vectors are then concatenated in the sequential extraction layer of the network. Finally, the derived vector is fed into the classical neural network layers. The final softmax layer allows us to interpret the outputs as probabilities for each class.
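The three parts of this architecture can be illustrated with a minimal NumPy sketch of a single forward pass. It is not the authors' implementation: the window size, hidden layer width, number of classes, tanh nonlinearity, and random weights are all assumptions chosen for illustration, and the embedding size simply reuses the 15 dimensions mentioned for Figure 4.

```python
import numpy as np

rng = np.random.default_rng(0)

N_AMINO = 20       # size of the amino acid alphabet
EMBED_DIM = 15     # embedding size (assumed; borrowed from the Figure 4 caption)
PSIBLAST_DIM = 20  # PSI-BLAST features per amino acid, as stated in the caption
WINDOW = 7         # sliding window size (hypothetical value)
HIDDEN = 32        # hidden units (hypothetical value)
N_CLASSES = 3      # number of class labels (hypothetical value)

# Randomly initialised parameters stand in for trained weights.
embedding = rng.normal(scale=0.1, size=(N_AMINO, EMBED_DIM))
W1 = rng.normal(scale=0.1, size=(WINDOW * (EMBED_DIM + PSIBLAST_DIM), HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(HIDDEN, N_CLASSES))
b2 = np.zeros(N_CLASSES)


def softmax(z):
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def forward(window_ids, window_psiblast):
    """Return class probabilities for one window of amino acids."""
    # (1) Amino acid feature extraction: one embedding vector and one
    #     PSI-BLAST vector per position in the window.
    emb = embedding[window_ids]                            # (WINDOW, EMBED_DIM)
    per_position = np.concatenate([emb, window_psiblast], axis=1)
    # (2) Sequential feature extraction: concatenate all positions.
    x = per_position.reshape(-1)
    # (3) Classical layers: one hidden layer (tanh is an assumption),
    #     followed by the softmax output layer.
    h = np.tanh(x @ W1 + b1)
    return softmax(h @ W2 + b2)


# Toy input: random amino acid indices and random PSI-BLAST features.
window_ids = rng.integers(0, N_AMINO, size=WINDOW)
window_psiblast = rng.normal(size=(WINDOW, PSIBLAST_DIM))
probs = forward(window_ids, window_psiblast)
print(probs, probs.sum())
```

Because the output passes through a softmax, the entries of `probs` are positive and sum to one, which is what allows them to be read as class probabilities.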
Figure 2.
Multitask learning with weight sharing between multiple deep neural networks.
In this figure, two related tasks are trained simultaneously using the network architecture from Figure 1. Here, only the very last layers of the network are task-specific.
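Weight sharing of this kind can be sketched as one set of shared parameters plus a separate output layer per task. The sketch below is an illustration under stated assumptions, not the authors' training code: the task names, layer sizes, tanh nonlinearity, cross-entropy loss, and plain SGD update are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

IN_DIM, HIDDEN = 10, 8
N_CLASSES = {"task_a": 3, "task_b": 2}  # hypothetical tasks and label counts

# Shared layer: used and updated by every task.
shared = {"W": rng.normal(scale=0.1, size=(IN_DIM, HIDDEN)),
          "b": np.zeros(HIDDEN)}
# Task-specific output layers: one per task.
heads = {task: {"W": rng.normal(scale=0.1, size=(HIDDEN, c)), "b": np.zeros(c)}
         for task, c in N_CLASSES.items()}


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def forward(task, x):
    h = np.tanh(x @ shared["W"] + shared["b"])
    return h, softmax(h @ heads[task]["W"] + heads[task]["b"])


def sgd_step(task, x, label, lr=0.1):
    """One SGD step on one example of one task; returns the loss before the update."""
    h, p = forward(task, x)
    dz = p.copy()
    dz[label] -= 1.0                    # gradient of cross-entropy w.r.t. logits
    dh = heads[task]["W"] @ dz          # backpropagate into the shared hidden layer
    da = dh * (1.0 - h ** 2)            # derivative of tanh
    heads[task]["W"] -= lr * np.outer(h, dz)   # only this task's head changes
    heads[task]["b"] -= lr * dz
    shared["W"] -= lr * np.outer(x, da)        # shared weights change for every task
    shared["b"] -= lr * da
    return -np.log(p[label])


x = rng.normal(size=IN_DIM)
head_b_before = heads["task_b"]["W"].copy()
shared_before = shared["W"].copy()

loss_before = sgd_step("task_a", x, label=1)
loss_after = -np.log(forward("task_a", x)[1][1])
print(loss_before, loss_after)
```

A step on `task_a` leaves the `task_b` head untouched but moves the shared layer, so the representation that `task_b` sees also changes. In practice the training loop would alternate between examples from the different tasks.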
Table 1.
Summary of data sets.
Figure 3.
Network architecture for training the “natural protein” auxiliary task.
The “natural protein” auxiliary task aims to model the local patterns of amino acids that naturally occur in protein sequences. Using local windows in the unlabeled protein sequences as positive examples and randomly modified windows as negative examples, the network learns the feature representations for each amino acid. In contrast to the network illustrated in Figure 1, this network contains only the amino acid embedding module in its first layer. The learned embedding is encoded in the real-valued parameter matrix of the amino acid feature extraction layer.
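The construction of positive and negative examples for such a task can be sketched as follows. The caption does not say how windows are "randomly modified", so the corruption rule here (replacing the central amino acid with a different, randomly chosen one) is an assumption, as are the window size, the toy sequence, and the function names.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def positive_windows(sequence, window):
    """All contiguous windows of the given size from an unlabeled sequence."""
    return [sequence[i:i + window] for i in range(len(sequence) - window + 1)]


def corrupt(window_seq, rng):
    """Assumed corruption rule: swap the central residue for a different one."""
    center = len(window_seq) // 2
    choices = [a for a in AMINO_ACIDS if a != window_seq[center]]
    return window_seq[:center] + rng.choice(choices) + window_seq[center + 1:]


def make_examples(sequence, window, rng):
    """Pair each natural window (label 1) with a corrupted copy (label 0)."""
    examples = []
    for w in positive_windows(sequence, window):
        examples.append((w, 1))
        examples.append((corrupt(w, rng), 0))
    return examples


rng = random.Random(0)
sequence = "MKTAYIAKQRQISFVK"  # toy sequence, not taken from the paper's data
examples = make_examples(sequence, window=7, rng=rng)
for window_seq, label in examples[:4]:
    print(window_seq, label)
```

No labels are needed to build these examples, which is why the task can use unlabeled protein sequences. A network with only the embedding module in its first layer would then be trained to separate the two kinds of window.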
Table 2.
Comparison of learning strategies based on percent accuracy.
Figure 4.
A learned amino acid embedding.
The figure shows an approximation of a 15-dimensional embedding of amino acids, learned by a neural network trained on the natural protein task. The projection to 2D is accomplished via principal component analysis.
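A projection of this kind can be sketched with a few lines of NumPy. The embedding matrix below is a random stand-in, not the learned embedding, and the helper name `pca_2d` is hypothetical. The sketch also reports the fraction of variance kept by the first two principal components, which indicates how rough a 2D approximation of a 15-dimensional embedding is.

```python
import numpy as np


def pca_2d(embedding):
    """Project the rows of `embedding` onto their first two principal components."""
    centered = embedding - embedding.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[:2].T
    explained = (s[:2] ** 2).sum() / (s ** 2).sum()
    return coords, explained


rng = np.random.default_rng(0)
embedding = rng.normal(size=(20, 15))  # random stand-in: 20 amino acids, 15 dimensions
coords, explained = pca_2d(embedding)
print(coords.shape, explained)
```

Each row of `coords` gives the 2D position of one amino acid. With a learned embedding in place of the random matrix, amino acids with similar embedding vectors would be plotted close together.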