Deep convolutional and conditional neural networks for large-scale genomic data generation

doi:10.1371/journal.pcbi.1011584

Fig 2

Illustration of the learning and sampling of a large sequence using a “classical” and a conditional RBM (CRBM).

Initially, we train RBM1 (left) and RBM2 (right) in parallel. Both RBMs are trained in essentially the same way: random inputs are drawn and k MC steps are performed before the gradient is computed and the weights are updated by gradient descent. The difference for the CRBM (RBM2) is that half of the variables in its visible layer are pinned (crossed squares) to the real data during training, while the remaining variables are generated conditionally on the pinned ones. After training both machines, we can sample a complete new sequence. To do so, we start from random input and perform k MC steps on RBM1 to generate the first part of the sequence (light yellow-red). Then, we use half of this generated sequence (light red) as the pinned visible variables of RBM2 (crossed squares) and initialise the rest of its visible layer with random input. We perform k MC steps on RBM2, keeping the pinned variables fixed, to generate the rest of the sequence (light blue). The letters next to the arrows show the order of this sampling procedure.
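The two-stage sampling procedure described in this caption can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes binary visible and hidden units, alternating Gibbs sampling as the MC step, and randomly initialised (untrained) weights standing in for the trained RBM1 and RBM2. The names `make_rbm`, `gibbs_steps` and `sample_sequence`, and all layer sizes, are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def make_rbm(n_visible, n_hidden, rng):
    """Placeholder RBM with small random weights (assumption: stands in for a trained model)."""
    W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
    return W, np.zeros(n_visible), np.zeros(n_hidden)


def gibbs_steps(W, a, b, v, k, rng, pinned=None):
    """Run k alternating Gibbs steps; visible units where `pinned` is True keep their initial value."""
    v = v.copy()
    fixed = v.copy()
    for _ in range(k):
        # Sample the hidden layer given the visible layer.
        h = (rng.random(W.shape[1]) < sigmoid(v @ W + b)).astype(np.int8)
        # Sample the visible layer given the hidden layer.
        v = (rng.random(W.shape[0]) < sigmoid(h @ W.T + a)).astype(np.int8)
        if pinned is not None:
            # Conditional RBM: restore the pinned variables after every step.
            v[pinned] = fixed[pinned]
    return v


def sample_sequence(rbm1, rbm2, k, rng):
    """Generate the first part with RBM1, then extend it with RBM2 conditioned on half of it."""
    n1 = rbm1[0].shape[0]
    n2 = rbm2[0].shape[0]
    half = n1 // 2

    # RBM1: start from random input and perform k MC steps.
    v1 = gibbs_steps(*rbm1, rng.integers(0, 2, size=n1, dtype=np.int8), k, rng)

    # RBM2: pin the first `half` visible units to the last `half` units of v1,
    # and initialise the remaining visible units at random.
    v2 = rng.integers(0, 2, size=n2, dtype=np.int8)
    v2[:half] = v1[n1 - half:]
    pinned = np.zeros(n2, dtype=bool)
    pinned[:half] = True
    v2 = gibbs_steps(*rbm2, v2, k, rng, pinned=pinned)

    # Full sequence: everything from RBM1 plus the newly generated part from RBM2.
    return np.concatenate([v1, v2[half:]])


rng = np.random.default_rng(0)
rbm1 = make_rbm(20, 8, rng)  # hypothetical sizes
rbm2 = make_rbm(20, 8, rng)
sequence = sample_sequence(rbm1, rbm2, k=10, rng=rng)
print(sequence.shape)
```

In this sketch the pinned half of RBM2's visible layer overlaps with the end of the RBM1 sample, so only the free half of RBM2 is appended to the output. Which half of the RBM1 sample is reused, and how the layer sizes relate, are assumptions made for illustration.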

doi: https://doi.org/10.1371/journal.pcbi.1011584.g002