Unsupervised Feature Learning Improves Prediction of Human Brain Activity in Response to Natural Images
The encoding model predicts single voxel responses to images by nonlinearly transforming the images to complex cell responses and linearly transforming the complex cell responses to the single voxel responses. For example, the encoding model predicts a voxel response to a 128 × 128 image as follows: Each of the 16 non-overlapping 32 × 32 patches of the image is first vectorized, preprocessed and linearly transformed to 625 simple cell responses, i.e. $\mathbf{W}\mathbf{x}$, where $\mathbf{x}$ is a vectorized and preprocessed patch. Energies of the simple cells that are in each of the 625 partially overlapping 5 × 5 neighborhoods are then locally pooled, i.e. $\mathbf{H}(\mathbf{W}\mathbf{x})^2$, and nonlinearly transformed to one complex cell response, i.e. $\log(1 + \mathbf{H}(\mathbf{W}\mathbf{x})^2)$. Next, the 10000 complex cell responses (16 patches × 625 responses per patch) are linearly transformed to the voxel response, i.e. $\boldsymbol{\beta}^\top \phi(\mathbf{x})$, where $\phi(\mathbf{x})$ collects the complex cell responses of all 16 patches. The feature transformations are learned from unlabeled data. The voxel transformations are learned from feature-transformed stimulus-response pairs.
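The forward pass described above can be sketched in a few lines of NumPy to make the dimensions concrete (16 patches × 625 complex cells = 10000 features). This is a minimal sketch under stated assumptions, not the authors' implementation: the weights `W` and `beta` are random stand-ins for learned parameters, `preprocess` is a hypothetical placeholder that only removes the patch mean, the pooling matrix `H` is assumed to be binary with wrap-around borders on a 25 × 25 map, and `log(1 + ·)` is assumed as the compressive nonlinearity. All function names are illustrative.

```python
import numpy as np

PATCH = 32          # patch side length in pixels
GRID = 25           # simple cells laid out on a 25 x 25 map (625 cells)
NEIGHBORHOOD = 5    # side length of each pooling neighborhood


def make_pooling_matrix(grid=GRID, size=NEIGHBORHOOD):
    """Binary matrix H; row i pools the size x size neighborhood around cell i.

    Wrap-around borders are an assumption made here so that each of the
    625 neighborhoods contains exactly 25 simple cells.
    """
    half = size // 2
    H = np.zeros((grid * grid, grid * grid))
    for r in range(grid):
        for c in range(grid):
            for dr in range(-half, half + 1):
                for dc in range(-half, half + 1):
                    rr, cc = (r + dr) % grid, (c + dc) % grid
                    H[r * grid + c, rr * grid + cc] = 1.0
    return H


def extract_patches(image, patch=PATCH):
    """Split a square image into non-overlapping patches, one vectorized patch per row."""
    n = image.shape[0] // patch
    blocks = image.reshape(n, patch, n, patch).swapaxes(1, 2)
    return blocks.reshape(n * n, patch * patch)


def preprocess(patches):
    """Hypothetical placeholder: subtract each patch's mean intensity."""
    return patches - patches.mean(axis=1, keepdims=True)


def complex_cell_features(image, W, H):
    """Map a 128 x 128 image to a vector of 10000 complex cell responses."""
    x = preprocess(extract_patches(image))   # (16, 1024) vectorized patches
    s = x @ W.T                              # (16, 625) simple cell responses
    pooled = (s ** 2) @ H.T                  # (16, 625) locally pooled energies
    return np.log1p(pooled).ravel()          # (10000,) complex cell responses


def predict_voxel(image, W, H, beta):
    """Linear readout from complex cell responses to one voxel response."""
    return beta @ complex_cell_features(image, W, H)


# Random stand-ins for the learned feature and voxel transformations.
rng = np.random.default_rng(0)
W = rng.standard_normal((GRID * GRID, PATCH * PATCH)) / PATCH
H = make_pooling_matrix()
beta = 0.01 * rng.standard_normal(16 * GRID * GRID)

image = rng.standard_normal((128, 128))
features = complex_cell_features(image, W, H)
response = predict_voxel(image, W, H, beta)
print(features.shape, float(response))
```

In an actual analysis, `W` would come from the unsupervised feature learning stage and `beta` would be fit per voxel on feature-transformed stimulus-response pairs; only the shapes and the order of operations are meant to carry over from this sketch.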