Free keywords:
-
Abstract:
Natural image generation is currently one of the most actively explored fields in Deep Learning. A surprising recent result has been that feature representations from networks trained on a purely discriminative task can be used for state-of-the-art image synthesis (Gatys et al., 2015). However, it is still unclear which aspects of the pre-trained network are critical for high generative performance. It could be, for example, the architecture of the convolutional neural network (CNN): the number of layers, the specific pooling techniques, or the relationship between filter complexity and filter scale (larger filters are more non-linear); or it could be the training task, the network's performance on that task, or the data it was trained on.
To explore the importance of learnt filters and deep architectures, we here consider the task of synthesising natural textures using only a single-layer CNN with completely random filters. Our surprising finding is that we can synthesise natural textures of high perceptual quality that sometimes even rival current state-of-the-art methods (Gatys et al., 2015; Liu et al., 2016), which rely on deep multi-layer representations trained with supervision. We hence conclude that neither supervised training nor the depth of the architecture is indispensable for natural texture generation.
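To make the approach concrete, the following is a minimal sketch (in Python with PyTorch, which the abstract does not prescribe) of texture synthesis from a single convolutional layer of random filters: a noise image is optimised so that the Gram matrix of its random-filter responses matches that of the original texture. The filter size, filter count and optimiser settings are illustrative assumptions, not the exact configuration used in the paper.

    import torch
    import torch.nn.functional as F

    def gram_matrix(feats):
        # feats: (1, C, H, W) feature maps -> (C, C) matrix of channel correlations
        _, c, h, w = feats.shape
        f = feats.view(c, h * w)
        return f @ f.t() / (h * w)

    def synthesise(texture, n_filters=256, filter_size=11, steps=500, lr=0.05):
        # One convolutional layer with fixed random filters, followed by a ReLU.
        weight = torch.randn(n_filters, 3, filter_size, filter_size)
        weight /= weight.view(n_filters, -1).norm(dim=1).view(-1, 1, 1, 1)

        def features(img):
            return F.relu(F.conv2d(img, weight, padding=filter_size // 2))

        target = gram_matrix(features(texture)).detach()

        # Start from white noise and match the target Gram matrix by gradient descent.
        img = torch.rand_like(texture, requires_grad=True)
        opt = torch.optim.Adam([img], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = F.mse_loss(gram_matrix(features(img)), target)
            loss.backward()
            opt.step()
        return img.detach().clamp(0, 1)

    # Usage with a placeholder texture of shape (1, 3, 256, 256):
    # sample = synthesise(torch.rand(1, 3, 256, 256))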
Furthermore, we evaluate the importance of other architectural aspects of random CNNs for natural texture synthesis. To that end, we introduce a new quantitative measure of texture quality based on the state-of-the-art parametric texture model by Gatys et al. This measure allows us to objectively quantify the performance of each architecture and to perform a large-scale grid search over CNNs with random filters and different architectures (in terms of the number of layers, the sizes of convolutional filters, the non-linearities, the pooling layers, and the number of feature maps within each layer). The main result is that larger filters and more layers help synthesise textures that are perceptually more similar to the original, albeit at the cost of reduced variability.
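As a hedged illustration of what such a measure could look like, the sketch below scores a synthesised texture by how closely it reproduces the Gram-matrix statistics of the original under the VGG-based parametric model of Gatys et al.; the chosen VGG layers, the equal weighting of layers and the omitted ImageNet input normalisation are simplifying assumptions rather than the paper's exact protocol.

    import torch
    import torchvision.models as models

    # Indices of conv1_1, conv2_1, conv3_1, conv4_1, conv5_1 in torchvision's
    # VGG19 `features` module; an assumed layer set, not necessarily the paper's.
    LAYER_IDS = {0, 5, 10, 19, 28}

    def vgg_grams(img, vgg):
        # Collect Gram matrices of the feature maps at the selected layers.
        # img: (1, 3, H, W); ImageNet normalisation is omitted for brevity.
        grams, x = [], img
        for i, layer in enumerate(vgg):
            x = layer(x)
            if i in LAYER_IDS:
                _, c, h, w = x.shape
                f = x.view(c, h * w)
                grams.append(f @ f.t() / (h * w))
        return grams

    def texture_quality(original, synthesised):
        # Lower score = the synthesised texture matches the original's statistics better.
        vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        g_orig = vgg_grams(original, vgg)
        g_syn = vgg_grams(synthesised, vgg)
        return sum(torch.mean((a - b) ** 2) for a, b in zip(g_orig, g_syn)).item()

    # Usage: score = texture_quality(original_img, synthesised_img),
    # where both inputs are (1, 3, H, W) tensors.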