Towards matching peripheral appearance for arbitrary natural images using deep features

Wallis, T; Funke, C; Ecker, A; Gatys, L; Wichmann, F; Bethge, M

doi:10.1167/17.10.786

Poster

Citation

Wallis, T., Funke, C., Ecker, A., Gatys, L., Wichmann, F., & Bethge, M. (2017). Towards matching peripheral appearance for arbitrary natural images using deep features. Poster presented at 17th Annual Meeting of the Vision Sciences Society (VSS 2017), St. Pete Beach, FL, USA.

Cite as: https://hdl.handle.net/21.11116/0000-0000-C44B-F

Abstract

Due to the structure of the primate visual system, large distortions of the input can go unnoticed in the periphery, and objects can be harder to identify. What encoding underlies these effects? Similarly to Freeman and Simoncelli (Nature Neuroscience, 2011), we developed a model that uses summary statistics averaged over spatial regions that increase in size with retinal eccentricity (assuming central fixation on an image). We also designed the averaging areas so that changing their scaling progressively discards more information from the original image (i.e. a coarser model produces greater distortions of the original image structure than a higher-resolution model). Unlike Freeman and Simoncelli, we use the features of a deep neural network trained on object recognition (VGG-19; Simonyan and Zisserman, ICLR 2015), which are state of the art in parametric texture synthesis. We tested whether human observers can discriminate model-generated images from their original source images. Three images subtending 25 deg, two of which were physically identical, were presented for 200 ms each in a three-alternative temporal oddity paradigm. We find a model that, for most of the original images we tested, produces synthesised images that cannot be told apart from the originals despite significant distortions of image structure. However, some images were readily discriminable. The model therefore encodes information that is necessary, but not sufficient, to capture appearance in human scene perception. We explore which image features are correlated with discriminability at the image level (which images are harder than others?) and at the pixel level (where in an image is the hardest location?). While our model does not produce “metamers”, it does capture many features important for the appearance of arbitrary natural images in the periphery.
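The eccentricity-dependent pooling described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The ring-and-sector layout (a 1 deg central disc, then rings whose width is `scale` times their inner eccentricity, each split into roughly square angular sectors), the use of channel Gram matrices as the pooled summary statistic (as in Gatys-style texture synthesis), and the names `ring_edges`, `region_labels` and `pooled_gram_stats` are all assumptions made for illustration. A random array stands in for one layer of VGG-19 features, and the 128-pixel image is assumed to span 25 deg. Larger `scale` values give fewer, larger regions and therefore fewer pooled statistics, which is one way a coarser model discards more information about the original image.

```python
# Hypothetical sketch: not the authors' model; region layout and statistics are assumptions.
import numpy as np


def ring_edges(max_ecc, scale, min_ecc=1.0):
    """Ring boundaries (deg) whose width grows in proportion to eccentricity.

    Assumed layout: a central disc of radius min_ecc, then rings
    [r, r * (1 + scale)), so each ring is scale * r wide.
    """
    edges = [0.0, min_ecc]
    while edges[-1] <= max_ecc:  # ensure the last edge lies beyond every pixel
        edges.append(edges[-1] * (1.0 + scale))
    return np.array(edges)


def region_labels(h, w, scale, pixels_per_deg, min_ecc=1.0):
    """Assign each pixel to a pooling region, assuming fixation at the image centre."""
    ys, xs = np.mgrid[0:h, 0:w]
    dy, dx = ys - (h - 1) / 2.0, xs - (w - 1) / 2.0
    ecc = np.hypot(dy, dx) / pixels_per_deg  # eccentricity in degrees
    theta = np.arctan2(dy, dx)  # polar angle in [-pi, pi]
    edges = ring_edges(ecc.max(), scale, min_ecc)
    ring = np.digitize(ecc, edges) - 1  # ring index, 0 = central disc

    labels = np.zeros((h, w), dtype=int)
    next_label = 0
    for k in range(len(edges) - 1):
        in_ring = ring == k
        inner, outer = edges[k], edges[k + 1]
        # Split each ring into angular sectors roughly as long as the ring is wide.
        n_sectors = 1 if k == 0 else max(1, int(round(np.pi * (inner + outer) / (outer - inner))))
        sector = np.floor((theta[in_ring] + np.pi) / (2 * np.pi) * n_sectors).astype(int)
        labels[in_ring] = next_label + np.minimum(sector, n_sectors - 1)  # theta == pi edge case
        next_label += n_sectors
    return labels


def pooled_gram_stats(features, labels):
    """Channel-by-channel Gram matrix of the features inside each pooling region.

    features: (C, H, W) array, e.g. one CNN layer; labels: (H, W) region indices.
    """
    c = features.shape[0]
    flat = features.reshape(c, -1)
    flat_labels = labels.ravel()
    stats = {}
    for lab in np.unique(flat_labels):
        f = flat[:, flat_labels == lab]  # (C, N) feature vectors in this region
        stats[int(lab)] = f @ f.T / f.shape[1]
    return stats


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.standard_normal((8, 128, 128))  # stand-in for a VGG-19 layer
    for scale in (0.25, 0.5, 1.0):
        labels = region_labels(128, 128, scale, pixels_per_deg=128 / 25)
        stats = pooled_gram_stats(features, labels)
        # Each symmetric 8x8 Gram matrix holds 8 * 9 / 2 = 36 distinct numbers.
        print(f"scale={scale}: {len(stats)} regions, {36 * len(stats)} statistics")
```

In a Gatys-style synthesis, a new image would then be optimised until its pooled statistics match those of the original; that step is not shown here.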