Towards matching peripheral appearance for arbitrary natural images using deep 
features

Wallis, T; Funke, C; Ecker, A; Gatys, L; Wichmann, F; Bethge, M

doi:10.1167/17.10.786

Datensatz

DATENSATZ AKTIONENEXPORT

Zur Ablage hinzufügen

Lokale TagsFreigabegeschichteDetailsÜbersicht

Freigegeben

Poster

Towards matching peripheral appearance for arbitrary natural images using deep features

MPG-Autoren

Es sind keine MPG-Autoren in der Publikation vorhanden

Externe Ressourcen

Link
(beliebiger Volltext)

Volltexte (beschränkter Zugriff)

Für Ihren IP-Bereich sind aktuell keine Volltexte freigegeben.

Volltexte (frei zugänglich)

Es sind keine frei zugänglichen Volltexte in PuRe verfügbar

Ergänzendes Material (frei zugänglich)

Es sind keine frei zugänglichen Ergänzenden Materialien verfügbar

Zitation

Wallis, T., Funke, C., Ecker, A., Gatys, L., Wichmann, F., & Bethge, M. (2017). Towards matching peripheral appearance for arbitrary natural images using deep features. Poster presented at 17th Annual Meeting of the Vision Sciences Society (VSS 2017), St. Pete Beach, FL, USA.

Zitierlink: https://hdl.handle.net/21.11116/0000-0000-C44B-F

Zusammenfassung

Due to the structure of the primate visual system, large distortions of the input can go unnoticed in the periphery, and objects can be harder to identify. What encoding underlies these effects? Similarly to Freeman
Simoncelli (Nature Neuroscience, 2011), we developed a model that uses summary statistics averaged over spatial regions that increases with retinal eccentricity (assuming central fixation on an image). We also designed the averaging areas such that changing their scaling progressively discards more information from the original image (i.e. a coarser model produces greater distortions to original image structure than a model with higher
resolution). Different from Freeman and Simoncelli, we use the features of a deep neural network trained on object recognition (the VGG-19; Simonyan Zisserman, ICLR 2015), which is state-of-the art in parametric texture synthesis. We tested whether human observers can discriminate model-
generated images from their original source images. Three images subtending 25 deg, two of which were physically identical, were presented for 200 ms each in a three-alternative temporal oddity paradigm. We find a model that, for most original images we tested, produces synthesised
images that cannot be told apart from the originals despite producing significant distortions of image structure. However, some images were readily discriminable. Therefore, the model has successfully encoded necessary but not sufficient information to capture appearance in human scene perception. We explore what image features are correlated with discriminability on the image (which images are harder than others?) and pixel (where in an image is the hardest location?) level. While our model does not produce
“metamers”, it does capture many features important for the appearance of arbitrary natural images in the periphery.