A parametric texture model based on deep convolutional features closely matches 
texture appearance for humans

Wallis, TSA; Funke, CM; Ecker, AS; Gatys, LA; Wichmann, FA; Bethge, M

doi:10.1167/17.10.1081

Item

ITEM ACTIONSEXPORT

Add to Basket

Local TagsRelease HistoryDetailsSummary

Released

Poster

A parametric texture model based on deep convolutional features closely matches texture appearance for humans

MPS-Authors

There are no MPG-Authors in the publication available

External Resource

Link
(Any fulltext)

Fulltext (restricted access)

There are currently no full texts shared for your IP range.

Fulltext (public)

There are no public fulltexts stored in PuRe

Supplementary Material (public)

There is no public supplementary material available

Citation

Wallis, T., Funke, C., Ecker, A., Gatys, L., Wichmann, F., & Bethge, M. (2017). A parametric texture model based on deep convolutional features closely matches texture appearance for humans. Poster presented at 17th Annual Meeting of the Vision Sciences Society (VSS 2017), St. Pete Beach, FL, USA.

Cite as: https://hdl.handle.net/21.11116/0000-0000-C403-F

Abstract

Much of our visual environment consists of texture—“stuff” like cloth, bark or gravel as distinct from “things” like dresses, trees or paths—and we humans are adept at perceiving textures and their subtle variation. How does our visual system achieve this feat? Here we psychophysically evaluate a new parameteric model of texture appearance (the CNN texture model; Gatys et al., 2015) that is based on the features encoded by a deep
convolutional neural network (deep CNN) trained to recognise objects in images (the VGG-19; Simonyan and Zisserman, 2015). By cumulatively matching the correlations of deep features up to a given layer (using up to five convolutional layers) we were able to evaluate models of increasing complexity. We used a three-alternative spatial oddity task to test whether model-generated textures could be discriminated from original natural textures under two viewing conditions: when test patches were briefly presented to the parafovea (“single fixation”) and when observers were able to make eye movements to all three patches (“inspection”). For 9 of the 12 source textures we tested, the models using more than three layers produced images that were indiscriminable from the originals even
under foveal inspection. The venerable parameteric texture model of Portilla and Simoncelli (Portilla and Simoncelli, 2000) was also able to match the appearance of these textures in the single fixation condition, but not under inspection. Of the three source textures our model could not match, two contain strong periodicities. In a second experiment, we found that matching the power spectrum in addition to the deep features used above (Liu et al., 2016) greatly improved matches for these two textures. These
results suggest that the features learned by deep CNNs encode statistical regularities of natural scenes that capture important aspects of material perception in humans.