Purely perceptual machines robustly predict human visual arousal, valence, and aesthetics

Conwell, C., Graham, D., Konkle, T., & Vessel, E. A. (2022). Purely perceptual machines robustly predict human visual arousal, valence, and aesthetics. Journal of Vision, 22: 4266.

Basic
Genre: Meeting Abstract

Locators

Description: OA
OA-Status: Gold

Creators

Creators:
Conwell, Colin (1), Author
Graham, Daniel (2), Author
Konkle, Talia (1), Author
Vessel, Edward Allen (3), Author
Affiliations:
(1) Harvard University
(2) Hobart and William Smith Colleges
(3) Department of Neuroscience, Max Planck Institute for Empirical Aesthetics, Max Planck Society

Content

Free keywords: -
Abstract: Our experience of a beautiful, moving, or aversive image clearly evokes affective processes beyond vision, but the relative contributions of factors along the spectrum from input (image statistics) to ideation (abstract thought) remain a matter of debate. Machine vision systems, lacking both emotion and higher-order cognitive processes, provide an empirical testbed for isolating the contributions of a purely perceptual representation. How well can we predict human affective responses to an image from the purely perceptual response of a machine? Here, we address this question with a comprehensive survey of deep neural networks (e.g. ConvNets, Transformers, MLP-Mixers) trained on a variety of computer vision tasks (e.g. vision-language contrastive learning, segmentation), examining the degree to which they can predict aesthetic judgment, arousal, and valence for images from multiple categories across two distinct datasets. Importantly, we use the features of these pre-trained models without any additional fine-tuning or retraining, probing whether affective information is immediately latent in the structure of the perceptual representation. We find that these networks have features sufficient to linearly predict (even with nonparametric mappings) average ratings of aesthetics, arousal, and valence with remarkably high accuracy across the board, at or near the predictions we would make based on the responses of the most representative ('taste-typical') human subjects. Models trained on object and scene classification, and modern contrastive learning models, produce the best overall features for prediction, while randomly-initialized models yield far lower predictive accuracies. Aesthetic judgments are the most predictable of the affective responses (followed by arousal, then valence), and we can predict these responses with greater accuracy for 'taste-typical' subjects than for less 'taste-typical' subjects. Taken together, these results suggest that the fundamental locus of visually evoked affective experience may be located more proximately to the perceptual system than abstract cognitive accounts of these experiences might otherwise suggest.
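
The pipeline the abstract describes reduces to a linear probe on frozen features: extract activations from a pre-trained vision model (no fine-tuning) and fit a simple linear mapping to mean human ratings. The following is a minimal sketch of that idea in Python, assuming a torchvision ResNet-50 backbone, scikit-learn ridge regression, and placeholder images and ratings standing in for the rated stimulus sets; the specific models, mappings, and datasets used in the study differ.

    # Minimal sketch: linear read-out of mean affect ratings from a frozen,
    # pre-trained vision backbone. Model choice, data, and variable names are
    # illustrative assumptions, not the authors' exact pipeline.
    import torch
    import torchvision.models as models
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_score

    # Frozen ImageNet-trained ConvNet; replace the classification head with
    # an identity so the forward pass returns pooled penultimate features.
    backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    backbone.fc = torch.nn.Identity()
    backbone.eval()

    # Placeholder stimuli and targets: in practice, these would be the rated
    # image sets and ratings averaged across subjects (hypothetical here).
    images = torch.rand(100, 3, 224, 224)    # stand-in image batch
    mean_ratings = torch.rand(100).numpy()   # stand-in mean aesthetic ratings

    with torch.no_grad():                    # no fine-tuning or retraining
        feats = backbone(images).numpy()     # (100, 2048) feature matrix

    # Linear probe on the purely perceptual representation, scored with
    # 5-fold cross-validation.
    probe = Ridge(alpha=1.0)
    scores = cross_val_score(probe, feats, mean_ratings, cv=5, scoring="r2")
    print(f"cross-validated R^2: {scores.mean():.3f}")

With real data, the cross-validated R^2 of this probe is the quantity compared against a subject-based noise ceiling (the 'taste-typical' benchmark the abstract refers to); with the random placeholder ratings above it will hover near zero.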

Details

Language(s): eng - English
Dates: 2022-12
Publication Status: Published online
Pages: -
Publishing info: -
Table of Contents: -
Rev. Type: -
Identifiers: -
Degree: -

Source 1

Title: Journal of Vision
Source Genre: Journal
Publ. Info: Charlottesville, VA : Scholar One, Inc.
Pages: -
Volume / Issue: 22
Sequence Number: 4266
Start / End Page: -
Identifier: ISSN: 1534-7362
CoNE: https://pure.mpg.de/cone/journals/resource/111061245811050