Purely perceptual machines robustly predict human visual arousal, valence, and aesthetics

Conwell, C., Graham, D., Konkle, T., & Vessel, E. A. (2022). Purely perceptual machines robustly predict human visual arousal, valence, and aesthetics. Journal of Vision, 22: 4266.

Basic
Genre: Meeting Abstract

Locators

Description: OA
OA-Status: Gold

Creators

Creators:
Conwell, Colin (1), Author
Graham, Daniel (2), Author
Konkle, Talia (1), Author
Vessel, Edward Allen (3), Author
Affiliations:
(1) Harvard University
(2) Hobart and William Smith Colleges
(3) Department of Neuroscience, Max Planck Institute for Empirical Aesthetics, Max Planck Society

Content

Free keywords: -
Abstract: Our experience of a beautiful, moving, or aversive image clearly evokes affective processes beyond vision, but the relative contributions of factors along the spectrum from input (image statistics) to ideation (abstract thought) remain a matter of debate. Machine vision systems, lacking both emotion and higher-order cognitive processes, provide an empirical testbed for isolating the contributions of a purely perceptual representation. How well can we predict human affective responses to an image from the purely perceptual response of a machine? Here, we address this question with a comprehensive survey of deep neural networks (e.g. ConvNets, Transformers, MLP-Mixers) trained on a variety of computer vision tasks (e.g. vision-language contrastive learning, segmentation), examining the degree to which they can predict aesthetic judgment, arousal, and valence for images from multiple categories across two distinct datasets. Importantly, we use the features of these pre-trained models without any additional fine-tuning or retraining, probing whether affective information is immediately latent in the structure of the perceptual representation. We find that these networks have features sufficient to linearly predict (even with nonparametric mappings) average ratings of aesthetics, arousal, and valence with remarkably high accuracy across the board, at or near the predictions we would make based on the responses of the most representative ('taste-typical') human subjects. Models trained on object and scene classification, and modern contrastive learning models, produce the best overall features for prediction, while randomly-initialized models yield far lower predictive accuracies. Aesthetic judgments are the most predictable of the affective responses (followed by arousal, then valence), and we can predict these responses with greater accuracy for 'taste-typical' subjects than for less 'taste-typical' subjects. Taken together, these results suggest that the fundamental locus of visually evoked affective experience may be located more proximately to the perceptual system than abstract cognitive accounts of these experiences might otherwise suggest.
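
The pipeline the abstract describes reduces to a linear probe on frozen features: extract activations from a pre-trained vision model (no fine-tuning) and fit a simple linear mapping to mean human ratings. The following is a minimal sketch of that idea in Python, assuming a torchvision ResNet-50 backbone, scikit-learn ridge regression, and placeholder images and ratings standing in for the rated stimulus sets; the specific models, mappings, and datasets used in the study differ.

    # Minimal sketch: linear read-out of mean affect ratings from a frozen,
    # pre-trained vision backbone. Model choice, data, and variable names are
    # illustrative assumptions, not the authors' exact pipeline.
    import torch
    import torchvision.models as models
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_score

    # Frozen ImageNet-trained ConvNet; replace the classification head with
    # an identity so the forward pass returns pooled penultimate features.
    backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    backbone.fc = torch.nn.Identity()
    backbone.eval()

    # Placeholder stimuli and targets: in practice, these would be the rated
    # image sets and ratings averaged across subjects (hypothetical here).
    images = torch.rand(100, 3, 224, 224)    # stand-in image batch
    mean_ratings = torch.rand(100).numpy()   # stand-in mean aesthetic ratings

    with torch.no_grad():                    # no fine-tuning or retraining
        feats = backbone(images).numpy()     # (100, 2048) feature matrix

    # Linear probe on the purely perceptual representation, scored with
    # 5-fold cross-validation.
    probe = Ridge(alpha=1.0)
    scores = cross_val_score(probe, feats, mean_ratings, cv=5, scoring="r2")
    print(f"cross-validated R^2: {scores.mean():.3f}")

With real data, the cross-validated R^2 of this probe is the quantity compared against a subject-based noise ceiling (the 'taste-typical' benchmark the abstract refers to); with the random placeholder ratings above it will hover near zero.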

Details

Language(s): eng - English
Dates: 2022-12
Publication Status: Published online
Pages: -
Publishing info: -
Table of Contents: -
Rev. Type: -
Identifiers: -
Degree: -

Source 1

Title: Journal of Vision
Source Genre: Journal
Publ. Info: Charlottesville, VA : Scholar One, Inc.
Pages: -
Volume / Issue: 22
Sequence Number: 4266
Start / End Page: -
Identifier: ISSN: 1534-7362
CoNE: https://pure.mpg.de/cone/journals/resource/111061245811050