
Released

Conference Paper

Modeling brain responses to video stimuli using multimodal video transformers

MPS-Authors

Dong, Tianai
International Max Planck Research School for Language Sciences, MPI for Psycholinguistics, Max Planck Society;
Multimodal Language Department, MPI for Psycholinguistics, Max Planck Society;

Fulltext (public)

Dong_etal_2023_CCN 2023.pdf
(Publisher version), 187KB

Citation

Dong, T., & Toneva, M. (2023). Modeling brain responses to video stimuli using multimodal video transformers. In Proceedings of the Conference on Cognitive Computational Neuroscience (CCN 2023) (pp. 194-197).


Cite as: https://hdl.handle.net/21.11116/0000-000F-DE62-9
Abstract
Prior work has shown that internal representations of artificial neural networks can significantly predict brain responses elicited by unimodal stimuli (e.g., reading a book chapter or viewing static images). However, the computational modeling of brain representations of naturalistic video stimuli, such as movies or TV shows, remains underexplored. In this work, we present a promising approach for modeling vision-language brain representations of video stimuli with a transformer-based model that represents videos jointly through audio, text, and vision. We show that the joint representations of vision and text information are better aligned with brain representations of subjects watching a popular TV show. We further show that the incorporation of visual information improves brain alignment across several regions that support language processing.
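
For readers unfamiliar with how such brain alignment is typically quantified, the sketch below illustrates a common encoding-model setup: stimulus features (standing in for the multimodal transformer's representations) are mapped to fMRI responses with cross-validated ridge regression, and alignment is scored as the per-voxel correlation between predicted and measured responses on held-out data. The abstract does not specify the paper's exact procedure, so the regression choice, cross-validation scheme, and the synthetic data below are illustrative assumptions, not the authors' released code.

```python
# Minimal encoding-model sketch for "brain alignment" (illustrative assumptions):
# predict fMRI responses from model-layer features with cross-validated ridge
# regression and score held-out voxelwise correlations.

import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold

def brain_alignment(features, bold, n_folds=5, alphas=(1.0, 10.0, 100.0, 1000.0)):
    """Return cross-validated Pearson correlation per voxel.

    features : (n_timepoints, n_dims) stimulus representations from the model
    bold     : (n_timepoints, n_voxels) preprocessed fMRI responses
    """
    scores = np.zeros(bold.shape[1])
    kf = KFold(n_splits=n_folds)
    for train, test in kf.split(features):
        reg = RidgeCV(alphas=alphas).fit(features[train], bold[train])
        pred = reg.predict(features[test])
        # accumulate per-voxel correlation between prediction and measured BOLD
        for v in range(bold.shape[1]):
            scores[v] += np.corrcoef(pred[:, v], bold[test, v])[0, 1] / n_folds
    return scores

# Synthetic example standing in for transformer features and fMRI data:
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 64))                                   # e.g. pooled vision+text features per TR
Y = X @ rng.standard_normal((64, 10)) + rng.standard_normal((200, 10))
print(brain_alignment(X, Y).mean())
```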