  DeepGaze II: Predicting fixations from deep features over time and tasks

Kümmerer, M., Wallis, T., & Bethge, M. (2017). DeepGaze II: Predicting fixations from deep features over time and tasks. Poster presented at 17th Annual Meeting of the Vision Sciences Society (VSS 2017), St. Pete Beach, FL, USA.

Locators

Locator: Link (Any fulltext)
Description: -
OA-Status: -

Creators

Creators:
Kümmerer, M., Author
Wallis, T., Author
Bethge, M. 1, Author
Affiliations:
1 External Organizations, ou_persistent22

Content

Free keywords: -
Abstract: Where humans choose to look can tell us a lot about behaviour in a variety of tasks. Over the last decade, numerous models have been proposed to explain fixations when viewing still images. Until recently, these models failed to capture a substantial amount of the explainable mutual information between image content and fixation locations (Kümmerer et al., PNAS 2015). This limitation can be tackled effectively with a transfer-learning strategy (“DeepGaze I”, Kümmerer et al., ICLR workshop 2015), in which features learned for object recognition are used to predict fixations. Our new model, “DeepGaze II”, converts an image into the high-dimensional feature space of the VGG network; a simple readout network is then used to yield a density prediction. The readout network is pre-trained on the SALICON dataset and fine-tuned on the MIT1003 dataset. DeepGaze II explains 82% of the explainable information on held-out data and achieves top performance on the MIT Saliency Benchmark. The modular architecture of DeepGaze II allows a number of interesting applications. By retraining on partial data, we show that fixations after 500 ms of presentation time are driven by qualitatively different features than those in the first 500 ms, and we can predict for which images these changes will be largest. Additionally, we analyse how different viewing tasks (dataset from Koehler et al., 2014) change fixation behaviour and show that we are able to predict the viewing task from the fixation locations. Finally, we investigate how much fixations are driven by low-level cues versus high-level content: by replacing the VGG features with isotropic mean-luminance-contrast features, we create a low-level saliency model that outperforms all saliency models preceding DeepGaze I (including saliency models that use DNNs and other high-level features). We analyse how the contributions of high-level and low-level features to fixation locations change over time.
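
The pipeline described in the abstract (frozen deep features from a pre-trained VGG network, followed by a small readout network whose output is normalised into a spatial fixation density) can be pictured with a short, hypothetical PyTorch sketch. The layer choice, readout size, and normalisation details below are illustrative assumptions, not the authors' published implementation.

# Illustrative sketch only: frozen VGG features -> small readout network -> fixation density.
# Layer choices and readout size are assumptions, not the published DeepGaze II model.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg19

class ReadoutSaliency(nn.Module):
    def __init__(self):
        super().__init__()
        # Frozen convolutional features from a VGG network pre-trained on object
        # recognition (the transfer-learning idea behind DeepGaze I/II).
        self.features = vgg19(pretrained=True).features.eval()
        for p in self.features.parameters():
            p.requires_grad = False
        # Simple readout network: a few 1x1 convolutions mapping deep features to one channel.
        self.readout = nn.Sequential(
            nn.Conv2d(512, 16, kernel_size=1), nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=1),
        )

    def forward(self, image):
        feats = self.features(image)    # (B, 512, H', W') deep features
        logits = self.readout(feats)    # (B, 1, H', W') saliency logits
        # Normalise over spatial locations so the output is a fixation density (sums to 1).
        b, _, h, w = logits.shape
        return F.softmax(logits.view(b, -1), dim=1).view(b, 1, h, w)

# Example: density = ReadoutSaliency()(torch.randn(1, 3, 224, 224)); density.sum() is ~1.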

Details

Language(s): -
Dates: 2017-08
Publication Status: Issued
Pages: -
Publishing info: -
Table of Contents: -
Rev. Type: -
Identifiers: DOI: 10.1167/17.10.1147
BibTex Citekey: KummererWB2017
Degree: -

Event

Title: 17th Annual Meeting of the Vision Sciences Society (VSS 2017)
Place of Event: St. Pete Beach, FL, USA
Start-/End Date: 2017-05-19 - 2017-05-24

Source 1

Title: Journal of Vision
Source Genre: Journal
 Creator(s):
Affiliations:
Publ. Info: Charlottesville, VA : Scholar One, Inc.
Pages: -
Volume / Issue: 17 (10)
Sequence Number: -
Start / End Page: 1147
Identifier: ISSN: 1534-7362
CoNE: https://pure.mpg.de/cone/journals/resource/111061245811050