DeepGaze II: Explaining nearly all information in image-based saliency using features trained on object detection

Kümmerer, M; Wallis, TSA; Bethge, M

doi:10.12751/nncn.bc2016.0132

Kümmerer, M., Wallis, T., & Bethge, M. (2016). DeepGaze II: Explaining nearly all information in image-based saliency using features trained on object detection. Poster presented at Bernstein Conference 2016, Berlin, Germany.

Basic

Item Permalink: https://hdl.handle.net/21.11116/0000-0000-7B02-4
Version Permalink: https://hdl.handle.net/21.11116/0000-0006-CAF3-4

Genre: Poster

Locators

Locator:
Link (Any fulltext) Open Access status unknown

Description:
-

OA-Status:

Creators

Creators:
Kümmerer, M, Author
Wallis, TSA, Author
Bethge, M¹, Author

Affiliations:
¹ External Organizations

Content

Free keywords: -

Abstract: When free-viewing scenes, the first few fixations of human observers are driven in part by bottom-up attention. Over the last decade, a large number of models have been proposed to explain these fixations. One problem facing the field is that the different metrics used to evaluate model performance produce very different rankings of the models. We recently standardized model comparison using an information-theoretic framework and found that existing models captured at most 1/3 of the explainable mutual information between image content and fixation locations, possibly in part because of the limited data available [1]. We then tried to tackle this limitation using a transfer learning strategy. Our model "DeepGaze I" uses a neural network (AlexNet, [2]) that was originally trained for object detection on the ImageNet dataset. It achieved a large improvement over the previous state of the art, explaining 56% of the explainable information [3] (Figure 1c). Since then, a new generation of object recognition models has been developed that substantially outperforms AlexNet. The success of "DeepGaze I" and similar models suggests that features that yield good object detection performance can be exploited for better saliency prediction, and that object detection performance and fixation prediction performance are correlated. Here we test this hypothesis. Our new model "DeepGaze II" uses the VGG network [4] to convert an image into a high-dimensional representation, which is then fed through a second, smaller network to yield a density prediction. The second network is pre-trained by maximum likelihood on the SALICON dataset and fine-tuned on the MIT1003 dataset. Remarkably, DeepGaze II explains 83% of the explainable information on held-out data (Figure 1c) and has since achieved top performance on the MIT Saliency Benchmark. The problem of predicting spatial fixation densities under free-viewing conditions could be solved very soon.

What makes DeepGaze predictions different? Models before DeepGaze were not only close in performance but also very similar in their predictions, clustering mostly around a simple mean-contrast-luminance model (MLC, Figure 1d). Analyzing prediction performance over time shows that DeepGaze II is especially successful at explaining fixations in the first 600 ms (Figure 1e). The fact that fixation prediction performance is closely tied to object detection performance informs theories of attentional selection in scene viewing.
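The information-theoretic comparison described in the abstract can be illustrated with a minimal sketch. It assumes that every model outputs a probability distribution over a discrete pixel grid (a softmax is used here as one common way to turn a score map into such a density), that a baseline model (for example a center bias) and a gold-standard model (for example a density estimated from other observers' fixations) are available on the same grid, and that fixations are given as (row, column) pixel indices. Information gain is the average log-likelihood advantage over the baseline in bits per fixation, and the explained fraction divides a model's gain by the gold standard's gain. All function names and toy numbers are hypothetical; this is not the authors' evaluation code.

```python
import math

# Hypothetical helpers illustrating an information-theoretic model comparison.
# Densities are 2D lists of per-pixel probabilities that sum to 1.

def softmax_density(scores):
    """Convert a 2D grid of unnormalized scores into a probability map over pixels."""
    peak = max(v for row in scores for v in row)  # subtract max for numerical stability
    exps = [[math.exp(v - peak) for v in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]

def mean_log_likelihood(density, fixations):
    """Average log2 probability (bits) that the density assigns to observed fixations."""
    return sum(math.log2(density[r][c]) for r, c in fixations) / len(fixations)

def information_gain(density, baseline, fixations):
    """Bits per fixation gained by a model over the baseline model."""
    return mean_log_likelihood(density, fixations) - mean_log_likelihood(baseline, fixations)

def explained_fraction(density, baseline, gold, fixations):
    """Share of the explainable information (the gold standard's gain) captured by a model."""
    return information_gain(density, baseline, fixations) / information_gain(gold, baseline, fixations)

if __name__ == "__main__":
    baseline = softmax_density([[0.0, 0.0], [0.0, 0.0]])  # uniform toy stand-in for a center bias
    gold = [[0.7, 0.1], [0.1, 0.1]]    # toy gold-standard density
    model = [[0.4, 0.2], [0.2, 0.2]]   # toy model prediction
    fixations = [(0, 0)]
    print(explained_fraction(model, baseline, gold, fixations))
```

With these toy numbers the model gains log2(1.6), about 0.68 bits per fixation, against a gold-standard gain of log2(2.8), about 1.49 bits, so it explains roughly 46% of the explainable information. Because the baseline's log-likelihood is subtracted, any constant offset from treating per-pixel probabilities instead of continuous densities cancels out.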

Details

Language(s):

Dates: Date issued: 2016-09-22

Publication Status: Issued

Pages: -

Publishing info: -

Table of Contents: -

Rev. Type: -

Identifiers: DOI: 10.12751/nncn.bc2016.0132
BibTex Citekey: KummererWB2016

Degree: -

Event

Title: Bernstein Conference 2016

Place of Event: Berlin, Germany

Start-/End Date: 2016-09-21 - 2016-09-23

Source 1

Title: Bernstein Conference 2016

Source Genre: Proceedings

Creator(s):

Affiliations:

Publ. Info: -

Pages: -
Volume / Issue: -
Sequence Number: -
Start / End Page: 141 - 142
Identifier: -