
Released

Conference Paper

Predicting the Category and Attributes of Visual Search Targets Using Deep Gaze Pooling

MPG Authors
/persons/resource/persons79468

Sattar,  Hosnieh
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society;

/persons/resource/persons86799

Bulling,  Andreas
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society;

/persons/resource/persons44451

Fritz,  Mario
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society;

External Resources
No external resources are shared.
Fulltexts (restricted access)
No fulltexts are currently released for your IP range.
Fulltexts (free access)
No free-access fulltexts are available in PuRe.
Supplementary Material (free access)
No free-access supplementary materials are available.
Citation

Sattar, H., Bulling, A., & Fritz, M. (2017). Predicting the Category and Attributes of Visual Search Targets Using Deep Gaze Pooling. In 2017 IEEE International Conference on Computer Vision Workshops (pp. 2740-2748). Piscataway, NJ: IEEE. doi:10.1109/ICCVW.2017.322.


Citation link: https://hdl.handle.net/11858/00-001M-0000-002C-1094-8
Abstract
Previous work focused on predicting visual search targets from human fixations, but in the real world a specific target is often not known, e.g. when searching for a present for a friend. In this work we instead study the problem of predicting the mental picture, i.e. only an abstract idea instead of a specific target. This task is significantly more challenging, given that mental pictures of the same target category can vary widely depending on personal biases, and that characteristic target attributes often cannot be verbalised explicitly. We propose to use gaze information as implicit information on the user's mental picture and present a novel gaze pooling layer that seamlessly integrates semantic and localized fixation information into a deep image representation. We show that we can robustly predict both the mental picture's category and its attributes on a novel dataset containing fixation data of 14 users searching for targets on a subset of the DeepFashion dataset. Our results have important implications for future search interfaces and suggest deep gaze pooling as a general-purpose approach for gaze-supported computer vision systems.
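The abstract describes the gaze pooling layer only at a high level. One plausible reading, sketched below under assumptions not taken from the record itself (the function name `gaze_pool`, the tensor shapes, and the use of a simple fixation-density-weighted average over a convolutional feature map are all illustrative), is that spatially localized fixation information acts as a weighting over the spatial grid of a deep feature map:

```python
import numpy as np

def gaze_pool(features, fixation_map, eps=1e-8):
    """Pool a spatial feature map, weighted by fixation density.

    features:     (H, W, C) activations from a convolutional layer
    fixation_map: (H, W) non-negative fixation density over the image
    returns:      (C,) gaze-weighted image representation
    """
    # Normalize the fixation map into a spatial probability distribution.
    w = fixation_map / (fixation_map.sum() + eps)
    # Fixation-weighted average over spatial locations: sum_hw w[h,w] * f[h,w,:].
    return np.einsum('hw,hwc->c', w, features)

# Toy example: a 4x4 spatial grid with 3 feature channels,
# and a single fixated location at grid cell (1, 2).
feats = np.ones((4, 4, 3))
fix = np.zeros((4, 4))
fix[1, 2] = 1.0
vec = gaze_pool(feats, fix)  # (3,) vector dominated by the fixated region
```

In this sketch, fixated regions contribute more to the pooled representation than unattended ones, which matches the stated goal of integrating localized fixation information into a deep image representation; the actual layer in the paper may differ in detail.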