Lucid Data Dreaming for Multiple Object Tracking

Khoreva, Anna; Benenson, Rodrigo; Ilg, Eddy; Brox, Thomas; Schiele, Bernt

Datensatz

DATENSATZ AKTIONENEXPORT

Zur Ablage hinzufügen

Bitte beachten Sie, dass eine neuere Version dieses Datensatzes verfügbar ist:
https://pure.mpg.de/pubman/item/item_2418086_4

DetailsÜbersicht

Freigegeben

Forschungspapier

Lucid Data Dreaming for Multiple Object Tracking

MPG-Autoren

/persons/resource/persons79309

Khoreva, Anna
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society;

/persons/resource/persons45383

Schiele, Bernt
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society;

Externe Ressourcen

Es sind keine externen Ressourcen hinterlegt

Volltexte (beschränkter Zugriff)

Für Ihren IP-Bereich sind aktuell keine Volltexte freigegeben.

Volltexte (frei zugänglich)

1703.09554v2
(Preprint), 12MB

Ergänzendes Material (frei zugänglich)

Es sind keine frei zugänglichen Ergänzenden Materialien verfügbar

Zitation

Khoreva, A., Benenson, R., Ilg, E., Brox, T., & Schiele, B. (2017). Lucid Data Dreaming for Multiple Object Tracking. Retrieved from http://arxiv.org/abs/1703.09554.

Zitierlink: https://hdl.handle.net/11858/00-001M-0000-002C-E621-0

Zusammenfassung

Convolutional networks reach top quality in pixel-level object tracking but require a large amount of training data (1k ~ 10k) to deliver such results. We propose a new training strategy which achieves state-of-the-art results across three evaluation datasets while using 20x ~ 100x less annotated data than competing methods. Instead of using large training sets hoping to generalize across domains, we generate in-domain training data using the provided annotation on the first frame of each video to synthesize ("lucid dream") plausible future video frames. In-domain per-video training data allows us to train high quality appearance- and motion-based models, as well as tune the post-processing stage. This approach allows to reach competitive results even when training from only a single annotated frame, without ImageNet pre-training. Our results indicate that using a larger training set is not automatically better, and that for the tracking task a smaller training set that is closer to the target domain is more effective. This changes the mindset regarding how many training samples and general "objectness" knowledge are required for the object tracking task.