Free keywords:
Computer Science, Computer Vision and Pattern Recognition, cs.CV
Abstract:
Few-shot learning methods operate in low-data regimes, where the aim is to learn
from only a few training examples per class. Although significant progress has been
made in few-shot image classification, few-shot video recognition is relatively
unexplored and methods based on 2D CNNs are unable to learn temporal
information. In this work we thus develop a simple 3D CNN baseline, surpassing
existing methods by a large margin. To circumvent the need for labeled examples,
we propose to leverage weakly-labeled videos from a large dataset using tag
retrieval followed by selecting the best clips with visual similarities,
yielding further improvement. Our results saturate current 5-way benchmarks for
few-shot video classification and therefore we propose a new challenging
benchmark involving more classes and a mixture of classes with varying
supervision.
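
The retrieval step described in the abstract (tag retrieval followed by selecting the best clips by visual similarity) could be sketched roughly as below. This is only an illustrative sketch, not the authors' implementation: the function names, the dictionary layout (`id`, `tags`, `clip_features`), the use of a mean support feature as the query, and cosine similarity as the visual-similarity measure are all assumptions made here for illustration. Clip features are assumed to come from some video encoder, such as a 3D CNN.

```python
import numpy as np


def retrieve_by_tag(videos, class_name):
    """Keep weakly-labeled videos with a tag containing the class name.

    Hypothetical helper: matching is a case-insensitive substring test.
    """
    name = class_name.lower()
    return [v for v in videos if any(name in t.lower() for t in v["tags"])]


def select_best_clips(candidates, support_features, top_k):
    """Rank candidate clips by cosine similarity to the mean support feature.

    Assumption: the few labeled examples of the class are summarized by the
    mean of their feature vectors; the abstract does not specify this.
    Returns a list of (similarity, video_id, clip_index), best first.
    """
    prototype = np.mean(support_features, axis=0)
    prototype = prototype / np.linalg.norm(prototype)
    scored = []
    for video in candidates:
        for idx, feat in enumerate(video["clip_features"]):
            sim = float(np.dot(feat, prototype) / np.linalg.norm(feat))
            scored.append((sim, video["id"], idx))
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:top_k]
```

In such a sketch, the selected clips would be added to the few labeled examples of the class as extra, weakly-labeled training data.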