

Released

Conference Paper

f-VAEGAN-D2: A Feature Generating Framework for Any-Shot Learning

MPS-Authors

Xian, Yongqin
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society

Sharma, Saurabh
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society

Schiele, Bernt
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society

Akata, Zeynep
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society

Citation

Xian, Y., Sharma, S., Schiele, B., & Akata, Z. (2019). f-VAEGAN-D2: A Feature Generating Framework for Any-Shot Learning. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 10275-10284). Piscataway, NJ: IEEE. doi:10.1109/CVPR.2019.01052.


Cite as: http://hdl.handle.net/21.11116/0000-0003-162E-2
Abstract
When labeled training data is scarce, a promising data augmentation approach is to generate visual features of unknown classes using their attributes. To learn the class-conditional distribution of CNN features, these models rely on pairs of image features and class attributes. Hence, they cannot make use of the abundance of unlabeled data samples. In this paper, we tackle any-shot learning problems, i.e., zero-shot and few-shot, in a unified feature generating framework that operates in both inductive and transductive learning settings. We develop a conditional generative model that combines the strengths of VAEs and GANs and, in addition, via an unconditional discriminator, learns the marginal feature distribution of unlabeled images. We empirically show that our model learns highly discriminative CNN features for five datasets, i.e., CUB, SUN, AWA and ImageNet, and establish a new state of the art in any-shot learning, i.e., inductive and transductive (generalized) zero- and few-shot learning settings. We also demonstrate that our learned features are interpretable: we visualize them by inverting them back to the pixel space, and we explain them by generating textual arguments of why they are associated with a certain label.
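The abstract describes a combined objective: a VAE term on labeled features, a conditional GAN term, and an unconditional GAN term (the D2 discriminator) that can also see unlabeled features. Below is a minimal numpy sketch of how such a composite loss could be assembled; the function names, the simple squared-error reconstruction term, and the weighting `gamma` are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def vae_loss(x, x_rec, mu, logvar):
    """Standard VAE objective: squared reconstruction error plus the KL
    divergence between the approximate posterior N(mu, exp(logvar)) and N(0, I)."""
    rec = np.mean(np.sum((x - x_rec) ** 2, axis=1))
    kl = -0.5 * np.mean(np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=1))
    return rec + kl

def gan_loss(d_real, d_fake):
    """Binary cross-entropy discriminator loss, given sigmoid outputs
    on real and on generated feature batches."""
    eps = 1e-8  # numerical guard for log(0)
    return -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))

def fvaegan_d2_objective(x, x_rec, mu, logvar,
                         d1_real, d1_fake,   # conditional discriminator: (feature, attribute) pairs
                         d2_real, d2_fake,   # unconditional discriminator: may score unlabeled features
                         gamma=1.0):
    """Illustrative composite objective: VAE term + conditional GAN term
    + unconditional GAN term on the marginal feature distribution."""
    return (vae_loss(x, x_rec, mu, logvar)
            + gamma * gan_loss(d1_real, d1_fake)
            + gamma * gan_loss(d2_real, d2_fake))
```

The key structural point this sketch captures is that the second discriminator term needs no class attributes, so unlabeled images can contribute to it, which is what makes the transductive setting possible.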