Keywords:
-
Abstract:
We address the task of open-ended question answering about
real-world images. Using currently available methods from
Computer Vision and Natural Language Processing, we aim to push
an architecture with a global visual representation to its
limits. We show how to achieve competitive performance on VQA
with global visual features (Residual Net) combined with a
carefully designed architecture.