  Language learning using speech to image retrieval

Merkx, D., Frank, S., & Ernestus, M. (2019). Language learning using speech to image retrieval. In Proceedings of Interspeech 2019 (pp. 1841-1845). doi:10.21437/Interspeech.2019-3067.

Basic

Genre: Conference Paper

Files

merkx19_interspeech.pdf (Publisher version), 302KB
Name:
merkx19_interspeech.pdf
Description:
-
OA-Status:
Not specified
Visibility:
Public
MIME-Type / Checksum:
application/pdf / [MD5]
Technical Metadata:
Copyright Date:
-
Copyright Info:
-
License:
-

Creators

Creators:
Merkx, Danny (1, 2), Author
Frank, Stefan, Author
Ernestus, Mirjam (1), Author
Affiliations:
(1) Center for Language Studies, External Organizations, ou_55238
(2) International Max Planck Research School for Language Sciences, MPI for Psycholinguistics, Max Planck Society, Nijmegen, NL, ou_1119545

Content

Free keywords: -
Abstract: Humans learn language by interacting with their environment and listening to other humans. It should also be possible for computational models to learn language directly from speech, but so far most approaches require text. We improve on existing neural network approaches to create visually grounded embeddings for spoken utterances. Using a combination of a multi-layer GRU, importance sampling, cyclic learning rates, ensembling, and vectorial self-attention, our results show a remarkable increase in image-caption retrieval performance over previous work. Furthermore, we investigate which layers in the model learn to recognise words in the input. We find that deeper network layers are better at encoding word presence, although the final layer has slightly lower performance. This shows that our visually grounded sentence encoder learns to recognise words from the input even though it is not explicitly trained for word recognition.
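
The retrieval task mentioned in the abstract can be illustrated with a minimal sketch. It assumes that spoken captions and images have already been embedded in a shared vector space, that similarity is measured by cosine similarity, and that performance is reported as recall@K; this record does not state the metric or the scoring function, so both are assumptions. The function names and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def recall_at_k(caption_embs, image_embs, k):
    """Fraction of captions whose paired image (same index) is in the top k.

    Assumption: caption i belongs to image i, and higher cosine
    similarity means a better match.
    """
    hits = 0
    for i, caption in enumerate(caption_embs):
        scores = [cosine(caption, image) for image in image_embs]
        ranked = sorted(range(len(image_embs)),
                        key=lambda j: scores[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(caption_embs)


# Toy embeddings (made up for illustration only).
captions = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.2]]
images = [[0.9, 0.1], [0.1, 0.9], [1.0, 1.0]]

r1 = recall_at_k(captions, images, k=1)
r2 = recall_at_k(captions, images, k=2)
```

In this toy data the third caption is closer to the first image than to its own, so it only counts as a hit once the top two images are considered.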

Details

Language(s): eng - English
 Dates: 2019, 2019-09
 Publication Status: Published online
 Pages: -
 Publishing info: -
 Table of Contents: -
 Rev. Type: -
 Identifiers: DOI: 10.21437/Interspeech.2019-3067
 Degree: -

Event

Title: Interspeech 2019 : 20th Annual Conference of the International Speech Communication Association
Place of Event: Graz, Austria
Start-/End Date: 2019-09-15 - 2019-09-19

Source 1

Title: Proceedings of Interspeech 2019
Source Genre: Proceedings
 Creator(s):
Affiliations:
Publ. Info: -
Pages: -
Volume / Issue: -
Sequence Number: -
Start / End Page: 1841 - 1845
Identifier: -