
Item Details

  Generation and Grounding of Natural Language Descriptions for Visual Data

Rohrbach, A. (2017). Generation and Grounding of Natural Language Descriptions for Visual Data. PhD Thesis, Universität des Saarlandes, Saarbrücken.

Basic Information

Resource type: Thesis


Related URLs

URL: http://scidok.sulb.uni-saarland.de/volltexte/2017/6874/ (Full text (all))
Description: -
OA-Status: Green
Description: -
OA-Status: Not specified

Creators

Creators:
Rohrbach, Anna1, 2, Author
Schiele, Bernt1, Advisor
Demberg, Vera3, Referee
Darrell, Trevor3, Referee
Affiliations:
1 Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society, ou_1116547
2 International Max Planck Research School, MPI for Informatics, Max Planck Society, ou_1116551
3 External Organizations, ou_persistent22

Content

Keywords: -
Abstract: Generating natural language descriptions for visual data links computer vision and computational linguistics. Being able to generate a concise and human-readable description of a video is a step towards visual understanding. At the same time, grounding natural language in visual data provides disambiguation for linguistic concepts, which is necessary for many applications. This thesis focuses on both directions and tackles three specific problems. First, we develop recognition approaches to understand videos of complex cooking activities. We propose an approach to generate coherent multi-sentence descriptions for our videos. Furthermore, we tackle the new task of describing videos at a variable level of detail. Second, we present a large-scale dataset of movies and aligned professional descriptions. We propose an approach which learns from videos and sentences to describe movie clips, relying on robust recognition of visual semantic concepts. Third, we propose an approach to ground textual phrases in images with little or no localization supervision, which we further improve by introducing Multimodal Compact Bilinear Pooling for combining language and vision representations. Finally, we jointly address the task of describing videos and grounding the described people. To summarize, this thesis advances the state of the art in automatic video description and visual grounding, and also contributes large datasets for studying the intersection of computer vision and computational linguistics.
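
Note on the Multimodal Compact Bilinear Pooling mentioned in the abstract: the general idea behind this family of methods is to approximate the outer product of a vision feature and a language feature by count-sketching each vector into a lower-dimensional space and combining the two sketches via circular convolution, computed with FFTs. The following is only a minimal NumPy sketch of that idea for orientation; the function names, dimensions, and random-projection handling are illustrative assumptions and are not taken from the thesis.

import numpy as np

def count_sketch(x, h, s, d):
    # Project x into R^d: coordinate i is added to bucket h[i] with sign s[i].
    y = np.zeros(d)
    np.add.at(y, h, s * x)
    return y

def mcb_pool(vision_vec, text_vec, d=1024, seed=0):
    # Compact bilinear pooling: sketch both inputs, then combine them by
    # circular convolution in the frequency domain (FFT, elementwise product, inverse FFT).
    rng = np.random.default_rng(seed)
    h_v = rng.integers(0, d, size=vision_vec.shape[0])
    s_v = rng.choice([-1.0, 1.0], size=vision_vec.shape[0])
    h_t = rng.integers(0, d, size=text_vec.shape[0])
    s_t = rng.choice([-1.0, 1.0], size=text_vec.shape[0])
    sk_v = count_sketch(vision_vec, h_v, s_v, d)
    sk_t = count_sketch(text_vec, h_t, s_t, d)
    return np.real(np.fft.ifft(np.fft.fft(sk_v) * np.fft.fft(sk_t)))

# Example with illustrative dimensions: fuse a 2048-d visual feature with a 300-d phrase embedding.
fused = mcb_pool(np.random.rand(2048), np.random.rand(300), d=1024)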

Details

Language: eng - English
Dates: 2017-05-16, 2017-06-02
Publication status: Published online
Pages: X, 215 p.
Publishing info: Saarbrücken : Universität des Saarlandes
Table of contents: -
Peer review: -
Identifiers (DOI, ISBN, etc.): BibTex Citekey: Rohrbachphd17
DOI: 10.22028/D291-26708
URN: urn:nbn:de:bsz:291-scidok-68749
Other: hdl:20.500.11880/26764
Degree: PhD
