Text-image synergy for multimodal retrieval and annotation

Nag Chowdhury, Sreyasi

doi:10.22028/D291-34509

Nag Chowdhury, S. (2021). Text-image synergy for multimodal retrieval and annotation. PhD Thesis, Universität des Saarlandes, Saarbrücken. doi:10.22028/D291-34509.

Item is Released

Basic

Item Permalink: https://hdl.handle.net/21.11116/0000-0009-428A-1
Version Permalink: https://hdl.handle.net/21.11116/0000-000C-B74E-F

Genre: Thesis

Locators

Locator:
https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/31690 (Any fulltext) Open Access Green

Description:
-

OA-Status:
Green

Creators

Creators:
Nag Chowdhury, Sreyasi¹˒², Author
Weikum, Gerhard¹, Referee
de Melo, Gerard¹, Referee
Berberich, Klaus¹, Referee

Affiliations:
¹ Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018
² International Max Planck Research School, MPI for Informatics, Max Planck Society, Campus E1 4, 66123 Saarbrücken, DE, ou_1116551

Content

Free keywords: image retrieval, image-text alignment, image captioning, commonsense knowledge

Abstract: Text and images are the two most common data modalities found on the Internet. Understanding the synergy between text and images, that is, seamlessly analyzing information from these modalities, may be trivial for humans, but is challenging for software systems. In this dissertation we study problems where deciphering text-image synergy is crucial for finding solutions. We propose methods and ideas that establish semantic connections between text and images in multimodal content, and empirically show their effectiveness in four interconnected problems: Image Retrieval, Image Tag Refinement, Image-Text Alignment, and Image Captioning. Our promising results and observations open up interesting directions for future research on understanding text-image data.

Details

Language(s): eng - English

Dates: Accepted: 2021-06-28; Published Online: 2021; Date issued: 2021

Publication Status: Issued

Pages: 131 p.

Publishing info: Saarbrücken : Universität des Saarlandes

Identifiers: BibTex Citekey: Chowphd2021
DOI: 10.22028/D291-34509
URN: urn:nbn:de:bsz:291--ds-345092
Other: hdl:20.500.11880/31690

Degree: PhD
