Text-image synergy for multimodal retrieval and annotation

Chowdhury, Sreyasi Nag

doi:10.22028/D291-34509

Please note that a newer version of this item is available:
https://pure.mpg.de/pubman/item/item_3343037_3

Released

Thesis

MPS-Authors

Chowdhury, Sreyasi Nag
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society;
International Max Planck Research School, MPI for Informatics, Max Planck Society;

External Resource

https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/31690
(Any fulltext)

Citation

Chowdhury, S. N. (2021). Text-image synergy for multimodal retrieval and annotation. PhD Thesis, Universität des Saarlandes, Saarbrücken. doi:10.22028/D291-34509.

Cite as: https://hdl.handle.net/21.11116/0000-0009-428A-1

Abstract

Text and images are the two most common data modalities found on the Internet. Understanding the synergy between text and images, that is, seamlessly analyzing information from both modalities, may be trivial for humans but is challenging for software systems. In this dissertation we study problems where deciphering text-image synergy is crucial for finding solutions. We propose methods and ideas that establish semantic connections between text and images in multimodal content, and empirically show their effectiveness in four interconnected problems: Image Retrieval, Image Tag Refinement, Image-Text Alignment, and Image Captioning. Our promising results and observations open up interesting directions for future research on understanding text-image data.