Improved Multilingual Temporal Tagging with HeidelTime

Ahmad, Faraz

Ahmad, F. (2018). Improved Multilingual Temporal Tagging with HeidelTime. Master Thesis, Universität des Saarlandes, Saarbrücken.

Item is Released

Basic

Item Permalink: https://hdl.handle.net/21.11116/0000-0002-B37F-6
Version Permalink: https://hdl.handle.net/21.11116/0000-0002-B380-2

Genre: Thesis

Files

2018_Faraz_Ahmad_MScThesis.pdf (Any fulltext), 373KB

File Permalink:
-

Name:
2018_Faraz_Ahmad_MScThesis.pdf

Description:
-

OA-Status:

Visibility:
Restricted (Max Planck Institute for Informatics, MSIN)

MIME-Type / Checksum:
application/pdf

Technical Metadata:

Copyright Date:
-

Copyright Info:
-

License:
-

Creators

Creators:
Ahmad, Faraz¹, Author
Strötgen, Jannik², Advisor
Weikum, Gerhard², Referee

Affiliations:
¹ International Max Planck Research School, MPI for Informatics, Max Planck Society, Campus E1 4, 66123 Saarbrücken, DE, ou_1116551
² Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018

Content

Free keywords: -

Abstract: One important sub-task of information extraction is temporal tagging.
Temporal tagging is a two-step process that consists of extracting temporal
expressions and normalizing them to a standard ISO date format. This task is
important because temporal information can be utilised to build robust question
answering systems, enrich knowledge bases with temporal information, and return
better, time-aware search results, among other uses. One freely available
multilingual and domain-sensitive temporal tagger is HeidelTime. It is a
rule-based tagger that can tag documents in 13 languages using resources
manually developed by language experts; in addition, it can tag documents in
over 200 languages using automatically developed resources. It can also tag
documents from various domains, such as news or narrative-style documents.

In this thesis, we extend the current HeidelTime multilingual model to create
better automatically developed resources for over 200 languages, so that the
baseline tagging performance of HeidelTime for these languages can be improved.
We extend the model in three ways: 1) We improve the automatically developed
resources for morphologically rich languages such as Finnish and Estonian.
2) We improve the automatically developed resources for unsegmented languages
such as Chinese and Japanese. 3) We improve the automatically developed
resources for all languages by enriching the language-independent rules with
new language-dependent rules that are learned from frequently occurring
temporal patterns in the respective languages. Finally, we present the results
of several evaluations and experiments using available temporally annotated
corpora and Wikipedia dumps for various languages, and summarize our findings.
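
The two-step process named in the abstract (extracting a temporal expression, then normalizing it to an ISO date value) can be illustrated with a minimal Python sketch. This is not HeidelTime's implementation or rule format: the `MONTHS` lexicon, the single regular expression, and the `tag` function are hypothetical and cover only English expressions such as "6 June 2018" or "June 2018".

```python
import re

# Hypothetical English-only month lexicon (an assumption for illustration,
# not one of HeidelTime's language resources).
MONTHS = {
    "january": 1, "february": 2, "march": 3, "april": 4,
    "may": 5, "june": 6, "july": 7, "august": 8,
    "september": 9, "october": 10, "november": 11, "december": 12,
}

# Step 1 (extraction): one pattern for "[day] month year".
PATTERN = re.compile(
    r"\b(?:(\d{1,2})\s+)?(" + "|".join(MONTHS) + r")\s+(\d{4})\b",
    re.IGNORECASE,
)


def tag(text):
    """Return (surface expression, ISO 8601 value) pairs found in text.

    The day number is not validated against the month length.
    """
    results = []
    for match in PATTERN.finditer(text):
        day, month, year = match.group(1), match.group(2), match.group(3)
        # Step 2 (normalization): build YYYY-MM or YYYY-MM-DD.
        value = f"{int(year):04d}-{MONTHS[month.lower()]:02d}"
        if day is not None:
            value += f"-{int(day):02d}"
        results.append((match.group(0), value))
    return results


print(tag("The thesis was accepted on 6 June 2018."))
```

A rule-based tagger generalizes this idea by keeping the patterns and the normalization lexicon as per-language resources, which is the part the thesis describes improving for automatically generated languages.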

Details

Language(s): eng - English

Dates: Accepted: 2018-06-06; Date issued: 2018

Publication Status: Issued

Pages: 84 p.

Publishing info: Saarbrücken : Universität des Saarlandes

Table of Contents: -

Rev. Type: -

Identifiers: BibTex Citekey: AhmadMaster2018

Degree: Master
