  Improved Multilingual Temporal Tagging with HeidelTime

Ahmad, F. (2018). Improved Multilingual Temporal Tagging with HeidelTime. Master Thesis, Universität des Saarlandes, Saarbrücken.

Files

2018_Faraz_Ahmad_MScThesis.pdf (Any fulltext), 373KB
 
File Permalink: -
Name: 2018_Faraz_Ahmad_MScThesis.pdf
Description: -
OA-Status:
Visibility: Restricted (Max Planck Institute for Informatics, MSIN)
MIME-Type / Checksum: application/pdf
Technical Metadata:
Copyright Date: -
Copyright Info: -
License: -

Creators

 Creators:
Ahmad, Faraz (1), Author
Strötgen, Jannik (2), Advisor
Weikum, Gerhard (2), Referee
Affiliations:
(1) International Max Planck Research School, MPI for Informatics, Max Planck Society, Campus E1 4, 66123 Saarbrücken, DE, ou_1116551
(2) Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018

Content

Free keywords: -
 Abstract: One important sub-task of information extraction is temporal tagging.
Temporal tagging is a two-step process that consists of extracting temporal
expressions and normalizing them to a standard ISO date format. This task is
important because temporal information can be utilised to build robust question
answering systems, enrich knowledge bases with temporal information, and return
better, time-aware search results, among other uses. One freely available
multilingual and domain-sensitive temporal tagger is HeidelTime. It is a
rule-based tagger that can tag documents in 13 languages using resources
manually developed by language experts; in addition, it can tag documents in
over 200 languages using automatically developed resources. It can also tag
documents in various domains, such as news or narrative documents.
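
The two-step process described above (extraction, then normalization to an ISO date) can be illustrated with a minimal sketch. This is not HeidelTime's rule set or API; the names `MONTHS`, `EXPLICIT_RE`, `RELATIVE_RE`, and `tag` are hypothetical and were chosen for this example only. It assumes English text, two toy rule types, and a reference date (for example, a news article's publication date) for resolving relative expressions.

```python
import re
from datetime import date, timedelta

# Hypothetical toy resources; a real tagger would use far richer rules.
MONTHS = {name: i for i, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}
RELATIVE = {"yesterday": -1, "today": 0, "tomorrow": 1}

# Step 1 (extraction): patterns that find temporal expressions in text.
EXPLICIT_RE = re.compile(r"\b(" + "|".join(MONTHS) + r") (\d{1,2}), (\d{4})\b")
RELATIVE_RE = re.compile(r"\b(yesterday|today|tomorrow)\b", re.IGNORECASE)


def tag(text, reference_date):
    """Return (expression, ISO date) pairs in order of appearance."""
    found = []
    for m in EXPLICIT_RE.finditer(text):
        # Step 2 (normalization): map the matched parts to an ISO date.
        try:
            value = date(int(m.group(3)), MONTHS[m.group(1)], int(m.group(2)))
        except ValueError:
            continue  # skip impossible dates such as "February 31, 2018"
        found.append((m.start(), m.group(0), value.isoformat()))
    for m in RELATIVE_RE.finditer(text):
        # Relative expressions are resolved against the reference date.
        offset = RELATIVE[m.group(1).lower()]
        value = reference_date + timedelta(days=offset)
        found.append((m.start(), m.group(0), value.isoformat()))
    found.sort()  # order by position in the text
    return [(expr, value) for _, expr, value in found]


if __name__ == "__main__":
    sample = "The thesis was issued on June 6, 2018. Tomorrow is a holiday."
    for expr, value in tag(sample, date(2018, 6, 6)):
        print(expr, "->", value)
```

The reference date is passed in explicitly because the correct reading of a relative expression such as "tomorrow" depends on the kind of document being tagged.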

In this thesis, we extend the current HeidelTime multilingual model to create
better automatically developed resources for over 200 languages, so that the
baseline tagging performance of HeidelTime for these languages can be improved.
We extend the model in three ways: 1) We improve the automatically developed
resources for morphologically rich languages such as Finnish and Estonian.
2) We improve the automatically developed resources for unsegmented languages
such as Chinese and Japanese. 3) We improve the automatically developed
resources for all languages by enriching language-independent rules with new
language-dependent rules that are learned from frequently occurring temporal
patterns in the respective languages. Finally, we present the results of
several evaluations and experiments using available temporally annotated
corpora and Wikipedia dumps for various languages, and summarize our findings.

Details

Language(s): eng - English
 Dates: 2018-06-06; 2018
 Publication Status: Issued
 Pages: 84 p.
 Publishing info: Saarbrücken : Universität des Saarlandes
 Table of Contents: -
 Rev. Type: -
 Identifiers: BibTex Citekey: AhmadMaster2018
 Degree: Master
