
Record


Released

Thesis

Improved Multilingual Temporal Tagging with HeidelTime

MPG Authors

Ahmad,  Faraz
International Max Planck Research School, MPI for Informatics, Max Planck Society;

/persons/resource/persons180924

Strötgen,  Jannik
Databases and Information Systems, MPI for Informatics, Max Planck Society;

/persons/resource/persons45720

Weikum,  Gerhard
Databases and Information Systems, MPI for Informatics, Max Planck Society;

External Resources
No external resources are available
Full Texts (freely accessible)
No freely accessible full texts are available
Supplementary Material (freely accessible)
No freely accessible supplementary materials are available
Citation

Ahmad, F. (2018). Improved Multilingual Temporal Tagging with HeidelTime. Master Thesis, Universität des Saarlandes, Saarbrücken.


Citation link: http://hdl.handle.net/21.11116/0000-0002-B37F-6
Abstract
One important sub-task of information extraction is temporal tagging. Temporal tagging is a two-step process that consists of extracting temporal expressions and normalizing them to a standard ISO date format. This task is important because temporal information can be utilised to build robust question answering systems, enrich knowledge bases with temporal information, and return better, time-aware search results, among other uses. One freely available multilingual and domain-sensitive temporal tagger is HeidelTime. It is a rule-based tagger that can tag documents in 13 languages using resources manually developed by language experts; in addition, it can tag documents in over 200 languages using automatically developed resources. It can also tag documents in various domains, such as news or narrative-type documents. In this thesis, we extend the current HeidelTime multilingual model to create better automatically developed resources for over 200 languages, so that the baseline tagging performance of HeidelTime for these languages can be improved. We extend the model in three ways: 1) We improve the automatically developed resources for morphologically rich languages such as Finnish and Estonian. 2) We improve the automatically developed resources for unsegmented languages such as Chinese and Japanese. 3) We improve the automatically developed resources for all languages in general by enriching language-independent rules with new language-dependent rules that are learned from frequently occurring temporal patterns in the respective languages. Finally, we present the results of several evaluations and experiments on available temporally annotated corpora and Wikipedia dumps for various languages, and summarize our findings.
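
The two-step process named in the abstract (extract a temporal expression, then normalize it to an ISO date) can be illustrated with a minimal sketch. This is not HeidelTime's implementation or rule format: the patterns, the function name tag, and the reference_date parameter are assumptions made for illustration, and the toy rules cover only a few English expressions.

```python
import re
from datetime import date, timedelta

# Hypothetical, English-only toy rules; not HeidelTime's actual resources.
MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Step 1 patterns: explicit dates such as "March 5, 2018" ...
DATE_RE = re.compile(
    r"\b(?P<month>" + "|".join(MONTHS) + r")\s+(?P<day>\d{1,2}),\s*(?P<year>\d{4})\b",
    re.IGNORECASE)
# ... and relative expressions that need a reference date to resolve.
RELATIVE_RE = re.compile(r"\b(yesterday|today|tomorrow)\b", re.IGNORECASE)
OFFSETS = {"yesterday": -1, "today": 0, "tomorrow": 1}


def tag(text, reference_date):
    """Return (surface form, ISO 8601 date) pairs in order of appearance."""
    found = []
    for m in DATE_RE.finditer(text):
        try:
            # Step 2: normalize the extracted pieces to a calendar date.
            value = date(int(m["year"]), MONTHS[m["month"].lower()], int(m["day"]))
        except ValueError:
            continue  # skip impossible dates such as "February 31, 2018"
        found.append((m.start(), m.group(0), value.isoformat()))
    for m in RELATIVE_RE.finditer(text):
        value = reference_date + timedelta(days=OFFSETS[m.group(0).lower()])
        found.append((m.start(), m.group(0), value.isoformat()))
    return [(surface, iso) for _, surface, iso in sorted(found)]


if __name__ == "__main__":
    sample = "The thesis was filed on March 5, 2018, and revised yesterday."
    for surface, iso in tag(sample, date(2018, 3, 10)):
        print(surface, "->", iso)
```

The reference date stands in for a document creation time: relative expressions such as "yesterday" cannot be normalized without one, which is one reason a tagger needs to know what kind of document it is processing.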