Exploring Portability of Data Programming Paradigm

Rakhmanberdieva, Nurzat

Rakhmanberdieva, N. (2017). Exploring Portability of Data Programming Paradigm. Master Thesis, Universität des Saarlandes, Saarbrücken.

Item is released

Basic data

Item permalink: https://hdl.handle.net/21.11116/0000-0002-8441-F
Version permalink: https://hdl.handle.net/21.11116/0000-0002-8442-E

Genre: Thesis

Files

2017_MSc Thesis Rakhmanberdieva, Nurzat.pdf (any fulltext), 2MB

File permalink: -

Name: 2017_MSc Thesis Rakhmanberdieva, Nurzat.pdf

Description: -

OA status:

Visibility: Restricted (Max Planck Institute for Informatics, MSIN; )

MIME type / checksum: application/pdf

Technical metadata:

Copyright date: -

Copyright info: -

License: -

External references

Creators

Creators:
Rakhmanberdieva, Nurzat¹, Author
Klakow, Dietrich², Advisor
Berberich, Klaus³, Referee

Affiliations:
¹ International Max Planck Research School, MPI for Informatics, Max Planck Society, ou_1116551
² External Organizations, ou_persistent22
³ Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018

Content

Details

Language(s): eng - English

Dates: Accepted: 2017-11-30, Published: 2017

Publication status: Published

Pages: 83 p.

Place, publisher, edition: Saarbrücken : Universität des Saarlandes

Abstract: Machine Learning methods, especially Deep Learning, have achieved enormous breakthroughs in Natural Language Processing and Computer Vision. They have shown remarkable performance in solving complex problems with minimal human interaction when a large amount of labeled data is available. The hardest part is labeling large quantities of unlabeled data, as it is time-consuming, expensive and requires expert knowledge. The Data Programming Paradigm, introduced at NIPS 2016, addresses this with labeling functions: a set of heuristic rules that produce a large but noisy training set, which is then denoised by a generative model of these labeling functions.
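The labeling-function idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's code or the NIPS 2016 implementation: the toy "is this token a city name?" task, the five labeling functions, and the helpers `label_matrix` and `fit_generative_model` are all hypothetical. Each labeling function votes +1 or -1, or abstains (0). The generative model here is deliberately simplified: it assumes the functions are conditionally independent given the true label and that each one, when it votes, is correct with an unknown accuracy, which is estimated with EM. The original data programming model is richer (for example, it can account for dependencies between labeling functions).

```python
import math

ABSTAIN, NEG, POS = 0, -1, 1

# Hypothetical labeling functions for a toy "is this token a city name?" task.
# Each takes (tokens, i) and returns POS, NEG, or ABSTAIN for token i.
CITY_GAZETTEER = {"boston", "denver", "dallas"}
STOPWORDS = {"show", "me", "flights", "from", "to", "the", "a"}
TRAVEL_NOUNS = {"flight", "flights", "fare", "fares"}


def lf_after_from_to(tokens, i):
    return POS if i > 0 and tokens[i - 1] in {"from", "to"} else ABSTAIN


def lf_gazetteer(tokens, i):
    return POS if tokens[i] in CITY_GAZETTEER else ABSTAIN


def lf_stopword(tokens, i):
    return NEG if tokens[i] in STOPWORDS else ABSTAIN


def lf_long_word(tokens, i):
    # Deliberately noisy heuristic: it also fires on "flights".
    return POS if len(tokens[i]) >= 7 else ABSTAIN


def lf_travel_noun(tokens, i):
    return NEG if tokens[i] in TRAVEL_NOUNS else ABSTAIN


LFS = [lf_after_from_to, lf_gazetteer, lf_stopword, lf_long_word, lf_travel_noun]


def label_matrix(sentences, lfs):
    """Apply every LF to every token; row k holds the votes for candidate k."""
    candidates, rows = [], []
    for tokens in sentences:
        for i, token in enumerate(tokens):
            candidates.append(token)
            rows.append([lf(tokens, i) for lf in lfs])
    return candidates, rows


def fit_generative_model(L, n_iter=50, class_prior=0.5, init_acc=0.7):
    """EM for a simplified model: LFs are conditionally independent given the
    true label, and LF j is correct with probability acc[j] when it votes."""
    n_lfs = len(L[0])
    acc = [init_acc] * n_lfs
    p_pos = []
    for _ in range(n_iter):
        # E-step: posterior P(y = POS | votes); abstains contribute nothing.
        p_pos = []
        for row in L:
            ll_pos, ll_neg = math.log(class_prior), math.log(1 - class_prior)
            for j, vote in enumerate(row):
                if vote == POS:
                    ll_pos += math.log(acc[j])
                    ll_neg += math.log(1 - acc[j])
                elif vote == NEG:
                    ll_pos += math.log(1 - acc[j])
                    ll_neg += math.log(acc[j])
            p_pos.append(1.0 / (1.0 + math.exp(ll_neg - ll_pos)))
        # M-step: accuracy = expected share of correct votes among non-abstains.
        new_acc = []
        for j in range(n_lfs):
            votes = [(row[j], p) for row, p in zip(L, p_pos) if row[j] != ABSTAIN]
            if not votes:
                new_acc.append(init_acc)  # an LF that never fires keeps its initial guess
                continue
            expected_correct = sum(p if v == POS else 1 - p for v, p in votes)
            new_acc.append(min(max(expected_correct / len(votes), 0.01), 0.99))
        acc = new_acc
    return p_pos, acc


sentences = [
    "show me flights from boston to denver".split(),
    "flights from dallas to pittsburgh".split(),
]
candidates, L = label_matrix(sentences, LFS)
p_pos, acc = fit_generative_model(L)
for token, p in zip(candidates, p_pos):
    print(f"{token:<12} P(city) = {p:.2f}")
```

On this toy data, the noisy `lf_long_word` disagrees with two negative functions on "flights", so EM lowers its estimated accuracy; "flights" ends up with a low posterior, while "pittsburgh", which is missing from the gazetteer, is still labeled a city on the strength of `lf_after_from_to`. The resulting probabilistic labels would then serve as noisy training data for a downstream model.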
In this thesis, we explored the portability of the Data Programming Paradigm to new domains. We applied it to sequence labeling, namely Slot-filling for Spoken Language Understanding and Named Entity Extraction. First, to allow these tasks to be included as part of the pipeline, we modified the initial data processing and candidate generation steps of the model. Second, we introduced a new type of labeling function to test the hypothesis that "lightly" trained models can serve as solid labeling functions in combination with other functions. In this context, "lightly" trained models denote Deep Learning methods, such as Convolutional and Recurrent Neural Networks, that are trained on a small subset of the data. Third, we described strategies for implementing and selecting optimal labeling functions. Finally, we showed that the Data Programming Paradigm can be successfully extended to such tasks and outperforms its counterparts on noisy data. For Slot-filling, the experimental results showed that on clean data the Data Programming Paradigm achieved an F1 score 5.9 points higher than the baseline, while on noisy data it performed twice as well as counterparts such as Conditional Random Fields. We evaluated the model on benchmarks including the Air Travel Information System (ATIS) dataset and SAP-related datasets.
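For Slot-filling, F1 is conventionally computed over slot spans rather than individual tokens. The abstract does not name the evaluation script, so the following is a sketch assuming conlleval-style exact-match span F1 over BIO tags; the function names `bio_spans` and `span_f1` and the ATIS-style tag names are illustrative.

```python
def bio_spans(tags):
    """Return the set of (start, end_exclusive, slot_type) spans in a BIO tag sequence."""
    spans, start, kind = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a span open at the end
        prefix, _, label = tag.partition("-")
        # An open span ends on O, on B-, or on an I- tag of a different slot type.
        if start is not None and (prefix != "I" or label != kind):
            spans.add((start, i, kind))
            start, kind = None, None
        # A span starts on B-, or on an I- with no matching open span (as conlleval does).
        if prefix in ("B", "I") and start is None:
            start, kind = i, label
    return spans


def span_f1(gold_seqs, pred_seqs):
    """Micro-averaged exact-match span F1 over a corpus of tag sequences."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = bio_spans(gold), bio_spans(pred)
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Tokens: "flights from boston to new york"
gold = ["O", "O", "B-fromloc.city_name", "O", "B-toloc.city_name", "I-toloc.city_name"]
pred = ["O", "O", "B-fromloc.city_name", "O", "B-toloc.city_name", "O"]
f1 = span_f1([gold], [pred])
print(f"span F1 = {100 * f1:.1f}")
```

A predicted span counts only if its start, end, and slot type all match a gold span, so truncating "new york" to "new" costs both a false positive and a false negative. Multiplying by 100 gives the points scale on which differences such as the 5.9-point gain are usually reported.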

Review method: -

Identifiers: BibTex Citekey: RakhmanberdievaMaster2018

Degree: Master
