Language(s):
eng - English
Date:
2017-11-30
Publication status:
Published
Pages:
83 p.
Place, publisher, edition:
Saarbrücken : Universität des Saarlandes
Table of contents:
Machine Learning methods, especially Deep Learning, have achieved enormous breakthroughs in Natural Language Processing and Computer Vision. They show remarkable performance in solving complex problems with minimal human interaction when large amounts of labeled data are available. The hardest part is labeling large quantities of unlabeled data, as it is time-consuming, expensive, and requires expert knowledge. The Data Programming Paradigm, introduced at NIPS 2016, proposes a method based on labeling functions: a set of heuristic rules that produce large but noisy training data, which is later denoised by a generative model of these labeling functions.
In this thesis, we explored the portability of the Data Programming Paradigm to new domains. We applied it to sequence labeling, in the form of Slot-filling for Spoken Language Understanding and Named Entity Extraction. First, to allow these tasks to be included as part of the pipeline, we modified the initial data processing and candidate generation steps in the model. Second, we introduced a new type of labeling function to test the hypothesis that "lightly" trained models can serve as solid labeling functions in combination with other functions. In this context, "lightly" trained models denote Deep Learning methods, such as Convolutional and Recurrent Neural Networks, that are trained on a small subset of the data. Third, we described strategies for implementing and selecting optimal labeling functions. Finally, we showed that the Data Programming Paradigm can be successfully extended to such tasks and outperforms its counterparts on noisy data. The experimental results for Slot-filling showed that, on clean data, the Data Programming Paradigm achieved an F1 score 5.9 points higher than the baseline. On noisy data, it outperformed counterparts such as Conditional Random Fields by a factor of two. We evaluated the model on benchmarks such as the Air Travel Information System (ATIS) dataset and SAP-related datasets.
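The idea of token-level labeling functions for Slot-filling can be illustrated with a minimal Python sketch. It is not the implementation from the thesis: the gazetteers, slot names, and rule functions below are invented for illustration, and a simple majority vote stands in for the generative model that Data Programming uses to denoise the votes.

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when it has no opinion

# Hypothetical gazetteers, invented for this illustration.
CITIES = {"boston", "denver", "atlanta"}
DAYS = {"monday", "tuesday", "friday"}


def lf_city_after_from(tokens, i):
    """Vote 'fromloc' for a known city directly preceded by 'from'."""
    if tokens[i] in CITIES and i > 0 and tokens[i - 1] == "from":
        return "fromloc"
    return ABSTAIN


def lf_city_after_to(tokens, i):
    """Vote 'toloc' for a known city directly preceded by 'to'."""
    if tokens[i] in CITIES and i > 0 and tokens[i - 1] == "to":
        return "toloc"
    return ABSTAIN


def lf_day_name(tokens, i):
    """Vote 'depart_day' for any weekday name in the gazetteer."""
    return "depart_day" if tokens[i] in DAYS else ABSTAIN


def lf_any_city_is_destination(tokens, i):
    """Deliberately noisy rule: treat every known city as a destination."""
    return "toloc" if tokens[i] in CITIES else ABSTAIN


LABELING_FUNCTIONS = [
    lf_city_after_from,
    lf_city_after_to,
    lf_day_name,
    lf_any_city_is_destination,
]


def majority_vote(tokens, lfs=LABELING_FUNCTIONS, default="O"):
    """Combine labeling-function votes per token.

    Simplification: ties go to the earliest-listed function that voted.
    A generative label model would instead weight functions by their
    estimated accuracies.
    """
    labels = []
    for i in range(len(tokens)):
        votes = [v for v in (lf(tokens, i) for lf in lfs) if v is not ABSTAIN]
        if not votes:
            labels.append(default)
            continue
        counts = Counter(votes)
        top = max(counts.values())
        labels.append(next(v for v in votes if counts[v] == top))
    return labels


if __name__ == "__main__":
    sentence = "flights from boston to denver on monday".split()
    print(list(zip(sentence, majority_vote(sentence))))
```

In this sketch the noisy rule disagrees with the context rule on "boston", and the tie is resolved only by list order. This is the kind of conflict that a learned model of labeling-function accuracies is meant to resolve more reliably.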
Review type:
-
Identifiers:
BibTex Citekey: RakhmanberdievaMaster2018
Degree type:
Master