Proactive Learning Algorithms: A Survey of the State of the Art and Implementation of Novel and Concrete Algorithm for (Unstructured) Data Classification

Anis, Myriam

Thesis

MPG Authors

Anis, Myriam
International Max Planck Research School, MPI for Informatics, Max Planck Society

Citation

Anis, M. (2019). Proactive Learning Algorithms: A Survey of the State of the Art and Implementation of Novel and Concrete Algorithm for (Unstructured) Data Classification. Master Thesis, Universität des Saarlandes, Saarbrücken.

Citation link: https://hdl.handle.net/21.11116/0000-0005-9C5B-6

Abstract

Artificial Intelligence (AI) has become one of the most actively researched fields. Machine Learning (ML) is one of the most popular AI domains, in which systems are built to learn automatically and to improve from experience. The current revolution in the size and cost of electronic storage makes enormous amounts of data available for ML training. Unfortunately, not all of this data is labelled. Manually labelling documents can be expensive, time-consuming and subject to human error. Active Learning (AL) addresses this challenge by finding a sample of the data corpus that, if labelled, can substitute for the whole dataset. AL routes this sample to a human labeller to build the training dataset needed for the ML model. AL assumes that there exists a single, infallible and indefatigable labeller. These assumptions do not hold in real-world problems. The main focus of this work is to introduce Proactive Learning (PL) into an existing AL system. PL aims to generalize the problem solved by AL by relaxing all of its assumptions about the user. The main contribution of this project is to enhance automatic text classification by combining knowledge from the domain of PL and from Instance Relabelling paradigms to update the currently implemented AL system. The implemented PL system is tested on the 20 Newsgroups, Reuters and AG News datasets. The system performs well at detecting and predicting users' actions, which allows it to route labelling tasks efficiently to the best users, thereby minimizing the risk of receiving wrong labels.