Free keywords:
-
Abstract:
Dialogue act classification is currently an active research topic in computational
linguistics, and a variety of machine learning algorithms have been applied to it.
In this thesis, we investigate cross-domain dialogue act classification
using Support Vector Machines. The goal of the research reported in this work is
to explore features for effective cross-domain classification. The work comprises two
phases of data-driven investigation. The first phase involves collecting and analyzing
corpora, while the second phase involves domain-independent feature selection and
extraction.
Dialogue act annotations were collected from three different corpora: AMI,
HCRC MapTask and SWBD DAMSL [1]. Based on ISO standards, these dialogue
acts were mapped to corresponding groups. A number of experiments were
carried out to find the features with the best predictive power. The results show that
combining multiple features (bigrams of part-of-speech tags, chunks and words)
leads to a consistent improvement in the classifier's performance over using each
feature in isolation.
Finally, we investigate the portability and generalizability of the proposed approach
by applying the set of features that showed the best predictive results to the unseen
Metalogue corpus. The findings indicate that good classification accuracy can be
achieved using our approach, and that a set of automatically extracted features is
shared between large corpora and proves to be extremely reliable when used directly
to classify dialogue acts.
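
As an illustration of the feature combination described above, the following minimal sketch builds bigram features over three parallel sequences (words, part-of-speech tags and chunk tags) and merges them into one feature bag per utterance. It is not the thesis implementation: the function names, the `w`/`pos`/`chunk` prefixes, the `<s>` and `</s>` boundary padding and the example utterance with its tags are all assumptions made for this example.

```python
from collections import Counter


def bigrams(tokens, prefix):
    """Return prefixed bigram features for one token sequence.

    Boundary symbols are added so that utterance-initial and
    utterance-final items also produce a bigram (an assumption of
    this sketch, not something stated in the abstract).
    """
    padded = ["<s>"] + list(tokens) + ["</s>"]
    return [f"{prefix}:{a}_{b}" for a, b in zip(padded, padded[1:])]


def combined_features(words, pos_tags, chunk_tags):
    """Merge word, POS and chunk bigrams into a single feature bag."""
    feats = Counter()
    for prefix, seq in (("w", words), ("pos", pos_tags), ("chunk", chunk_tags)):
        feats.update(bigrams(seq, prefix))
    return feats


# Hypothetical utterance with hand-assigned tags (not taken from any corpus).
example = combined_features(
    ["do", "you", "agree"],
    ["VBP", "PRP", "VB"],
    ["B-VP", "B-NP", "B-VP"],
)
```

A feature bag of this kind could then be converted to a sparse vector and passed to a linear SVM, for instance with scikit-learn's `DictVectorizer` and `LinearSVC`; which toolkit was actually used is not stated in the abstract.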