Abstract:
Learning mappings between arbitrary structured input and output variables
is a fundamental problem in machine learning. It covers many natural learning
tasks and challenges the standard model of learning a mapping from
independently drawn instances to a small set of labels. Potential applications
include classification with a class taxonomy, named entity recognition,
and natural language parsing. In these structured domains, labeled training
instances are generally expensive to obtain while unlabeled inputs are readily
available and inexpensive.
This thesis deals with semi-supervised learning of discriminative models
for structured output variables. The analytical techniques and algorithms of
classical semi-supervised learning are lifted to the structured setting. Several
approaches based on different assumptions about the data are presented. Co-learning,
for instance, maximizes the agreement among multiple hypotheses,
while transductive approaches rely on an implicit cluster assumption.
Furthermore, as part of this dissertation, a case study on email batch
detection in message streams is presented. The involved tasks exhibit an
inherent cluster structure and the presented solution exploits the streaming
nature of the data.
The different approaches are developed into semi-supervised structured
prediction models, and efficient optimization strategies for these models are presented.
The novel algorithms generalize state-of-the-art approaches in structural
learning, such as structural support vector machines. Empirical results show
that the semi-supervised algorithms lead to significantly lower error rates
than their fully supervised counterparts in many application areas, including
multi-class classification, named entity recognition, and natural language
parsing.