Abstract:
abstract 1:
The World Wide Web provides a nearly endless source of knowledge, which is
mostly given in natural language. A first step towards exploiting this data
automatically could be to extract pairs of a given semantic relation from text
documents, for example all pairs of a person and her birth date. One strategy
for this task is to find text patterns that express the semantic relation, to
generalize these patterns, and to apply them to a corpus to find new pairs. In
this paper, we show that this approach benefits significantly when deep
linguistic structures are used instead of surface text patterns. We demonstrate
how linguistic structures can be represented for machine learning, and we
provide a theoretical analysis of the pattern matching approach. We show the
benefits of our approach by extensive experiments with our prototype system
LEILA.
abstract 2:
Search engines, question-answering systems and classification systems
alike can benefit greatly from formalized world knowledge.
Unfortunately, manually compiled collections of world knowledge (such
as WordNet or the Suggested Upper Merged Ontology, SUMO) often suffer
from low coverage, high assembly costs and rapid obsolescence. In contrast,
the World Wide Web provides an endless source of knowledge, assembled
by millions of people, updated constantly and available for free. In
this paper, we propose a novel method for learning arbitrary binary
relations from natural language Web documents, without human
interaction. Our system, LEILA, combines linguistic analysis and
machine learning techniques to find robust patterns in the text and to
generalize them. For initialization, we only require a set of examples
of the target relation and a set of counterexamples (e.g. from
WordNet). The architecture consists of three stages: finding patterns in
the corpus based on the given examples, assessing the patterns based on
probabilistic confidence, and applying the generalized patterns to
propose pairs for the target relation. We demonstrate the benefits and
practical viability of our approach by extensive experiments, showing
that LEILA achieves consistent improvements over existing comparable
techniques (e.g. Snowball, TextToOnto).
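
The three stages described in abstract 2 can be illustrated with a minimal
sketch. This is not LEILA's implementation: it substitutes plain surface
string patterns for the deep linguistic structures the paper argues for, and
it uses a simple frequency ratio as a stand-in for the probabilistic
confidence. The function names, the toy corpus, the 0.5 threshold, and the
regular expressions for recognizing persons and years are all assumptions
made for illustration.

```python
import re
from collections import Counter

# Toy corpus and seed pairs (hypothetical data, for illustration only).
CORPUS = [
    "Albert Einstein was born in 1879.",
    "Marie Curie was born in 1867.",
    "Albert Einstein died in 1955.",
    "Isaac Newton was born in 1643.",
]
EXAMPLES = {("Albert Einstein", "1879")}
COUNTEREXAMPLES = {("Albert Einstein", "1955")}


def find_patterns(corpus, examples, counterexamples):
    """Stage 1: record the text between the two arguments of each known pair."""
    pos, neg = Counter(), Counter()
    for sentence in corpus:
        for pairs, counts in ((examples, pos), (counterexamples, neg)):
            for x, y in pairs:
                m = re.search(re.escape(x) + r"(.+?)" + re.escape(y), sentence)
                if m:
                    counts[m.group(1)] += 1
    return pos, neg


def assess_patterns(pos, neg, threshold=0.5):
    """Stage 2: keep patterns whose share of positive matches exceeds threshold."""
    accepted = {}
    for pattern, n_pos in pos.items():
        confidence = n_pos / (n_pos + neg[pattern])
        if confidence > threshold:
            accepted[pattern] = confidence
    return accepted


def apply_patterns(corpus, patterns, known):
    """Stage 3: propose new (person, year) pairs where an accepted pattern occurs."""
    # Assumption: a person is a run of capitalized words, a date is a 4-digit year.
    person = r"([A-Z]\w+(?: [A-Z]\w+)*)"
    year = r"(\d{4})"
    proposed = set()
    for sentence in corpus:
        for pattern in patterns:
            m = re.search(person + re.escape(pattern) + year, sentence)
            if m:
                proposed.add((m.group(1), m.group(2)))
    return proposed - set(known)


pos, neg = find_patterns(CORPUS, EXAMPLES, COUNTEREXAMPLES)
patterns = assess_patterns(pos, neg)
new_pairs = apply_patterns(CORPUS, patterns, EXAMPLES)
```

In this toy run the only accepted pattern is the string " was born in ", and
the counterexample keeps " died in " out of the positive set. A surface
pattern like this breaks as soon as the wording varies, which is the
weakness that motivates the use of linguistic structures in abstract 1.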