Abstract:
XML will be the method of choice for representing all kinds of documents in product catalogs, digital libraries, scientific data repositories, and across the Web. This observation creates high expectations that XML will be a major catalyst in constructing the “Semantic Web”. However, merely casting all documents into XML format does not necessarily make a document’s semantics explicit and more amenable to effective information searching. Rather, to fully leverage XML on a global scale, significant progress is needed on the following issues:
1. providing an easy-to-use yet powerful and efficient search language that combines concepts from current XML pattern-matching languages (e.g., XPath, XQuery) with ontology-backed, information-retrieval-style search result ranking,
2. extracting more semantics from existing document collections by constructing structural and ontological skeletons (e.g., in the form of DTDs or XML schemas) that describe the data at a higher semantic level and can also facilitate new forms of indexing for efficiency, and
3. classifying existing documents according to a given thematic or personalized, hierarchical ontology to make searching more effective (e.g., by exploiting relevance feedback) and more efficient (e.g., by limiting the search focus).
CLASSIX, a joint project of the Universities of Dortmund and the Saarland in Germany, addresses these three issues. We describe our approaches for each of these topics in the remainder of this paper.