Classification and Intelligent Search on Information in XML

Fuhr, Norbert; Weikum, Gerhard

Journal Article

MPS-Authors

Weikum, Gerhard
Databases and Information Systems, MPI for Informatics, Max Planck Society

Citation

Fuhr, N., & Weikum, G. (2002). Classification and Intelligent Search on Information in XML. IEEE Data Engineering Bulletin, 25(1), 51-58.

Cite as: https://hdl.handle.net/11858/00-001M-0000-000F-30C7-5

Abstract

XML will be the method of choice for representing all kinds of documents in product catalogs, digital libraries, scientific data repositories, and across the Web. This observation creates high expectations that XML will be a major catalyst in constructing the “Semantic Web”. However, merely casting all documents into XML format does not necessarily make a document’s semantics explicit and more amenable to effective information searching. Rather, to fully leverage XML on a global scale, significant progress is needed on the following issues: (1) providing an easy-to-use yet powerful and efficient search language that combines concepts from current XML pattern-matching languages (e.g., XPath and XQuery) with ontology-backed, information-retrieval-style ranking of search results; (2) extracting more semantics from existing document collections by constructing structural and ontological skeletons (e.g., in the form of DTDs or XML schemas) that describe the data at a higher semantic level and can also facilitate new forms of indexing for efficiency; and (3) classifying existing documents according to a given thematic or personalized hierarchical ontology to make searching more effective (e.g., by exploiting relevance feedback) and more efficient (e.g., by limiting the search focus). CLASSIX, a joint project of the Universities of Dortmund and the Saarland in Germany, addresses these three issues. We describe our approaches for each of these topics in the remainder of this paper.