Efficient Representation and Processing of Incomplete Information

Antova, Lyublena

Datensatz

DATENSATZ AKTIONENEXPORT

Zur Ablage hinzufügen

Lokale TagsFreigabegeschichteDetailsÜbersicht

Freigegeben

Hochschulschrift

Efficient Representation and Processing of Incomplete Information

MPG-Autoren

/persons/resource/persons44024

Antova, Lyublena
International Max Planck Research School, MPI for Informatics, Max Planck Society;

Externe Ressourcen

Es sind keine externen Ressourcen hinterlegt

Volltexte (beschränkter Zugriff)

Für Ihren IP-Bereich sind aktuell keine Volltexte freigegeben.

Volltexte (frei zugänglich)

Es sind keine frei zugänglichen Volltexte in PuRe verfügbar

Ergänzendes Material (frei zugänglich)

Es sind keine frei zugänglichen Ergänzenden Materialien verfügbar

Zitation

Antova, L. (2006). Efficient Representation and Processing of Incomplete Information. Master Thesis, Universität des Saarlandes, Saarbrücken.

Zitierlink: https://hdl.handle.net/11858/00-001M-0000-0027-D311-7

Zusammenfassung

Database systems often have to deal with incomplete information as the world they model is not always complete. This is a frequent case in data integration applications, scientific databases, or in scenarios where information is manually entered and constains errors or ambiguity. In the las couple of decades different formalisms have been proposed for representing incomplete information. These include, among other, the so called realtions with or-sets, tables with variables (v-tables) and conditional tables (c-tables). However, none of the current approaches for representing incomple information has satisfied the requirements for a powerful and effcient data management system, which is the reason why none has found application in practice. All models generally suffer from at least one of two weaknesses. Either they are not strong enough for representing results of simple queries, as is the case for v-tables and realtions with or-sets, or the handling and processing of the data, e.g. for query evaluation, is intractable (as is the case for c-tables). In this thesis, e present a decomposition-based approach to addressing the problem of incompletely specified databases. we introduce world-set decompositions (WSDs), a space-efficient formalism for repreenting any finite set of possible worlds over relational algebra queries on WSDs. For each relational algebra operation we present an algorithm operting on WSDs. We also address the problem of data cleaning in the context of world set decompositions. We present a modified version of the existing Chase algorithm, which we use to remove inconsistent worlds in an incompletely specified database. We evaluate our techniques in a large census data scenario with data originating from the 1990 USA census and we show that data processing on WSDs is both scalable and efficient.