hide
Free keywords:
-
Abstract:
Database systems often have to deal with incomplete information as the world
they model is not always complete. This is a frequent case in data integration
applications, scientific databases, or in scenarios where information is
manually entered and constains errors or ambiguity.
In the las couple of decades different formalisms have been proposed for
representing incomplete information. These include, among other, the so called
realtions with or-sets, tables with variables (v-tables) and conditional tables
(c-tables).
However, none of the current approaches for representing incomple information
has satisfied the requirements for a powerful and effcient data management
system, which is the reason why none has found application in practice. All
models generally suffer from at least one of two weaknesses. Either they are
not strong enough for representing results of simple queries, as is the case
for v-tables and realtions with or-sets, or the handling and processing of the
data, e.g. for query evaluation, is intractable (as is the case for c-tables).
In this thesis, e present a decomposition-based approach to addressing the
problem of incompletely specified databases. we introduce world-set
decompositions (WSDs), a space-efficient formalism for repreenting any finite
set of possible worlds over relational algebra queries on WSDs. For each
relational algebra operation we present an algorithm operting on WSDs.
We also address the problem of data cleaning in the context of world set
decompositions. We present a modified version of the existing Chase algorithm,
which we use to remove inconsistent worlds in an incompletely specified
database.
We evaluate our techniques in a large census data scenario with data
originating from the 1990 USA census and we show that data processing on WSDs
is both scalable and efficient.