Keywords:
Computer Science - Databases (cs.DB), Computer Science - Artificial Intelligence (cs.AI), Computer Science - Information Theory (cs.IT), Mathematics - Information Theory (math.IT)
Abstract:
Given a database and a target attribute of interest, how can we tell whether
there exists a functional, or approximately functional, dependence of the
target on any set of other attributes in the data? How can we reliably, without
bias to sample size or dimensionality, measure the strength of such a dependence?
And, how can we efficiently discover the optimal or $\alpha$-approximate
top-$k$ dependencies? These are exactly the questions we answer in this paper.
As we want to be agnostic on the form of the dependence, we adopt an
information-theoretic approach, and construct a reliable, bias correcting score
that can be efficiently computed. Moreover, we give an effective optimistic
estimator of this score, by which, for the first time, we can mine approximate
functional dependencies from data with guarantees of optimality.
Empirical evaluation shows that the derived score achieves a good
bias-variance trade-off, can be used within an efficient discovery algorithm,
and indeed discovers meaningful dependencies. Most importantly, it remains reliable
in the face of data sparsity.
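To make the idea of a bias-corrected, information-theoretic dependence score concrete, here is a minimal illustrative sketch (not the paper's exact estimator): it measures the plug-in fraction of information F(X;Y) = I(X;Y)/H(Y) and subtracts the mean score obtained under random permutations of the target as a simple correction for the spurious dependence that finite samples and high-cardinality attributes induce. All function names and the permutation-based correction are assumptions made for illustration.

```python
import math
import random
from collections import Counter

def entropy(values):
    # Plug-in (empirical) Shannon entropy in bits.
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_information(xs, ys):
    # Plug-in estimate via I(X;Y) = H(X) + H(Y) - H(X,Y).
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def corrected_fraction_of_information(xs, ys, n_perm=200, seed=0):
    """Illustrative bias-corrected dependence score in [~0, 1].

    Raw score: I(X;Y) / H(Y), which is 1 iff Y is a function of X
    on the sample. Correction: subtract the average score obtained
    when ys is randomly permuted, i.e. the dependence one would
    measure by chance alone at this sample size and cardinality.
    """
    rng = random.Random(seed)
    hy = entropy(ys)
    if hy == 0.0:          # constant target: nothing to explain
        return 0.0
    raw = mutual_information(xs, ys) / hy
    perm, bias = list(ys), 0.0
    for _ in range(n_perm):
        rng.shuffle(perm)
        bias += mutual_information(xs, perm) / hy
    return raw - bias / n_perm

# Perfect functional dependence: score stays close to 1.
xs = [i % 4 for i in range(200)]
print(corrected_fraction_of_information(xs, xs))

# Independent attribute: raw score is inflated above 0 by chance,
# but the correction pulls it back toward 0.
ys = [random.Random(1).randrange(2) for _ in range(200)]
print(corrected_fraction_of_information(xs, ys))
```

Note how the correction matters exactly in the sparse-data regime the abstract highlights: with few samples and many attribute combinations, the uncorrected plug-in score drifts toward 1 even for unrelated attributes.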