English
 
Help Privacy Policy Disclaimer
  Advanced SearchBrowse

Item

ITEM ACTIONSEXPORT

Released

Paper

Discovering Reliable Approximate Functional Dependencies

MPS-Authors
/persons/resource/persons101868

Mandros,  Panagiotis
Databases and Information Systems, MPI for Informatics, Max Planck Society;

/persons/resource/persons188983

Boley,  Mario
Databases and Information Systems, MPI for Informatics, Max Planck Society;

/persons/resource/persons79525

Vreeken,  Jilles
Databases and Information Systems, MPI for Informatics, Max Planck Society;

External Resource
No external resources are shared
Fulltext (restricted access)
There are currently no full texts shared for your IP range.
Fulltext (public)

arXiv:1705.09391.pdf
(Preprint), 867KB

Supplementary Material (public)
There is no public supplementary material available
Citation

Mandros, P., Boley, M., & Vreeken, J. (2017). Discovering Reliable Approximate Functional Dependencies. Retrieved from http://arxiv.org/abs/1705.09391.


Cite as: https://hdl.handle.net/11858/00-001M-0000-002D-90F8-D
Abstract
Given a database and a target attribute of interest, how can we tell whether there exists a functional, or approximately functional dependence of the target on any set of other attributes in the data? How can we reliably, without bias to sample size or dimensionality, measure the strength of such a dependence? And, how can we efficiently discover the optimal or $\alpha$-approximate top-$k$ dependencies? These are exactly the questions we answer in this paper. As we want to be agnostic on the form of the dependence, we adopt an information-theoretic approach, and construct a reliable, bias correcting score that can be efficiently computed. Moreover, we give an effective optimistic estimator of this score, by which for the first time we can mine the approximate functional dependencies from data with guarantees of optimality. Empirical evaluation shows that the derived score achieves a good bias for variance trade-off, can be used within an efficient discovery algorithm, and indeed discovers meaningful dependencies. Most important, it remains reliable in the face of data sparsity.