Journal Article

Bayesian Entropy Estimation for Countable Discrete Distributions

MPS-Authors

Archer, Evan W.
Max Planck Institute for Biological Cybernetics, Max Planck Society;
Former Research Group Neural Computation and Behaviour, Max Planck Institute for Biological Cybernetics, Max Planck Society;

Citation

Archer, E. W., Park, I., & Pillow, J. (2014). Bayesian Entropy Estimation for Countable Discrete Distributions. Journal of Machine Learning Research, 15, 2833-2868.


Cite as: http://hdl.handle.net/11858/00-001M-0000-0027-7FAB-6
Abstract
We consider the problem of estimating Shannon's entropy H from discrete data, in cases where the number of possible symbols is unknown or even countably infinite. The Pitman-Yor process, a generalization of the Dirichlet process, provides a tractable prior distribution over the space of countably infinite discrete distributions, and has found major applications in Bayesian nonparametric statistics and machine learning. Here we show that it provides a natural family of priors for Bayesian entropy estimation, because moments of the induced posterior distribution over H can be computed analytically. We derive formulas for the posterior mean (Bayes' least squares estimate) and variance under Dirichlet and Pitman-Yor process priors. Moreover, we show that a fixed Dirichlet or Pitman-Yor process prior implies a narrow prior distribution over H, meaning the prior strongly determines the entropy estimate in the under-sampled regime. We derive a family of continuous measures for mixing Pitman-Yor processes to produce an approximately flat prior over H. We show that the resulting "Pitman-Yor Mixture" (PYM) entropy estimator is consistent for a large class of distributions. Finally, we explore the theoretical properties of the resulting estimator, and show that it performs well both in simulation and in application to real data.
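
The analytic posterior mean mentioned in the abstract can be illustrated with a short Python sketch. It is not the authors' code. It assumes the standard posterior structure of a Pitman-Yor process with discount d and concentration alpha: the observed symbols and the remaining unseen mass are jointly Dirichlet distributed, and the unseen mass is itself divided by a Pitman-Yor process. The function names, the pure-Python `digamma` helper, and the choice of nats as the unit are assumptions made for illustration. The sketch covers only a fixed (d, alpha) pair; the PYM estimator additionally mixes over these hyperparameters, which is not shown here.

```python
import math


def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    if x <= 0:
        raise ValueError("this helper only supports x > 0")
    result = 0.0
    # Shift the argument upward with psi(x) = psi(x + 1) - 1/x.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    # Asymptotic expansion, accurate to roughly 1e-8 for x >= 6.
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series


def plugin_entropy(counts):
    """Maximum-likelihood ("plug-in") entropy in nats, for comparison."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def py_posterior_mean_entropy(counts, d, alpha):
    """Posterior mean of H (nats) under a fixed Pitman-Yor(d, alpha) prior.

    counts: observed count of each distinct symbol (zeros are ignored).
    Assumes 0 <= d < 1 and alpha > -d.
    """
    if not (0.0 <= d < 1.0) or alpha <= -d:
        raise ValueError("need 0 <= d < 1 and alpha > -d")
    counts = [n for n in counts if n > 0]
    total = sum(counts)      # N: number of samples
    distinct = len(counts)   # K: number of distinct observed symbols
    mass = alpha + total
    if mass <= 0:
        raise ValueError("alpha + N must be positive")
    seen = sum((n - d) * digamma(n - d + 1.0) for n in counts)
    unseen = (alpha + distinct * d) * digamma(1.0 - d)
    return digamma(mass + 1.0) - (unseen + seen) / mass


if __name__ == "__main__":
    sample_counts = [3, 1, 1]  # hypothetical under-sampled data
    print("plug-in:", plugin_entropy(sample_counts))
    print("PY posterior mean:", py_posterior_mean_entropy(sample_counts, 0.3, 1.0))
```

With no data, the expression reduces to the prior mean digamma(alpha + 1) - digamma(1 - d), which depends only on the hyperparameters (about 1 nat for d = 0, alpha = 1). This is one way to see why, per the abstract, a fixed prior strongly determines the estimate when samples are scarce.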