Free keywords:
extreme value; fat tail; Pareto distribution; scale-free; self-similarity; tail index
Abstract:
Despite widespread claims of power laws across the natural and social sciences, evidence in data is often equivocal. Modern data and statistical methods reject even classic power laws such as Pareto's law of wealth and the Gutenberg-Richter law for earthquake magnitudes. We show that the maximum-likelihood estimators and Kolmogorov-Smirnov (K-S) statistics in widespread use are unexpectedly sensitive to ubiquitous errors in data such as measurement noise, quantization noise, heaping and censorship of small values. This sensitivity causes spurious rejection of power laws and biases parameter estimates even in arbitrarily large samples, which explains inconsistencies between theory and data. We show that logarithmic binning by powers of λ > 1 attenuates these errors in a manner analogous to noise averaging in normal statistics and that λ thereby tunes a trade-off between accuracy and precision in estimation. Binning also removes potentially misleading within-scale information while preserving information about the shape of a distribution over powers of λ, and we show that some amount of binning can improve sensitivity and specificity of K-S tests without any cost, while more extreme binning tunes a trade-off between sensitivity and specificity. We therefore advocate logarithmic binning as a simple essential step in power-law inference.
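As an illustration of the binning step described in the abstract, the following minimal sketch assigns each observation to a bin of width λ on a logarithmic scale and estimates the tail index from the bin counts alone. It is not taken from the paper. It assumes a pure Pareto tail with survival function (x/xmin)^(-a), in which case the bin index is geometrically distributed with parameter λ^(-a), giving the closed-form estimate a = ln(1 + 1/k̄)/ln λ, where k̄ is the mean bin index. The function names, the choice λ = 2, xmin = 1 and the synthetic sample are all hypothetical.

```python
import numpy as np

def log_bin_indices(x, lam, xmin=1.0):
    """Return the bin index k for each x >= xmin, where bin k is
    [xmin * lam**k, xmin * lam**(k + 1))."""
    x = np.asarray(x, dtype=float)
    x = x[x >= xmin]  # values below xmin are discarded, not binned
    return np.floor(np.log(x / xmin) / np.log(lam)).astype(int)

def binned_tail_index(x, lam, xmin=1.0):
    """Maximum-likelihood tail index from log-binned data, assuming a
    pure Pareto tail (an assumption of this sketch, not of the paper).
    Fails if every observation falls in bin 0 (mean index of zero)."""
    k = log_bin_indices(x, lam, xmin)
    kbar = k.mean()
    return np.log1p(1.0 / kbar) / np.log(lam)

# Synthetic Pareto sample via inverse-transform sampling (illustrative only).
rng = np.random.default_rng(0)
a_true = 1.5
u = 1.0 - rng.random(200_000)      # uniform on (0, 1], avoids division by zero
x = u ** (-1.0 / a_true)           # Pareto with xmin = 1 and tail index a_true

a_hat = binned_tail_index(x, lam=2.0)
print(f"true tail index: {a_true}, binned estimate: {a_hat:.3f}")
```

Because the estimate depends only on which bin each value falls in, perturbations that leave a value inside its bin have no effect on it. A larger λ absorbs larger perturbations but discards more within-scale information, which is the accuracy-precision trade-off the abstract refers to.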