Generalized entropies and the similarity of texts

Altmann, E. G., Dias, L., & Gerlach, M. (2017). Generalized entropies and the similarity of texts. Journal of Statistical Mechanics: Theory and Experiment, 2017: 014002. doi:10.1088/1742-5468/aa53f5.


Creators

Altmann, Eduardo G. (1), Author
Dias, Laércio (1), Author
Gerlach, Martin (1), Author

Affiliations:
(1) Max Planck Institute for the Physics of Complex Systems, Max Planck Society, ou_2117288

Content

MPIPKS: Stochastic processes
Abstract: We show how generalized Gibbs-Shannon entropies can provide new insights into the statistical properties of texts. The universal distribution of word frequencies (Zipf's law) implies that the generalized entropies, computed at the word level, are dominated by words in a specific range of frequencies. Here we show that this is the case not only for the generalized entropies but also for the generalized (Jensen-Shannon) divergences, used to compute the similarity between different texts. This finding allows us to identify the contribution of specific words (and word frequencies) to the different generalized entropies and also to estimate the size of the databases needed to obtain a reliable estimation of the divergences. We test our results on large databases of books (from the Google n-gram database) and scientific papers (indexed by Web of Science).
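The abstract does not state the formulas, so the following is only a minimal sketch of the quantities it names. It assumes the order-alpha entropy takes the form H_alpha(p) = (sum_i p_i^alpha - 1) / (1 - alpha), which reduces to the Shannon entropy at alpha = 1, and that the generalized Jensen-Shannon divergence is D_alpha(p, q) = H_alpha((p + q)/2) - H_alpha(p)/2 - H_alpha(q)/2. The function names and the whitespace tokenizer are hypothetical choices made for illustration, not taken from the paper.

```python
import math
from collections import Counter


def word_frequencies(text):
    """Relative word frequencies, using a naive lowercase whitespace split."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def generalized_entropy(p, alpha):
    """Order-alpha entropy of a distribution given as {word: probability}.

    Assumed form: (sum_i p_i**alpha - 1) / (1 - alpha),
    with the Shannon entropy (natural log) as the alpha = 1 case.
    """
    probs = [x for x in p.values() if x > 0]
    if abs(alpha - 1.0) < 1e-12:
        return -sum(x * math.log(x) for x in probs)
    return (sum(x ** alpha for x in probs) - 1.0) / (1.0 - alpha)


def generalized_jsd(p, q, alpha):
    """Assumed generalized Jensen-Shannon divergence between p and q."""
    vocab = set(p) | set(q)
    # Equal-weight mixture of the two distributions over the joint vocabulary.
    mix = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}
    return (generalized_entropy(mix, alpha)
            - 0.5 * generalized_entropy(p, alpha)
            - 0.5 * generalized_entropy(q, alpha))


if __name__ == "__main__":
    p = word_frequencies("the cat sat on the mat")
    q = word_frequencies("the dog sat on the log")
    for alpha in (0.5, 1.0, 2.0):
        print(alpha, generalized_jsd(p, q, alpha))
```

Under these assumptions, two identical texts give a divergence of zero for every alpha, and two texts with no words in common give ln 2 at alpha = 1. Changing alpha changes how strongly each p_i^alpha term weights frequent versus rare words, which is the kind of frequency-range dependence the abstract refers to.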

Details

Dates: 2017-01-27
Publication Status: Issued
Identifiers: ISI: 000395125400001
DOI: 10.1088/1742-5468/aa53f5

Source 1

Title: Journal of Statistical Mechanics: Theory and Experiment
Source Genre: Journal
Publ. Info: Bristol, England : Institute of Physics Publishing
Volume / Issue: 2017
Sequence Number: 014002
Identifier: ISSN: 1742-5468
CoNE: https://pure.mpg.de/cone/journals/resource/111076098244006