  A network approach to topic models

Gerlach, M., Peixoto, T. P., & Altmann, E. G. (2018). A network approach to topic models. Science Advances, 4(7): eaaq1360. doi:10.1126/sciadv.aaq1360.

Files

1708.01677.pdf (Preprint), 4MB
Name: 1708.01677.pdf
Description: -
OA-Status:
Visibility: Public
MIME-Type / Checksum: application/pdf / [MD5]
Technical Metadata:
Copyright Date: -
Copyright Info: -

Creators

 Creators:
Gerlach, Martin (1), Author
Peixoto, Tiago P. (2), Author
Altmann, Eduardo G. (1), Author
Affiliations:
(1) Max Planck Institute for the Physics of Complex Systems, Max Planck Society, ou_2117288
(2) external, ou_persistent22

Content

Free keywords: -
 MPIPKS: Structure formation and active systems
 Abstract: One of the main computational and scientific challenges in the modern age is to extract useful information from unstructured texts. Topic models are one popular machine-learning approach that infers the latent topical structure of a collection of documents. Despite their success, particularly that of the most widely used variant, latent Dirichlet allocation (LDA), and their numerous applications in sociology, history, and linguistics, topic models are known to suffer from severe conceptual and practical problems, for example, a lack of justification for the Bayesian priors, discrepancies with statistical properties of real texts, and the inability to properly choose the number of topics. We obtain a fresh view of the problem of identifying topical structures by relating it to the problem of finding communities in complex networks. We achieve this by representing text corpora as bipartite networks of documents and words. By adapting existing community-detection methods, specifically a stochastic block model (SBM) with nonparametric priors, we obtain a more versatile and principled framework for topic modeling: for example, it automatically detects the number of topics and hierarchically clusters both the words and the documents. The analysis of artificial and real corpora demonstrates that our SBM approach leads to better topic models than LDA in terms of statistical model selection. Our work shows how to formally relate methods from community detection and topic modeling, opening the possibility of cross-fertilization between these two fields.
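
The bipartite representation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `build_bipartite_edges`, the toy corpus, and the whitespace tokenization are all assumptions made for illustration. It only shows how a corpus might be turned into weighted document-word edges; it does not perform any SBM inference.

```python
from collections import Counter


def build_bipartite_edges(corpus):
    """Map each (document_id, word) pair to the number of times the word
    occurs in that document. Each pair is one weighted edge of a bipartite
    document-word network. (Hypothetical helper, not from the paper.)"""
    edges = Counter()
    for doc_id, text in corpus.items():
        # Naive tokenization: lowercase and split on whitespace (an assumption).
        for word in text.lower().split():
            edges[(doc_id, word)] += 1
    return edges


# Toy corpus, invented purely for illustration.
toy_corpus = {
    "doc_a": "network community network",
    "doc_b": "topic model topic network",
}

edges = build_bipartite_edges(toy_corpus)

# Weighted degree of each node on either side of the bipartite network:
# a document's degree is its length in tokens, a word's degree is its
# total frequency in the corpus.
doc_degree = Counter()
word_degree = Counter()
for (doc_id, word), count in edges.items():
    doc_degree[doc_id] += count
    word_degree[word] += count
```

In this picture, groups of word nodes would play the role of topics and groups of document nodes would correspond to document clusters; the inference step described in the abstract is left out of the sketch.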

Details

Language(s): eng - English
 Dates: 2018-07-18, 2018-07-04
 Publication Status: Issued
 Pages: -
 Publishing info: -
 Table of Contents: -
 Rev. Type: -
 Identifiers: ISI: 000443176100009
DOI: 10.1126/sciadv.aaq1360
 Degree: -

Source 1

Title: Science Advances
  Other : Sci. Adv.
Source Genre: Journal
 Creator(s):
Affiliations:
Publ. Info: Washington : AAAS
Pages: -
Volume / Issue: 4 (7)
Sequence Number: eaaq1360
Start / End Page: -
Identifier: ISSN: 2375-2548
CoNE: https://pure.mpg.de/cone/journals/resource/2375-2548