Free keywords:
cs.SI, Computer Science, Data Structures and Algorithms, cs.DS
Abstract:
Social media users and microbloggers post about a wide variety of (off-line)
collective social activities as they participate in them, ranging from concerts
and sporting events to political rallies and civil protests. In this context,
people who take part in the same collective social activity often post closely
related content from nearby locations at similar times, resulting in
distinctive spatiotemporal patterns. Can we automatically detect these patterns
and thus provide insights into the associated activities? In this paper, we
propose a modeling framework for clustering streaming spatiotemporal data, the
Spatial Dirichlet Hawkes Process (SDHP), which allows us to automatically
uncover a wide variety of spatiotemporal patterns of collective social activity
from geolocated online traces. Moreover, we develop an efficient, online
inference algorithm based on Sequential Monte Carlo that scales to millions of
geolocated posts. Experiments on synthetic data and real data gathered from
Twitter show that our framework can recover a wide variety of meaningful social
activity patterns in terms of both content and spatiotemporal dynamics, that it
yields interesting insights about these patterns, and that it can be used to
estimate the location from which a tweet was posted.