Data Science Methods for the Analysis of Controversial Social Media Discussions


Guimarães, Anna
Databases and Information Systems, MPI for Informatics, Max Planck Society;
International Max Planck Research School, MPI for Informatics, Max Planck Society;


Guimarães, A. (2022). Data Science Methods for the Analysis of Controversial Social Media Discussions. PhD Thesis, Universität des Saarlandes, Saarbrücken. doi:10.22028/D291-36502.

Cite as: https://hdl.handle.net/21.11116/0000-000A-CDF7-9

Social media communities like Reddit and Twitter allow users to express their views on
topics of interest, and to engage with other users who may share or oppose these views.
This can lead to productive discussions towards a consensus, or to contentious debates, where
disagreements frequently arise.

Prior work on such settings has primarily focused on identifying notable instances of antisocial
behavior such as hate speech and “trolling”, which represent possible threats to the health of
a community. These, however, are exceptionally severe phenomena, and do not encompass
controversies stemming from user debates, differences of opinion, and off-topic content, all
of which can naturally come up in a discussion without going so far as to compromise its

This dissertation proposes a framework for the systematic analysis of social media discussions
that take place in the presence of controversial themes, disagreements, and mixed opinions from
participating users. For this, we develop a feature-based model to describe key elements of a
discussion, such as its salient topics, the level of activity from users, the sentiments it expresses,
and the user feedback it receives.

Initially, we build our feature model to characterize adversarial discussions surrounding
political campaigns on Twitter, with a focus on the factual and sentimental nature of their
topics and the role played by different users involved. We then extend our approach to Reddit
discussions, leveraging community feedback signals to define a new notion of controversy
and to highlight conversational archetypes that arise from frequent and interesting interaction
patterns. We use our feature model to build logistic regression classifiers that can predict future
instances of controversy in Reddit communities centered on politics, world news, sports, and
personal relationships. Finally, our model also provides the basis for a comparison of different
communities in the health domain, where topics and activity vary considerably despite their
shared overall focus. In each of these cases, our framework provides insight into how user
behavior can shape a community’s individual definition of controversy and its overall identity.