Data Science Methods for the Analysis of Controversial Social Media Discussions

Guimarães, Anna

doi:10.22028/D291-36502

Released

Thesis

MPS-Authors

Guimarães, Anna
Databases and Information Systems, MPI for Informatics, Max Planck Society;
International Max Planck Research School, MPI for Informatics, Max Planck Society;

External Resource

https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/33161
(Any fulltext)

Citation

Guimarães, A. (2022). Data Science Methods for the Analysis of Controversial Social Media Discussions. PhD Thesis, Universität des Saarlandes, Saarbrücken. doi:10.22028/D291-36502.

Cite as: https://hdl.handle.net/21.11116/0000-000A-CDF7-9

Abstract

Social media communities like Reddit and Twitter allow users to express their views on
topics of their interest, and to engage with other users who may share or oppose these views.
This can lead to productive discussions towards a consensus, or to contentious debates, where
disagreements frequently arise.
Prior work on such settings has primarily focused on identifying notable instances of antisocial
behavior such as hate speech and “trolling”, which represent possible threats to the health of
a community. These, however, are exceptionally severe phenomena, and do not encompass
controversies stemming from user debates, differences of opinion, and off-topic content, all
of which can naturally come up in a discussion without going so far as to compromise its
development.
This dissertation proposes a framework for the systematic analysis of social media discussions
that take place in the presence of controversial themes, disagreements, and mixed opinions from
participating users. For this, we develop a feature-based model to describe key elements of a
discussion, such as its salient topics, the level of activity from users, the sentiments it expresses,
and the user feedback it receives.
Initially, we build our feature model to characterize adversarial discussions surrounding
political campaigns on Twitter, with a focus on the factual and sentimental nature of their
topics and the role played by different users involved. We then extend our approach to Reddit
discussions, leveraging community feedback signals to define a new notion of controversy
and to highlight conversational archetypes that arise from frequent and interesting interaction
patterns. We use our feature model to build logistic regression classifiers that can predict future
instances of controversy in Reddit communities centered on politics, world news, sports, and
personal relationships. Finally, our model also provides the basis for a comparison of different
communities in the health domain, where topics and activity vary considerably despite their
shared overall focus. In each of these cases, our framework provides insight into how user
behavior can shape a community’s individual definition of controversy and its overall identity.
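
To make the classification step described above concrete, the following is a minimal sketch of a logistic regression classifier over per-discussion features. It is an illustration under stated assumptions and not the thesis's implementation: the three features, their values, the labels, and all function names are hypothetical, and the model is fit with plain batch gradient descent rather than any particular library.

```python
import math

# Hypothetical per-discussion features (names and values are illustrative only):
# [normalized comment count, mean sentiment in [-1, 1], share of negatively scored comments]
TRAIN_X = [
    [0.9, -0.6, 0.7],
    [0.8, -0.4, 0.6],
    [0.7, -0.5, 0.8],
    [0.2, 0.5, 0.1],
    [0.3, 0.4, 0.2],
    [0.1, 0.6, 0.1],
]
TRAIN_Y = [1, 1, 1, 0, 0, 0]  # assumed labels: 1 = controversial, 0 = not


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, bias, x):
    """Estimated probability that a discussion with features x is controversial."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, target in zip(X, y):
            err = predict_proba(weights, bias, x) - target
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


weights, bias = fit_logistic_regression(TRAIN_X, TRAIN_Y)
heated = predict_proba(weights, bias, [0.85, -0.5, 0.7])  # busy, negative, downvoted
calm = predict_proba(weights, bias, [0.15, 0.5, 0.1])     # quiet, positive, well received
print(f"heated: {heated:.2f}, calm: {calm:.2f}")
```

One reason a linear model of this kind suits the setting is that each learned weight can be read as the direction in which a feature moves the predicted odds of controversy, which makes it possible to compare which signals matter in different communities.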