




Empirical Evaluation of Common Assumptions in Building Political Bias Datasets


Ganguly, Soumen
International Max Planck Research School, MPI for Informatics, Max Planck Society


Ganguly, S. (2019). Empirical Evaluation of Common Assumptions in Building Political Bias Datasets. Master Thesis, Universität des Saarlandes, Saarbrücken.

Cite as: http://hdl.handle.net/21.11116/0000-0002-B37F-6
In today’s world, bias and polarization are among the biggest problems plaguing our society. In such volatile environments, news media play a crucial role as the gatekeepers of information. Given the huge impact they can have on societal evolution, they have long been studied by researchers. Researchers and practitioners often build political bias datasets for a variety of tasks, ranging from examining the bias of news outlets and articles to studying and designing algorithmic news retrieval systems for online platforms. In building such datasets, researchers often make certain simplifying assumptions. Given the importance of such datasets, in this thesis we empirically evaluate three such common assumptions: (i) raters’ political leaning does not affect their ratings of political articles, (ii) news articles follow the leaning of their source outlet, and (iii) the political leaning of a news outlet is stable across reporting on different topics. We constructed a manually annotated ground-truth dataset of news articles on “Gun policy” and “Immigration”, published by several popular news media outlets in the U.S., with political leaning labels collected using Amazon Mechanical Turk, and used it to evaluate these assumptions. Our findings suggest that (i) in certain cases, liberal and conservative raters label the leanings of news articles differently, (ii) in many cases, news articles do not follow the political leaning of their source outlet, and (iii) for certain outlets, the political leaning of the outlet changes when reporting on different topics or issues. We believe our work offers important guidelines for future attempts at building political bias datasets, which in turn will help in building better algorithmic news retrieval systems for online platforms.