hide
Free keywords:
Computer Science, Computation and Language, cs.CL
Abstract:
As news and social media exhibit an increasing amount of manipulative
polarized content, detecting such propaganda has received attention as a new
task for content analysis. Prior work has focused on supervised learning with
training data from the same domain. However, as propaganda can be subtle and
keeps evolving, manual identification and proper labeling are very demanding.
As a consequence, training data is a major bottleneck. In this paper, we tackle
this bottleneck and present an approach to leverage cross-domain learning,
based on labeled documents and sentences from news and tweets, as well as
political speeches with a clear difference in their degrees of being
propagandistic. We devise informative features and build various classifiers
for propaganda labeling, using cross-domain learning. Our experiments
demonstrate the usefulness of this approach, and identify difficulties and
limitations in various configurations of sources and targets for the transfer
step. We further analyze the influence of various features, and characterize
salient indicators of propaganda.