Do Children Texts Hold The Key To Commonsense Knowledge?


Razniewski, Simon (Databases and Information Systems, MPI for Informatics, Max Planck Society)

Fulltext (public): Preprint, 249 KB


Romero, J., & Razniewski, S. (2022). Do Children Texts Hold The Key To Commonsense Knowledge? doi:10.48550/arXiv.2210.04530.

Cite as: https://hdl.handle.net/21.11116/0000-000B-58AA-3
Compiling comprehensive repositories of commonsense knowledge is a
long-standing problem in AI. Many concerns revolve around the issue of
reporting bias, i.e., that frequency in text sources is not a good proxy for
relevance or truth. This paper explores whether children's texts hold the key
to commonsense knowledge compilation, based on the hypothesis that such content
makes fewer assumptions about the reader's knowledge and therefore spells out
commonsense more explicitly. An analysis of several corpora shows that
children's texts indeed contain many more, and more typical, commonsense
assertions. Moreover, experiments show that this advantage can be leveraged in
popular language-model-based commonsense knowledge extraction settings, where
task-unspecific fine-tuning on small amounts of children's texts (childBERT)
already yields significant improvements. This provides a refreshing perspective,
different from the common trend of deriving progress from ever larger models
and corpora.