Released

Paper

Predicting Document Coverage for Relation Extraction

MPS-Authors

Singhania, Sneha
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Razniewski, Simon
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Weikum, Gerhard
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Fulltext (public)

arXiv:2111.13611.pdf
(Preprint), 502KB

Citation

Singhania, S., Razniewski, S., & Weikum, G. (2021). Predicting Document Coverage for Relation Extraction. Retrieved from https://arxiv.org/abs/2111.13611.


Cite as: https://hdl.handle.net/21.11116/0000-000A-237F-1
Abstract
This paper presents a new task of predicting the coverage of a text document
for relation extraction (RE): does the document contain many relational tuples
for a given entity? Coverage predictions are useful in selecting the best
documents for knowledge base construction with large input corpora. To study
this problem, we present a dataset of 31,366 diverse documents for 520
entities. We analyze the correlation of document coverage with features like
length, entity mention frequency, Alexa rank, language complexity and
information retrieval scores. Each of these features has only moderate
predictive power. We employ methods combining features with statistical models
like TF-IDF and language models like BERT. HERB, the model combining features
with BERT, achieves an F1 score of up to 46%. We demonstrate the utility of
coverage predictions on two use cases: KB construction and claim refutation.