Predicting Document Coverage for Relation Extraction

Singhania, Sneha; Razniewski, Simon; Weikum, Gerhard

Singhania, S., Razniewski, S., & Weikum, G. (2021). Predicting Document Coverage for Relation Extraction. Retrieved from https://arxiv.org/abs/2111.13611.

Item is Released

Basic

Item Permalink: https://hdl.handle.net/21.11116/0000-000A-237F-1
Version Permalink: https://hdl.handle.net/21.11116/0000-000A-2380-D

Genre: Paper

Files

arXiv:2111.13611.pdf (Preprint), 502KB

File Permalink:
https://hdl.handle.net/21.11116/0000-000A-2381-C

Name:
arXiv:2111.13611.pdf

Description:
File downloaded from arXiv at 2022-03-24 07:39. To appear in TACL. The arXiv version is a pre-MIT Press publication version.

OA-Status:

Visibility:
Public

MIME-Type / Checksum:
application/pdf / [MD5]

Technical Metadata:

Copyright Date:
-

Copyright Info:
-

License:
http://creativecommons.org/licenses/by/4.0/

Creators

Creators:
Singhania, Sneha¹, Author
Razniewski, Simon¹, Author
Weikum, Gerhard¹, Author

Affiliations:
¹ Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018

Content

Free keywords: Computer Science, Computation and Language, cs.CL, Computer Science, Artificial Intelligence, cs.AI

Abstract: This paper presents a new task of predicting the coverage of a text document
for relation extraction (RE): does the document contain many relational tuples
for a given entity? Coverage predictions are useful in selecting the best
documents for knowledge base construction with large input corpora. To study
this problem, we present a dataset of 31,366 diverse documents for 520
entities. We analyze the correlation of document coverage with features like
length, entity mention frequency, Alexa rank, language complexity and
information retrieval scores. Each of these features has only moderate
predictive power. We employ methods combining features with statistical models
like TF-IDF and language models like BERT. The model combining features and
BERT, HERB, achieves an F1 score of up to 46%. We demonstrate the utility of
coverage predictions on two use cases: KB construction and claim refutation.

Details

Language(s): eng - English

Dates: Created: 2021-11-26; Published Online: 2021

Publication Status: Published online

Pages: 16 p.

Publishing info: -

Table of Contents: -

Rev. Type: -

Identifiers: arXiv: 2111.13611
URI: https://arxiv.org/abs/2111.13611
BibTeX Citekey: Singhania2021

Degree: -