
Released

Research Paper

Evaluating Language Models for Knowledge Base Completion

MPG Authors

Veseli, Blerta
Databases and Information Systems, MPI for Informatics, Max Planck Society


Singhania, Sneha
Databases and Information Systems, MPI for Informatics, Max Planck Society


Weikum, Gerhard
Databases and Information Systems, MPI for Informatics, Max Planck Society

External Resources

https://github.com/bveseli/LMsForKBC
(Supplementary Material)

Full Texts (freely accessible)

arXiv:2303.11082.pdf
(Preprint), 450KB

Supplementary Material (freely accessible)
No freely accessible supplementary material is available.
Citation

Veseli, B., Singhania, S., Razniewski, S., & Weikum, G. (2023). Evaluating Language Models for Knowledge Base Completion. Retrieved from https://arxiv.org/abs/2303.11082.


Citation link: https://hdl.handle.net/21.11116/0000-000C-D3CD-F
Abstract
Structured knowledge bases (KBs) are a foundation of many intelligent applications, yet are notoriously incomplete. Language models (LMs) have recently been proposed for unsupervised knowledge base completion (KBC), but despite encouraging initial results, questions about their suitability remain open. Existing evaluations often fall short because they evaluate only on popular subjects or sample facts that already exist in KBs. In this work, we introduce a novel, more challenging benchmark dataset and a methodology tailored to a realistic assessment of the KBC potential of LMs. For automated assessment, we curate a dataset called WD-KNOWN, an unbiased random sample of Wikidata containing over 3.9 million facts. In a second step, we perform a human evaluation on predictions that are not yet in the KB, as only this provides real insight into the added value over existing KBs. Our key finding is that biases in the dataset conception of previous benchmarks lead to a systematic overestimate of LM performance for KBC. However, our results also reveal strong areas of LMs. For example, we could significantly extend Wikidata on the relations nativeLanguage, which grew by a factor of ~21 (from 260k to 5.8M facts) at 82% precision, usedLanguage, by ~2.1 (from 2.1M to 6.6M) at 82% precision, and citizenOf, by ~0.3 (from 4.2M to 5.3M) at 90% precision. Moreover, we find that LMs possess surprisingly strong generalization capabilities: even on relations where most facts were not directly observed in LM training, prediction quality can be high.
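
The abstract describes probing an LM with relation-specific prompts and keeping only high-confidence predictions. The sketch below illustrates that general idea with a masked-LM fill-in; it is not the authors' pipeline from the repository linked above, and the prompt template, model choice, and confidence threshold are illustrative assumptions.

```python
# Minimal sketch of masked-LM probing for KB completion. This is NOT the
# authors' pipeline from https://github.com/bveseli/LMsForKBC; the prompt
# template, model, and confidence threshold are illustrative assumptions.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-cased")

# Hypothetical verbalization of the Wikidata relation nativeLanguage (P103).
TEMPLATE = "The native language of {subject} is [MASK]."

def predict_native_language(subject: str, threshold: float = 0.5):
    """Return the top-ranked object and its probability, or None to abstain."""
    candidates = fill_mask(TEMPLATE.format(subject=subject))
    best = candidates[0]  # candidates are sorted by model probability
    if best["score"] >= threshold:
        return best["token_str"], best["score"]
    return None  # abstaining on low-confidence predictions keeps precision high

if __name__ == "__main__":
    print(predict_native_language("Jean-Paul Sartre"))
```

Thresholding on the model's probability is one simple way to trade recall for precision, in the spirit of the high-precision slices (82% and 90%) reported in the abstract.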