Abstract:
This paper gives an overview of the YAGO-NAGA approach to information extraction
for building a conveniently searchable, large-scale, highly accurate knowledge
base of common facts.
YAGO harvests infoboxes and category names of Wikipedia for facts about
individual entities, and it reconciles these with the taxonomic backbone of
WordNet in order to ensure that all entities have proper classes and the class
system is consistent.
Currently, the YAGO knowledge base contains about 19 million instances of binary
relations for about 1.95 million entities. Based on intensive sampling, its
accuracy is estimated to be above 95 percent.
The paper presents the architecture of the YAGO extractor toolkit, its
distinctive approach to consistency checking, its provisions for maintenance and
further growth, and the query engine for YAGO, called NAGA.
It also discusses ongoing work on extensions towards integrating fact candidates
extracted from natural-language text sources.