YAWN: A Semantically Annotated Wikipedia XML Corpus

Schenkel, Ralf; Suchanek, Fabian M.; Kasneci, Gjergji

Conference Paper

MPG Authors

Schenkel, Ralf
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Suchanek, Fabian M.
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Kasneci, Gjergji
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Citation

Schenkel, R., Suchanek, F. M., & Kasneci, G. (2007). YAWN: A Semantically Annotated Wikipedia XML Corpus. In A. Kemper, H. Schöning, T. Rose, M. Jarke, T. Seidl, C. Quix, et al. (Eds.), Datenbanksysteme in Business, Technologie und Web (BTW): 12. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme" (pp. 277-291). Bonn, Germany: Gesellschaft für Informatik.

Citation link: https://hdl.handle.net/11858/00-001M-0000-000F-2140-5

Abstract

The paper presents YAWN, a system that converts the well-known and widely used Wikipedia collection into an XML corpus with semantically rich, self-explaining tags. We introduce algorithms to annotate pages and links with concepts from the WordNet thesaurus. The annotation process exploits Wikipedia's category information, a high-quality, manually assigned source; it also extracts additional information from lists and uses template invocations with named parameters. We give examples of how such annotations can be exploited for high-precision queries.
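The abstract names two annotation sources: Wikipedia categories and template invocations with named parameters. The Python sketch below illustrates only the general idea of turning these into semantic XML tags; it is not the YAWN algorithm. All names (`HEAD_TO_CONCEPT`, `category_concepts`, `xml_tag`, `template_params`, `annotate`) are hypothetical. The small dictionary stands in for a real WordNet lookup, treating the last word of a category name as its head noun is a simplifying assumption, and the regular expression handles only flat, non-nested templates.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a WordNet lookup: maps the (plural) head noun
# of a category name to a concept tag. This is NOT the mapping used by YAWN.
HEAD_TO_CONCEPT = {
    "physicists": "physicist",
    "people": "person",
    "cities": "city",
}

# Matches flat template calls such as {{Name | key = value | ...}}; nested
# templates and links containing "|" are deliberately not handled.
TEMPLATE_RE = re.compile(r"\{\{\s*([^|}]+?)\s*\|(.*?)\}\}", re.DOTALL)


def category_concepts(categories):
    """Map category names like 'German physicists' to concept tags.

    Assumes the last word of a category name is its head noun (a
    simplification); categories with an unknown or missing head are skipped.
    """
    concepts = []
    for cat in categories:
        words = cat.split()
        if not words:
            continue
        head = words[-1].lower()
        if head in HEAD_TO_CONCEPT:
            concepts.append(HEAD_TO_CONCEPT[head])
    return concepts


def xml_tag(name):
    """Turn a template parameter name into a usable XML tag name."""
    tag = re.sub(r"[^A-Za-z0-9_]", "_", name.strip())
    if not tag or tag[0].isdigit():
        tag = "_" + tag
    return tag


def template_params(wikitext):
    """Collect named parameters (key = value) from flat template calls."""
    params = {}
    for match in TEMPLATE_RE.finditer(wikitext):
        for part in match.group(2).split("|"):
            if "=" in part:
                key, value = part.split("=", 1)
                params[xml_tag(key)] = value.strip()
    return params


def annotate(title, wikitext, categories):
    """Build an XML element tagged with the first category concept,
    with one child element per named template parameter."""
    concepts = category_concepts(categories)
    root = ET.Element(concepts[0] if concepts else "article")
    ET.SubElement(root, "title").text = title
    for tag, value in template_params(wikitext).items():
        ET.SubElement(root, tag).text = value
    return root


if __name__ == "__main__":
    text = "{{Infobox scientist | name = Albert Einstein | birth_date = 1879 }}"
    page = annotate("Albert Einstein", text, ["German physicists", "Swiss people"])
    # prints: <physicist><title>Albert Einstein</title><name>Albert Einstein</name><birth_date>1879</birth_date></physicist>
    print(ET.tostring(page, encoding="unicode"))
```

Using only the first matching concept as the element name keeps the sketch short; a fuller version might nest several concepts or resolve WordNet hypernyms, which this example does not attempt.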