Deriving a Web-scale Common Sense Fact Knowledge Base

Tandon, Niket

Tandon, N. (2011). Deriving a Web-scale Common Sense Fact Knowledge Base. Master Thesis, Universität des Saarlandes, Saarbrücken.

Item is released

Basic information

Item permalink: https://hdl.handle.net/11858/00-001M-0000-0027-ABF9-8
Version permalink: https://hdl.handle.net/21.11116/0000-000E-314F-3

Document type: Thesis

Files

2011_Niket_Tandon_Thesis.pdf (Full text, general), 809 KB

File permalink:
-

File name:
2011_Niket_Tandon_Thesis.pdf

Description:
-

OA-Status:

Access restriction:
Restricted (Max Planck Institute for Informatics, MSIN; )

MIME type / checksum:
application/pdf

Technical metadata:

Copyright date:
-

Copyright info:
-

CC license:
-

Creators

Creators:
Tandon, Niket¹,², Author
Weikum, Gerhard¹, Advisor
Theobalt, Christian³, Referee

Affiliations:
¹ Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018
² International Max Planck Research School, MPI for Informatics, Max Planck Society, ou_1116551
³ Computer Graphics, MPI for Informatics, Max Planck Society, ou_40047

Content

Keywords: -

Abstract: The fact that birds have feathers and ice is cold seems trivially true. Yet most machine-readable sources of knowledge either lack such common sense facts entirely or have only limited coverage. Prior work on automated knowledge base construction has largely focused on relations between named entities and on taxonomic knowledge, while disregarding common sense properties. Extracting such structured data from text is challenging, especially because this knowledge is rarely expressed explicitly. Even when relying on large document collections, pattern-based information extraction approaches typically discover insufficient amounts of information. This thesis investigates harvesting massive amounts of common sense knowledge from the text of the entire Web while avoiding the massive engineering effort of procuring a corpus as large as the Web. Despite advances in knowledge harvesting, we observed that state-of-the-art methods were limited in accuracy and discovered insufficient amounts of information in this setting. This thesis shows how to gather large amounts of common sense facts from Web N-gram data, using seeds from existing knowledge bases such as ConceptNet. Our novel contributions include scalable methods for tapping into Web-scale data and a new scoring model to determine which patterns and facts are most reliable. An extensive experimental evaluation is provided for three different binary relations, comparing different sources of n-gram data as well as different algorithms. The experimental results show that this approach extends ConceptNet by more than two orders of magnitude (over 200-fold) at comparable levels of precision.

Details

Language: eng - English

Dates: Accepted: 2011-08; Published: 2011

Publication status: Published

Pages: X, 81 p.

Publication info: Saarbrücken : Universität des Saarlandes

Table of contents: -

Peer review: -

Identifiers (DOI, ISBN, etc.): BibTeX citekey: MasterTandon2011

Degree: Master
