Understanding Quantities in Web Tables and Text

Ibrahim, Yusra

doi:10.22028/D291-29657

Ibrahim, Y. (2019). Understanding Quantities in Web Tables and Text. PhD Thesis, Universität des Saarlandes, Saarbrücken. doi:10.22028/D291-29657.

Basic

Item Permalink: https://hdl.handle.net/21.11116/0000-0005-4384-A Version Permalink: https://hdl.handle.net/21.11116/0000-0005-4385-9

Genre: Thesis

Locators

Locator:
https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/28300 (Any fulltext) Open Access Green

OA-Status:
Green

Creators

Creators:
Ibrahim, Yusra [1, 2], Author
Weikum, Gerhard [1], Advisor
Riedewald, Mirek [3], Referee
Berberich, Klaus [1], Referee

Affiliations:
[1] Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018
[2] International Max Planck Research School, MPI for Informatics, Max Planck Society, Campus E1 4, 66123 Saarbrücken, DE, ou_1116551
[3] Algorithms and Complexity, MPI for Informatics, Max Planck Society, ou_24019

Content

Abstract: There is a wealth of schema-free tables on the web. The text accompanying these tables explains and qualifies the numerical quantities given in the tables. Despite this ubiquity of tabular data, there is little research that harnesses it by semantically understanding the information these tables convey, which is often ambiguous. This information can be disambiguated only with the help of the accompanying text. In the process of understanding quantity mentions in tables and text, we face the following challenges. First, there is no comprehensive knowledge base for anchoring quantity mentions. Second, tables are created ad hoc, without a standard schema and with ambiguous header names, and table cells usually contain abbreviations. Third, quantities can be written in multiple forms and units of measure. Fourth, the text usually refers to the quantities in tables using aggregation, approximation, and different scales. In this thesis, we target these challenges through the following contributions:
- We present the Quantity Knowledge Base (QKB), a knowledge base for representing quantity mentions. We construct the QKB by importing information from Freebase, Wikipedia, and other online sources.
- We propose Equity: a system for automatically canonicalizing header names and cell values onto concepts, classes, entities, and uniquely represented quantities registered in a knowledge base. We devise a probabilistic graphical model that captures coherence dependencies between cells in tables and candidate items in the space of concepts, entities, and quantities. Then, we cast the inference problem into an efficient algorithm based on random walks over weighted graphs. […] baselines.
- We introduce the quantity alignment problem: computing bidirectional links between textual mentions of quantities and the corresponding table cells. We propose BriQ: a system for computing such alignments. BriQ copes with the specific challenges of approximate quantities, aggregated quantities, and calculated quantities.
- We design ExQuisiTe: a web application that identifies mentions of quantities in text and tables, aligns quantity mentions in the text with related quantity mentions in tables, and generates salient suggestions for extractive text summarization systems.
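The abstract notes that text refers to table quantities through approximation, aggregation, and different scales. As a rough illustration only, and not BriQ's actual method (which this record does not describe), the hypothetical function `candidate_alignments` below lists single cells, or sums and differences of cell pairs, whose values match a text mention after a scale word is applied and a relative tolerance is allowed. The scale table, the 5% tolerance, the function name, and the example cells are all assumptions.

```python
import math
from itertools import combinations

# Hypothetical scale words for text mentions (an assumption, not from the thesis).
SCALES = {"": 1.0, "thousand": 1e3, "million": 1e6, "billion": 1e9}


def candidate_alignments(value: float, scale: str, cells: dict[str, float],
                         rel_tol: float = 0.05) -> list[tuple[str, tuple[str, ...]]]:
    """Hypothetical helper: list table-cell explanations for one text quantity mention."""
    target = value * SCALES[scale]  # normalize the text mention to the table's base unit
    matches: list[tuple[str, tuple[str, ...]]] = []
    # Single-cell match; the relative tolerance stands in for rounding ("about 1.2 million").
    for cid, v in cells.items():
        if math.isclose(target, v, rel_tol=rel_tol):
            matches.append(("single", (cid,)))
    # Aggregated or calculated matches over pairs of cells.
    for (c1, v1), (c2, v2) in combinations(cells.items(), 2):
        if math.isclose(target, v1 + v2, rel_tol=rel_tol):
            matches.append(("sum", (c1, c2)))
        if math.isclose(target, abs(v1 - v2), rel_tol=rel_tol):
            matches.append(("difference", (c1, c2)))
    # Total over all cells, checked only when it differs from a pairwise sum.
    if len(cells) > 2 and math.isclose(target, sum(cells.values()), rel_tol=rel_tol):
        matches.append(("total", tuple(cells)))
    return matches


if __name__ == "__main__":
    # Made-up example cells, e.g., yearly figures in one table column.
    cells = {"2017": 1_180_000.0, "2018": 820_000.0, "2019": 1_010_000.0}
    print(candidate_alignments(1.2, "million", cells))  # [('single', ('2017',))]
    print(candidate_alignments(2, "million", cells))    # [('sum', ('2017', '2018'))]
```

In practice, several explanations can fit the same number, so a real aligner would also need to rank candidates using context such as surrounding words and table headers; this sketch does not attempt that.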

Details

Language(s): eng - English

Dates: Accepted: 2019-10-08; Published online: 2019; Date issued: 2019

Publication Status: Issued

Pages: 116 p.

Publishing info: Saarbrücken : Universität des Saarlandes

Identifiers: BibTeX citekey: yusraphd2019
DOI: 10.22028/D291-29657

Degree: PhD
