hide
Free keywords:
Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL
Abstract:
Answering complex questions over knowledge bases (KB-QA) faces huge input
data with billions of facts, involving millions of entities and thousands of
predicates. For efficiency, QA systems first reduce the answer search space by
identifying a set of facts that is likely to contain all answers and relevant
cues. The most common technique or doing this is to apply named entity
disambiguation (NED) systems to the question, and retrieve KB facts for the
disambiguated entities. This work presents CLOCQ, an efficient method that
prunes irrelevant parts of the search space using KB-aware signals. CLOCQ uses
a top-k query processor over score-ordered lists of KB items that combine
signals about lexical matching, relevance to the question, coherence among
candidate items, and connectivity in the KB graph. Experiments with two recent
QA benchmarks for complex questions demonstrate the superiority of CLOCQ over
state-of-the-art baselines with respect to answer presence, size of the search
space, and runtimes.