ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters

Abujabal, Abdalghani; Saha Roy, Rishiraj; Yahya, Mohamed; Weikum, Gerhard

Abujabal, A., Saha Roy, R., Yahya, M., & Weikum, G. (2018). ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters. Retrieved from http://arxiv.org/abs/1809.09528.

Item is Released

Basic

Item Permalink: https://hdl.handle.net/21.11116/0000-0002-A0FE-B
Version Permalink: https://hdl.handle.net/21.11116/0000-0003-7640-0

Genre: Paper

LaTeX: {ComQA}: {A} Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters

Files

arXiv:1809.09528.pdf (Preprint), 598KB

File Permalink:
https://hdl.handle.net/21.11116/0000-0002-A100-7

Name:
arXiv:1809.09528.pdf

Description:
File downloaded from arXiv at 2018-12-07 09:00

OA-Status:

Visibility:
Public

MIME-Type / Checksum:
application/pdf / [MD5]

Copyright Date:
-

Copyright Info:
-

License:
http://creativecommons.org/licenses/by/4.0/

Creators

Creators:
Abujabal, Abdalghani¹, Author
Saha Roy, Rishiraj¹, Author
Yahya, Mohamed², Author
Weikum, Gerhard¹, Author

Affiliations:
¹ Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018
² External Organizations, ou_persistent22

Content

Free keywords: Computer Science, Computation and Language, cs.CL

Abstract: To bridge the gap between the capabilities of the state-of-the-art in factoid
question answering (QA) and what real users ask, we need large datasets of real
user questions that capture the various question phenomena users are interested
in, and the diverse ways in which these questions are formulated. We introduce
ComQA, a large dataset of real user questions that exhibit different
challenging aspects such as temporal reasoning, compositionality, etc. ComQA
questions come from the WikiAnswers community QA platform. Through a large
crowdsourcing effort, we clean the question dataset, group questions into
paraphrase clusters, and annotate clusters with their answers. ComQA contains
11,214 questions grouped into 4,834 paraphrase clusters. We detail the process
of constructing ComQA, including the measures taken to ensure its high quality
while making effective use of crowdsourcing. We also present an extensive
analysis of the dataset and the results achieved by state-of-the-art systems on
ComQA, demonstrating that our dataset can be a driver of future research on QA.

Details

Language(s): eng - English

Dates: Created: 2018-09-25; Published Online: 2018

Publication Status: Published online

Pages: 11 p.

Publishing info: -

Table of Contents: -

Rev. Type: -

Identifiers: arXiv: 1809.09528
URI: http://arxiv.org/abs/1809.09528
BibTex Citekey: Abujabal_arXiv1809.09528

Degree: -
