
Record


Released

Research Paper

ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters

MPG Authors

Abujabal, Abdalghani
Databases and Information Systems, MPI for Informatics, Max Planck Society;


Roy, Rishiraj Saha
Databases and Information Systems, MPI for Informatics, Max Planck Society;


Weikum, Gerhard
Databases and Information Systems, MPI for Informatics, Max Planck Society;

External Resources
No external resources are available
Full Texts (freely accessible)

arXiv:1809.09528.pdf
(Preprint), 598KB

Supplementary Material (freely accessible)
No freely accessible supplementary materials are available
Citation

Abujabal, A., Roy, R. S., Yahya, M., & Weikum, G. (2018). ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters. Retrieved from http://arxiv.org/abs/1809.09528.


Citation link: http://hdl.handle.net/21.11116/0000-0002-A0FE-B
Abstract
To bridge the gap between the capabilities of the state-of-the-art in factoid question answering (QA) and what real users ask, we need large datasets of real user questions that capture the various question phenomena users are interested in, and the diverse ways in which these questions are formulated. We introduce ComQA, a large dataset of real user questions that exhibit different challenging aspects such as temporal reasoning, compositionality, etc. ComQA questions come from the WikiAnswers community QA platform. Through a large crowdsourcing effort, we clean the question dataset, group questions into paraphrase clusters, and annotate clusters with their answers. ComQA contains 11,214 questions grouped into 4,834 paraphrase clusters. We detail the process of constructing ComQA, including the measures taken to ensure its high quality while making effective use of crowdsourcing. We also present an extensive analysis of the dataset and the results achieved by state-of-the-art systems on ComQA, demonstrating that our dataset can be a driver of future research on QA.
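
The abstract describes questions grouped into paraphrase clusters, each annotated with answers. As a minimal sketch of how such a cluster might be represented, the Python below defines a hypothetical `ParaphraseCluster` container and a helper `average_cluster_size`. Both names and the example questions are assumptions made for illustration; they are not the dataset's actual schema or contents. Only the two counts (11,214 questions and 4,834 clusters) come from the abstract.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ParaphraseCluster:
    """Hypothetical container: questions sharing one meaning and one answer set."""
    questions: List[str]
    answers: List[str] = field(default_factory=list)


def average_cluster_size(num_questions: int, num_clusters: int) -> float:
    """Return the mean number of questions per paraphrase cluster."""
    if num_clusters <= 0:
        raise ValueError("num_clusters must be positive")
    return num_questions / num_clusters


# Invented example for illustration; these questions are not taken from ComQA.
example = ParaphraseCluster(
    questions=[
        "who was the first president of the united states?",
        "which person was the first us president?",
    ],
    answers=["George Washington"],
)

# Counts reported in the abstract: 11,214 questions in 4,834 clusters.
mean_size = average_cluster_size(11_214, 4_834)
print(f"{mean_size:.2f} questions per cluster on average")
```

Under these counts, a cluster holds a little over two questions on average, which suggests that many clusters contain only one or two paraphrases.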