
Released

Paper

Robust Training for Conversational Question Answering Models with Reinforced Reformulation Generation

MPS-Authors
Kaiser, Magdalena (/persons/resource/persons244401)
Databases and Information Systems, MPI for Informatics, Max Planck Society

Saha Roy, Rishiraj (/persons/resource/persons185343)
Databases and Information Systems, MPI for Informatics, Max Planck Society

Weikum, Gerhard (/persons/resource/persons45720)
Databases and Information Systems, MPI for Informatics, Max Planck Society

External Resource
No external resources are shared
Fulltext (public)

arXiv:2310.13505.pdf (Preprint), 3 MB

Supplementary Material (public)
There is no public supplementary material available
Citation

Kaiser, M., Saha Roy, R., & Weikum, G. (2023). Robust Training for Conversational Question Answering Models with Reinforced Reformulation Generation. Retrieved from https://arxiv.org/abs/2310.13505.


Cite as: https://hdl.handle.net/21.11116/0000-000D-E9D1-0
Abstract
Models for conversational question answering (ConvQA) over knowledge graphs
(KGs) are usually trained and tested on benchmarks of gold QA pairs. This
implies that training is limited to surface forms seen in the respective
datasets, and evaluation is on a small set of held-out questions. Through our
proposed framework REIGN, we take several steps to remedy this restricted
learning setup. First, we systematically generate reformulations of training
questions to increase robustness of models to surface form variations. This is
a particularly challenging problem, given the incomplete nature of such
questions. Second, we guide ConvQA models towards higher performance by feeding
them only those reformulations that help improve their answering quality, using
deep reinforcement learning. Third, we demonstrate the viability of training
major model components on one benchmark and applying them zero-shot to another.
Finally, for a rigorous evaluation of robustness for trained models, we use and
release large numbers of diverse reformulations generated by prompting GPT for
benchmark test sets (resulting in a 20x increase in size). Our findings show
that ConvQA models with robust training via reformulations significantly
outperform those with standard training from gold QA pairs only.
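
The following is a minimal sketch (not taken from the paper) of the reinforcement learning idea described in the abstract: reformulations are rewarded by how much they improve the model's answering quality over the original question, so only helpful reformulations are reinforced. The reformulator.sample and qa_model.answer_score interfaces are assumptions made for illustration.

    import torch

    def reinforce_step(reformulator, qa_model, question, gold_answer, optimizer, num_samples=4):
        # Answering quality on the original question serves as the baseline
        # (answer_score is assumed to return a scalar quality, e.g. answer F1/precision).
        baseline = float(qa_model.answer_score(question, gold_answer))
        log_probs, rewards = [], []
        for _ in range(num_samples):
            # Sample one reformulation together with its log-probability under the generator policy.
            reformulation, log_prob = reformulator.sample(question)
            # Reward = improvement in answering quality over the original question,
            # so only reformulations that actually help receive positive reward.
            reward = float(qa_model.answer_score(reformulation, gold_answer)) - baseline
            log_probs.append(log_prob)
            rewards.append(reward)
        # REINFORCE objective: raise the probability of reformulations with positive reward.
        loss = -(torch.tensor(rewards) * torch.stack(log_probs)).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

This is only one plausible instantiation of reward-guided reformulation generation; the paper's actual reward definition, sampling strategy, and model components may differ.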