Deutsch
 
Hilfe Datenschutzhinweis Impressum
  DetailsucheBrowse

Datensatz

DATENSATZ AKTIONENEXPORT

Freigegeben

Forschungspapier

Detecting and Mitigating Test-time Failure Risks via Model-agnostic Uncertainty Learning

MPG-Autoren
/persons/resource/persons225814

Lahoti,  Preethi
Databases and Information Systems, MPI for Informatics, Max Planck Society;

/persons/resource/persons45720

Weikum,  Gerhard
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Externe Ressourcen
Es sind keine externen Ressourcen hinterlegt
Volltexte (beschränkter Zugriff)
Für Ihren IP-Bereich sind aktuell keine Volltexte freigegeben.
Volltexte (frei zugänglich)

arXiv:2109.04432.pdf
(Preprint), 999KB

Ergänzendes Material (frei zugänglich)
Es sind keine frei zugänglichen Ergänzenden Materialien verfügbar
Zitation

Lahoti, P., Gummadi, K., & Weikum, G. (2021). Detecting and Mitigating Test-time Failure Risks via Model-agnostic Uncertainty Learning. Retrieved from https://arxiv.org/abs/2109.04432.


Zitierlink: https://hdl.handle.net/21.11116/0000-0009-6491-2
Zusammenfassung
Reliably predicting potential failure risks of machine learning (ML) systems
when deployed with production data is a crucial aspect of trustworthy AI. This
paper introduces Risk Advisor, a novel post-hoc meta-learner for estimating
failure risks and predictive uncertainties of any already-trained black-box
classification model. In addition to providing a risk score, the Risk Advisor
decomposes the uncertainty estimates into aleatoric and epistemic uncertainty
components, thus giving informative insights into the sources of uncertainty
inducing the failures. Consequently, Risk Advisor can distinguish between
failures caused by data variability, data shifts and model limitations and
advise on mitigation actions (e.g., collecting more data to counter data
shift). Extensive experiments on various families of black-box classification
models and on real-world and synthetic datasets covering common ML failure
scenarios show that the Risk Advisor reliably predicts deployment-time failure
risks in all the scenarios, and outperforms strong baselines.