Paper

ACT now: Aggregate Comparison of Traces for Incident Localization

MPS-Authors
Mace, Jonathan
Group J. Mace, Max Planck Institute for Software Systems, Max Planck Society;

Fulltext (public)

arXiv:2205.06933.pdf
(Preprint), 835KB

Citation

Ramasubramanian, K., Raina, A., Mace, J., & Alvaro, P. (2022). ACT now: Aggregate Comparison of Traces for Incident Localization. Retrieved from https://arxiv.org/abs/2205.06933.


Cite as: https://hdl.handle.net/21.11116/0000-000B-B54A-6
Abstract
Incidents in production systems are common and downtime is expensive.
Applying an appropriate mitigating action quickly, such as changing a specific
firewall rule, reverting a change, or diverting traffic to a different
availability zone, saves money. Incident localization is time-consuming since a
single failure can have many effects, extending far from the site of failure.
Knowing how different system events relate to each other is necessary to
quickly identify where to mitigate. Our approach, Aggregate Comparison
of Traces (ACT), localizes incidents by comparing sets of traces (which capture
events and their relationships for individual requests) sampled from the most
recent steady-state operation and during an incident. In our quantitative
experiments, we show that ACT is able to effectively localize more than 99% of
incidents.
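
The idea of comparing a steady-state set of traces with an incident set can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes, for illustration only, that each trace is a list of (caller, callee) edges and that an edge appearing much less often during the incident is a candidate location. The names `edge_frequencies` and `rank_suspect_edges` and the service names are hypothetical.

```python
from collections import Counter


def edge_frequencies(traces):
    """Return the fraction of traces in which each (caller, callee) edge appears."""
    counts = Counter()
    for trace in traces:
        # Count each edge at most once per trace.
        counts.update(set(trace))
    total = len(traces)
    if total == 0:
        return {}
    return {edge: count / total for edge, count in counts.items()}


def rank_suspect_edges(steady_traces, incident_traces):
    """Rank edges by how much their frequency dropped during the incident.

    Assumption (not from the paper): a large drop in how often an edge
    appears marks a plausible place to start looking.
    """
    baseline = edge_frequencies(steady_traces)
    current = edge_frequencies(incident_traces)
    drops = [
        (edge, freq - current.get(edge, 0.0))
        for edge, freq in baseline.items()
    ]
    # Keep only edges that became rarer; largest drop first, ties by edge name.
    suspects = [(edge, drop) for edge, drop in drops if drop > 0]
    return sorted(suspects, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical traces: every steady-state request reaches the database,
    # while requests sampled during the incident stop at the cart service.
    steady = [[("frontend", "cart"), ("cart", "db")] for _ in range(4)]
    incident = [[("frontend", "cart")] for _ in range(4)]
    for edge, drop in rank_suspect_edges(steady, incident):
        print(edge, drop)
```

The sketch aggregates over whole sets of traces rather than inspecting individual requests, which is the part of the approach the abstract makes explicit. Everything else, including the scoring rule, is an assumption.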