ACT now: Aggregate Comparison of Traces for Incident Localization

Ramasubramanian, Kamala; Raina, Ashutosh; Mace, Jonathan; Alvaro, Peter

Ramasubramanian, K., Raina, A., Mace, J., & Alvaro, P. (2022). ACT now: Aggregate Comparison of Traces for Incident Localization. Retrieved from https://arxiv.org/abs/2205.06933.

Item is Released

Basic

Item Permalink: https://hdl.handle.net/21.11116/0000-000B-B54A-6 Version Permalink: https://hdl.handle.net/21.11116/0000-000B-B54B-5

Genre: Paper

Files

arXiv:2205.06933.pdf (Preprint), 835KB

File Permalink:
https://hdl.handle.net/21.11116/0000-000B-B54C-4

Name:
arXiv:2205.06933.pdf

Description:
File downloaded from arXiv at 2022-12-05 15:23

OA-Status:
Green

Visibility:
Public

MIME-Type:
application/pdf

Copyright Date:
-

Copyright Info:
-

License:
http://creativecommons.org/licenses/by-nc-sa/4.0/

Creators

Creators:
Ramasubramanian, Kamala¹, Author
Raina, Ashutosh¹, Author
Mace, Jonathan², Author
Alvaro, Peter¹, Author

Affiliations:
¹ External Organizations, ou_persistent22
² Group J. Mace, Max Planck Institute for Software Systems, Max Planck Society, ou_3031907

Content

Free keywords: Computer Science, Distributed, Parallel, and Cluster Computing, cs.DC

Abstract: Incidents in production systems are common and downtime is expensive.
Applying an appropriate mitigating action quickly, such as changing a specific
firewall rule, reverting a change, or diverting traffic to a different
availability zone, saves money. Incident localization is time-consuming since a
single failure can have many effects, extending far from the site of failure.
Knowing how different system events relate to each other is necessary to
quickly identify where to mitigate. Our approach, Aggregate Comparison
of Traces (ACT), localizes incidents by comparing sets of traces (which capture
events and their relationships for individual requests) sampled from the most
recent steady-state operation and during an incident. In our quantitative
experiments, we show that ACT is able to effectively localize more than 99% of
incidents.
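
The comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes, purely for illustration, that each trace is reduced to a list of (caller, callee) edges, and it ranks edges by how much their per-trace frequency shifts between the steady-state set and the incident set. The function names `edge_frequencies` and `rank_changed_edges` and the example service names are hypothetical.

```python
from collections import Counter


def edge_frequencies(traces):
    """Return the fraction of traces in which each (caller, callee) edge appears."""
    counts = Counter()
    for trace in traces:
        counts.update(set(trace))  # count each edge at most once per trace
    total = len(traces)
    if total == 0:
        return {}
    return {edge: n / total for edge, n in counts.items()}


def rank_changed_edges(steady, incident):
    """Rank edges by the absolute change in frequency between the two trace sets.

    Assumption (not from the paper): the largest shifts are the best
    candidates for where to look first.
    """
    base = edge_frequencies(steady)
    curr = edge_frequencies(incident)
    diffs = {
        edge: curr.get(edge, 0.0) - base.get(edge, 0.0)
        for edge in set(base) | set(curr)
    }
    # Largest absolute change first; ties broken by edge name for determinism.
    return sorted(diffs.items(), key=lambda kv: (-abs(kv[1]), kv[0]))


# Hypothetical traces: during the incident, calls from "auth" to "db" disappear.
steady = [[("gateway", "auth"), ("auth", "db")] for _ in range(2)]
incident = [[("gateway", "auth")] for _ in range(2)]
ranked = rank_changed_edges(steady, incident)
```

In this toy input the `("auth", "db")` edge ranks first because it is present in every steady-state trace and absent from every incident trace, while `("gateway", "auth")` is unchanged.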

Details

Language(s): eng - English

Dates: Created: 2022-05-13
Published Online: 2022

Publication Status: Published online

Pages: 14 p.

Publishing info: -

Table of Contents: -

Rev. Type: -

Identifiers: arXiv: 2205.06933
URI: https://arxiv.org/abs/2205.06933
BibTex Citekey: Ramasubramanian2205.06933

Degree: -
