Free keywords:
Computer Science, Distributed, Parallel, and Cluster Computing, cs.DC
Abstract:
Designing routing schemes is a multidimensional and complex task that depends
on the objective function, the computational model (centralized vs.
distributed), and the amount of uncertainty (online vs. offline). Nevertheless,
there are quite a few well-studied general techniques for a large variety of
network problems. In contrast, in our view, practical techniques for designing
robust routing schemes are scarce; while fault-tolerance has been studied from
a number of angles, existing approaches are concerned with dealing with faults
after the fact by rerouting, self-healing, or similar techniques. We argue that
this places a high burden on the designer, since in such a system every
algorithm must account for the effects of faults on communication.
With the goal of initiating efforts towards addressing this issue, we
showcase simple and generic transformations that can be used as a blackbox to
increase resilience against (independently distributed) faults. Given a network
and a routing scheme, we determine a reinforced network and corresponding
routing scheme that faithfully preserves the specification and behavior of the
original scheme. We show that a reasonably small constant overhead in the size
of the reinforced network relative to the original suffices to substantially
relax the reliability requirements on individual components. The main
message in this paper is that the task of designing a robust routing scheme can
be decoupled into (i) designing a routing scheme that meets the specification
in a fault-free environment, (ii) ensuring that nodes correspond to
fault-containment regions, i.e., fail (approximately) independently, and (iii)
applying our transformation to obtain a reinforced network and a robust routing
scheme that is fault-tolerant.
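The reliability gain from constant-factor redundancy under independent faults can be illustrated with a small back-of-the-envelope sketch (this is an illustration of the underlying probabilistic effect, not the paper's actual construction; the replica count `k`, failure probability `p`, and the assumption that a node works whenever at least one replica survives are ours):

```python
# Illustrative sketch, not the paper's reinforcement scheme: replicate each
# node k times, assume replicas fail independently with probability p, and
# treat a reinforced node as alive if at least one replica survives.

def node_failure_prob(p: float, k: int) -> float:
    """Probability that all k independent replicas of a node fail."""
    return p ** k

def network_survival_prob(p: float, k: int, n: int) -> float:
    """Probability that each of n reinforced nodes keeps a live replica."""
    return (1.0 - node_failure_prob(p, k)) ** n

# With a 1% per-replica failure probability, tripling each node (a constant
# overhead, k = 3) drives the per-node failure probability down to 10^-6,
# so even a network of 10,000 nodes survives with probability about 0.99.
```

This is the sense in which a constant blow-up in network size can substantially relax the reliability required of individual components: the per-node failure probability shrinks exponentially in the replication factor.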