Keywords:
Computer Science, Distributed, Parallel, and Cluster Computing, cs.DC
Abstract:
With the rise of machine learning, inference on deep neural networks (DNNs)
has become a core building block on the critical path for many cloud
applications. Applications today rely on isolated ad-hoc deployments that force
users to compromise on consistent latency, elasticity, or cost-efficiency,
depending on workload characteristics. We propose to elevate DNN inference to
be a first-class cloud primitive provided by a shared multi-tenant system, akin
to cloud storage and cloud databases. A shared system enables cost-efficient
operation with consistent performance across the full spectrum of workloads. We
argue that DNN inference is an ideal candidate for a multi-tenant system
because of its narrow and well-defined interface and predictable resource
requirements.