English
 
Help Privacy Policy Disclaimer
  Advanced SearchBrowse

Item

ITEM ACTIONSEXPORT
EndNote (UTF-8)
 
DownloadE-Mail
  Approximation and Streaming Algorithms for Projective Clustering via Random Projections

Kerber, M., & Raghvendra, S. (2014). Approximation and Streaming Algorithms for Projective Clustering via Random Projections. Retrieved from http://arxiv.org/abs/1407.2063.

Item is

Files

hide Files
:
1407.2063.pdf (Preprint), 195KB
Name:
1407.2063.pdf
Description:
File downloaded from arXiv at 2014-12-03 09:02
OA-Status:
Visibility:
Public
MIME-Type / Checksum:
application/pdf / [MD5]
Technical Metadata:
Copyright Date:
-
Copyright Info:
-

Locators

show

Creators

hide
 Creators:
Kerber, Michael1, Author           
Raghvendra, Sharath2, Author
Affiliations:
1Algorithms and Complexity, MPI for Informatics, Max Planck Society, ou_24019              
2External Organizations, ou_persistent22              

Content

hide
Free keywords: Computer Science, Computational Geometry, cs.CG
 Abstract: Let $\epsilon>0$ be any constant and let $P$ be a set of $n$ points in $\mathbb{R}^d$. We design new streaming and approximation algorithms for clustering points of $P$. Consider the projective clustering problem: Given $k, q < n$, compute a set $F$ of $k$ $q$-flats such that the function $f_k^q(P,\rho)=\sum_{p\in P}d(p, F)^\rho$ is minimized; here $d(p, F)$ represents the distance of $p$ to the closest $q$-flat in $F$. For $\rho=\infty$, we interpret $f_k^q(P,\rho)$ to be $\max_{r\in P}d(r, F)$. When $\rho=1,2$ and $\infty$ and $q=0$, the problem corresponds to the well-known $k$-median, $k$-mean and the $k$-center clustering problems respectively. Our two main technical contributions are as follows: (i) Consider an orthogonal projection of $P$ to a randomly chosen $O(C_\rho(q,\epsilon)\log n/\epsilon^2)$-dimensional flat. For every subset $S \subseteq P$, we show that such a random projection will $\epsilon$-approximate $f_1^q(S,\rho)$. This result holds for any integer norm $\rho \ge 1$, including $\rho=\infty$; here $C_\rho(q,\epsilon)$ is the size of the smallest coreset that $\epsilon$-approximates $f_1^q(\cdot,\rho)$. For $\rho=1,2$ and $\infty$, $C_\rho(q,\epsilon)$ is known to be a constant which depends only on $q$ and $\epsilon$. (ii) We improve the size of the coreset when $\rho = \infty$. In particular, we improve the bounds of $C_\infty(q,\epsilon)$ to $O(q^3/\epsilon^2)$ from the previously-known $O(q^6/\epsilon^5 \log 1/\epsilon)$. As applications, we obtain better approximation and streaming algorithms for various projective clustering problems over high dimensional point sets. E.g., when $\rho =\infty$ and $q\geq 1$, we obtain a streaming algorithm that maintains an $\epsilon$-approximate solution using $O((d + n)q^3(\log n/\epsilon^4))$ space, which is better than the input size $O(nd)$.

Details

hide
Language(s): eng - English
 Dates: 2014-07-082014-07-08
 Publication Status: Published online
 Pages: 16 p.
 Publishing info: -
 Table of Contents: -
 Rev. Type: -
 Identifiers: arXiv: 1407.2063
URI: http://arxiv.org/abs/1407.2063
BibTex Citekey: DBLP:journals/corr/KerberR14
 Degree: -

Event

show

Legal Case

show

Project information

show

Source

show