CaltechAUTHORS
  A Caltech Library Service

The Effectiveness of Lloyd-Type Methods for the k-Means Problem

Ostrovsky, Rafail and Rabani, Yuval and Schulman, Leonard J. and Swamy, Chaitanya (2012) The Effectiveness of Lloyd-Type Methods for the k-Means Problem. Journal of the ACM, 59 (6). Art. No. 28. ISSN 0004-5411. doi:10.1145/2395116.2395117. https://resolver.caltech.edu/CaltechAUTHORS:20130125-143155195

Full text is not posted in this repository. Consult Related URLs below.

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20130125-143155195

Abstract

We investigate variants of Lloyd's heuristic for clustering high dimensional data in an attempt to explain its popularity (a half century after its introduction) among practitioners, and in order to suggest improvements in its application. We propose and justify a clusterability criterion for data sets. We present variants of Lloyd's heuristic that quickly lead to provably near-optimal clustering solutions when applied to well-clusterable instances. This is the first performance guarantee for a variant of Lloyd's heuristic. The provision of a guarantee on output quality does not come at the expense of speed: some of our algorithms are candidates for being faster in practice than currently used variants of Lloyd's method. In addition, our other algorithms are faster on well-clusterable instances than recently proposed approximation algorithms, while maintaining similar guarantees on clustering quality. Our main algorithmic contribution is a novel probabilistic seeding process for the starting configuration of a Lloyd-type iteration.


Item Type:Article
Related URLs:
URLURL TypeDescription
http://dx.doi.org/10.1145/2395116.2395117DOIUNSPECIFIED
http://dl.acm.org/citation.cfm?id=2395117&CFID=264379016&CFTOKEN=81825997PublisherUNSPECIFIED
ORCID:
AuthorORCID
Schulman, Leonard J.0000-0001-9901-2797
Additional Information:© 2012 ACM. Received March 2010; revised April 2012 and September 2012; accepted September 2012. A preliminary version of this article appeared in Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2006. R. Ostrovsky was supported in part by NSF grants 0830803, 09165174, 1065276, 1118126, and 1136174; US-Israel BSF grant 2008411; OKAWA Foundation Research Award; IBM Faculty Research Award; Xerox Faculty Research Award; B. John Garrick Foundation Research Award, Teradata Research Award; and Lockheed-Martin Corporation Research Award; and the Defense Advanced Research Projects Agency through the U.S. Office of Naval Research under Contract N00014-11-1-0392. The views expressed are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government. Y. Rabani was supported by ISF grant 52/03 and BSF grant 2002282. L. J. Schulman was supported in part by NSF CCF-0515342, NSF CCF-1038578, NSA H98230-06-1-0074, and NSF ITR CCR-0326554. The research of C. Swamy was supported partially by NSERC grant 327620-09 and an Ontario Early Researcher Award. Part of this work was done while Y. Rabani was visiting UCLA and Caltech. We thank the referees for their feedback and various helpful suggestions.
Funders:
Funding AgencyGrant Number
NSF0830803
NSF09165174
NSF1065276
NSF1118126
NSF1136174
US-Israel BSF grant2008411
OKAWA Foundation Research AwardUNSPECIFIED
IBM Faculty Research AwardUNSPECIFIED
Xerox Faculty Research AwardUNSPECIFIED
B. John Garrick Foundation Research AwardUNSPECIFIED
Teradata Research AwardUNSPECIFIED
Lockheed-Martin Corporation Research AwardUNSPECIFIED
Office of Naval Research (ONR)N00014-11-1-0392
Israel Science Foundation (ISF)52/03
United States–Israel Binational Science Foundation (BSF)2002282
NSFCCF-0515342
NSFCCF-1038578
National Security Agency (NSA)H98230-06-1-0074
NSF-ITRCCR-0326554
NSERC327620-09
Ontario Early Researcher AwardUNSPECIFIED
Subject Keywords:Randomized algorithms, approximation algorithms
Series Name:Annual IEEE Symposium on Foundations of Computer Science
Issue or Number:6
DOI:10.1145/2395116.2395117
Record Number:CaltechAUTHORS:20130125-143155195
Persistent URL:https://resolver.caltech.edu/CaltechAUTHORS:20130125-143155195
Official Citation:The effectiveness of lloyd-type methods for the k-means problem Rafail Ostrovsky, Yuval Rabani, Leonard J. Schulman, Chaitanya Swamy Article No.: 28 doi>10.1145/2395116.2395117
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:36604
Collection:CaltechAUTHORS
Deposited By: Ruth Sustaita
Deposited On:26 Jan 2013 00:02
Last Modified:09 Nov 2021 23:23

Repository Staff Only: item control page