CaltechAUTHORS
  A Caltech Library Service

Clustering Lines in High-Dimensional Space: Classification of Incomplete Data

Gao, Jie and Langberg, Michael and Schulman, Leonard J. (2010) Clustering Lines in High-Dimensional Space: Classification of Incomplete Data. ACM Transactions on Algorithms, 7 (1). Art. No. 8. ISSN 1549-6325 . http://resolver.caltech.edu/CaltechAUTHORS:20110421-134437164

Full text is not posted in this repository. Consult Related URLs below.

Use this Persistent URL to link to this item: http://resolver.caltech.edu/CaltechAUTHORS:20110421-134437164

Abstract

A set of k balls B_1,...,B_k in a Euclidean space is said to cover a collection of lines if every line intersects some ball. We consider the k-center problem for lines in high-dimensional space: Given a set of n lines ^I= {I_1,...,l_n in R^d, find k balls of minimum radius which cover I. We present a 2-approximation algorithm for the cases k = 2, 3 of this problem, having running time quasi-linear in the number of lines and the dimension of the ambient space. Our result for 3-clustering is strongly based on a new result in discrete geometry that may be of independent interest: a Helly-type theorem for collections of axis-parallel “crosses” in the plane. The family of crosses does not have finite Helly number in the usual sense. Our Helly theorem is of a new type: it depends on ε-contracting the sets. In statistical practice, data is often incompletely specified; we consider lines as the most elementary case of incompletely specified data points. Clustering of data is a key primitive in nonparametric statistics. Our results provide a way of performing this primitive on incomplete data, as well as imputing the missing values.


Item Type:Article
Related URLs:
URLURL TypeDescription
http://dx.doi.org/10.1145/1868237.1868246DOIUNSPECIFIED
http://portal.acm.org/citation.cfm?doid=1868237.1868246PublisherUNSPECIFIED
Additional Information:© 2010 ACM. Received Janauary 2008; Revised June 2009; Accepted December 2009. Publication date: November 2010. Part of this work was done when J. Gaowas a postdoctoral scholar with the Center for the Mathematics of Information, California Institute of Technology. Part of this work was done when M. Langberg was a postdoctoral scholar at the California Institute of Technology. The research of M. Langberg was supported in part by NSF grant CCF-0346991. The research of L. J. Schulman was supported in part by NSF CCF-0515342, NSA H98230-06-1-0074, and NSF ITR CCR-0326554. Michael Langberg would like to thank Dan Feldman for sharing ideas that led to the proof presented in Section B.1.
Funders:
Funding AgencyGrant Number
NSFCCF-0346991
NSFCCF-0515342
NSAH98230-06-1-0074
NSF Information Technology Research (ITR)CCR-0326554
Record Number:CaltechAUTHORS:20110421-134437164
Persistent URL:http://resolver.caltech.edu/CaltechAUTHORS:20110421-134437164
Official Citation:Clustering lines in high-dimensional space: Classification of incomplete data Jie Gao, Michael Langberg, Leonard J. Schulman Article No.: 8 doi>10.1145/1868237.1868246
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:23414
Collection:CaltechAUTHORS
Deposited By: Ruth Sustaita
Deposited On:21 Apr 2011 21:37
Last Modified:23 Aug 2016 10:01

Repository Staff Only: item control page