CaltechAUTHORS
A Caltech Library Service

Analysis of Incomplete Data and an Intrinsic-Dimension Helly Theorem

Gao, Jie and Langberg, Michael and Schulman, Leonard J. (2008) Analysis of Incomplete Data and an Intrinsic-Dimension Helly Theorem. Discrete and Computational Geometry, 40 (4). pp. 537-560. ISSN 0179-5376. doi:10.1007/s00454-008-9107-5. https://resolver.caltech.edu/CaltechAUTHORS:GAOdcg08

 Preview
PDF - Submitted Version
See Usage Policy.

178kB

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:GAOdcg08

Abstract

The analysis of incomplete data is a long-standing challenge in practical statistics. When, as is typical, data objects are represented by points in R^d , incomplete data objects correspond to affine subspaces (lines or Δ-flats).With this motivation we study the problem of finding the minimum intersection radius r(L) of a set of lines or Δ-flats L: the least r such that there is a ball of radius r intersecting every flat in L. Known algorithms for finding the minimum enclosing ball for a point set (or clustering by several balls) do not easily extend to higher dimensional flats, primarily because “distances” between flats do not satisfy the triangle inequality. In this paper we show how to restore geometry (i.e., a substitute for the triangle inequality) to the problem, through a new analog of Helly’s theorem. This “intrinsic-dimension” Helly theorem states: for any family L of Δ-dimensional convex sets in a Hilbert space, there exist Δ + 2 sets L' ⊆ L such that r(L) ≤ 2r(L'). Based upon this we present an algorithm that computes a (1+ε)-core set L' ⊆ L, |L'| = O(Δ^4/ε), such that the ball centered at a point c with radius (1 +ε)r(L') intersects every element of L. The running time of the algorithm is O(n^(Δ+1)dpoly(Δ/ε)). For the case of lines or line segments (Δ = 1), the (expected) running time of the algorithm can be improved to O(ndpoly(1/ε)).We note that the size of the core set depends only on the dimension of the input objects and is independent of the input size n and the dimension d of the ambient space.

Item Type:Article
Related URLs:
URLURL TypeDescription
http://dx.doi.org/10.1007/s00454-008-9107-5DOIArticle
ORCID:
AuthorORCID
Langberg, Michael0000-0002-7470-0718
Schulman, Leonard J.0000-0001-9901-2797
Additional Information:© 2008 Springer Science. Received: 12 April 2007/Revised: 12 June 2008/Accepted: 7 July 2008/Published online: 23 September 2008. Work was done when J. Gao was with Center for the Mathematics of Information, California Institute of Technology. Work was done when M. Langberg was a postdoctoral scholar at the California Institute of Technology. Research supported in part by NSF grant CCF-0346991. Research of L.J. Schulman supported in part by an NSF ITR and the Okawa Foundation.
Funders:
Funding AgencyGrant Number
NSFCCF-0346991
Okawa FoundationUNSPECIFIED
Subject Keywords:Clustering; k-center; Core set; Incomplete data; Helly theorem; Approximation; Inference.
Issue or Number:4
DOI:10.1007/s00454-008-9107-5
Record Number:CaltechAUTHORS:GAOdcg08
Persistent URL:https://resolver.caltech.edu/CaltechAUTHORS:GAOdcg08
Official Citation:Gao, J., Langberg, M. & Schulman, L.J. Discrete Comput Geom (2008) 40: 537. https://doi.org/10.1007/s00454-008-9107-5
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:13555
Collection:CaltechAUTHORS
Deposited By: Ruth Sustaita
Deposited On:17 Jun 2009 19:03